Kafka is a powerful piece of software that enables very useful architectural patterns. In this post I want to capture my understanding so I can reference it in the future. I will keep updating the post as I acquire more knowledge or discover refinements and errors.
Kafka is a distributed, log-based messaging system. It has a set of components that work in parallel to write messages to partitions and read them back. That work has to be done consistently, meaning that all parties interacting with Kafka (producers, brokers, and consumers) must share a coherent view of the system's state.
In a Kafka system:
The replication factor gives you fault tolerance.
The number of partitions enables scalability and parallelism, not high availability directly.
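To make this concrete, here is a minimal sketch of creating a topic with both knobs set explicitly, using the Java AdminClient; the topic name, partition count, and broker address are assumptions for illustration.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions for parallelism/scalability, replication factor 3 for fault tolerance.
            NewTopic orders = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(Collections.singleton(orders)).all().get();
        }
    }
}
```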
To me, Kafka provides the following functionality. First, it helps decouple services—consumers and producers don’t talk directly, they use Kafka as a shared interface. Kafka is very performant: it can ingest large volumes of messages with low latency.
Kafka provides durable storage. Messages are written to disk and removed only based on retention policies. A byproduct of this is replayability: you can replay messages to consumers as many times as needed.
Kafka is scalable—you can increase throughput by adding partitions and consumers. It’s also fault tolerant: messages are persisted and replicated across machines. Kafka has an “ACK” mechanism that, when configured properly, offers strong delivery guarantees.
To understand how Kafka achieves all of this, we need to talk about ZooKeeper or KRaft.
Metadata refers to the structure and state of the Kafka cluster: topics, partitions, configurations, broker registrations, and leader assignments.
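As a rough illustration (Java AdminClient, assumed broker at localhost:9092), a client can peek at some of this metadata directly:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;

public class DescribeClusterSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            // Which brokers are registered, and which broker currently acts as the controller.
            System.out.println("Brokers:    " + cluster.nodes().get());
            System.out.println("Controller: " + cluster.controller().get());
        }
    }
}
```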
A topic is a logical channel where you publish messages. A partition is a subdivision of a topic; that's where messages are stored. A partition is an append-only log: messages are added at the end and never modified. Kafka spreads partitions across brokers to support fault tolerance and scalability.
A consumer group is a set of consumers working together to process a topic: Kafka divides the topic's partitions among the members, so each partition is consumed by exactly one member of the group at a time.
Machines can fail, networks can partition, and messages can arrive late or out of order. So we need a mechanism to ensure that even if something goes wrong, the cluster behaves in a predictable, coordinated way.
That’s where consensus comes in.
Kafka uses a leader-follower model for partitions. Each partition has one leader replica and a set of follower replicas.
Producers write to the partition leader. The leader appends to a write-ahead log, replicates to the in-sync replicas (ISRs), and then acknowledges the write.
Kafka guarantees consistency at the partition level: messages in a partition are strictly ordered and replicated in that order. Consumers read from the partition leader and track their position using offsets, which they can commit for fault-tolerant processing.
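A minimal consumer sketch showing offsets in practice; the group id, topic name, and broker address are assumptions for illustration.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processors");          // the consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");            // commit offsets manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Every record carries the partition and offset it was read from.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // Commit what we have processed so far; after a restart the group resumes from here.
                consumer.commitSync();
            }
        }
    }
}
```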
Kafka uses consensus (ZooKeeper or KRaft) to ensure that all brokers agree on who’s in charge of the cluster and who’s leading each partition. This coordination is what lets Kafka recover safely and continue working correctly when machines fail.
ZooKeeper is a separate system. Kafka brokers connect to any ZooKeeper node to get metadata. The ZooKeeper nodes maintain a consistent state by coordinating with each other via a consensus algorithm.
It is important to understand how the ZooKeeper nodes and the Kafka nodes (brokers) talk to each other:
All Kafka brokers connect to the ZooKeeper ensemble on startup.
Each broker watches key znodes (like /brokers and /controller) to stay informed about cluster state.
One broker is elected as the controller using ZooKeeper.
The controller broker coordinates cluster-wide actions—like assigning partition leaders and tracking broker membership—and writes those updates to ZooKeeper.
ZooKeeper doesn’t push updates. Instead, brokers use watchers that trigger when znodes change, prompting them to fetch the latest metadata.
If the controller broker fails, its ephemeral znode (/controller) disappears.
ZooKeeper detects the failure, and a new controller is elected.
The new controller takes over responsibility and updates ZooKeeper as needed.
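The ephemeral-znode-plus-watcher pattern described above is a plain ZooKeeper primitive rather than Kafka-specific code. A rough sketch with the ZooKeeper Java client (connection string and timeout are assumptions) looks like this:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ControllerWatchSketch {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper ensemble address and session timeout.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });

        Watcher controllerWatcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    // The ephemeral /controller znode is gone: the controller broker died,
                    // so the brokers can attempt to elect a new one.
                    System.out.println("/controller disappeared, controller re-election needed");
                }
            }
        };

        // Register a one-shot watch; it fires when /controller changes or disappears.
        zk.exists("/controller", controllerWatcher);

        Thread.sleep(Long.MAX_VALUE); // keep the process alive so the watch can fire
    }
}
```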
Kafka uses a group coordinator to assign partitions to consumers in a group. Each partition is assigned to only one consumer at a time.
When a consumer leaves or crashes, Kafka triggers a rebalance. The coordinator reassigns partitions among the remaining consumers.
Rebalance: A consumer leaves the group, and Kafka reassigns its partitions.
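You can observe these reassignments from a client by attaching a ConsumerRebalanceListener when subscribing. A small sketch, assuming a consumer configured like the one in the earlier offset example:

```java
import java.util.Collection;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceListenerSketch {
    // `consumer` is assumed to be configured with a group.id, as in the offset example above.
    static void subscribeWithListener(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("orders"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Called before a rebalance takes these partitions away from this consumer.
                System.out.println("Revoked: " + partitions);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Called after the group coordinator has assigned partitions to this consumer.
                System.out.println("Assigned: " + partitions);
            }
        });
    }
}
```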
Kafka's delivery guarantees are controlled by the producer's acks setting:

acks=all: the producer waits for the leader and all in-sync replicas.
acks=1: wait for the leader only.
acks=0: don't wait at all.

You set the acks option in the producer client library. Use acks=all when data loss is unacceptable. For low latency and looser durability, use acks=1 or acks=0.
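A minimal producer sketch showing where the knob lives; the topic name, key, and broker address are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait for the leader and all in-sync replicas before the write is acknowledged.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("orders", "order-42", "created");
            // send() returns a future; get() blocks until the broker acknowledges per the acks setting.
            RecordMetadata metadata = producer.send(record).get();
            System.out.printf("written to partition %d at offset %d%n",
                    metadata.partition(), metadata.offset());
        }
    }
}
```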
Yes, in some edge cases:
If the leader broker crashes and unclean.leader.election.enable=true, Kafka may elect an out-of-sync replica as leader. This can lead to data loss if the leader hadn't replicated all recent writes. If the setting is false (the default), Kafka waits until a proper in-sync replica is available.
If a broker crashes before flushing data to disk and the data hasn't yet been replicated to another broker, the message is lost.
min.insync.replicas: if set too low, a write may be considered successful even if not all replicas acknowledged it. You want this to be at least 2 for safety.
Kafka may think data was written to disk, but it wasn’t. Hardware issues or file system problems can cause this.
Use acks=all and min.insync.replicas >= 2.
Keep unclean.leader.election.enable=false.
Monitor ISR lag and disk health.
Use a replication factor of 3 or more.
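Putting those recommendations together, here is a sketch of creating a topic with the durability-oriented settings above (Java AdminClient; topic name, partition count, and broker address are assumptions):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic payments = new NewTopic("payments", 6, (short) 3) // replication factor 3
                    .configs(Map.of(
                            "min.insync.replicas", "2",               // at least 2 replicas must have the write
                            "unclean.leader.election.enable", "false" // never promote an out-of-sync replica
                    ));
            admin.createTopics(Collections.singleton(payments)).all().get();
        }
    }
}
```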