Kafka is a powerful piece of software that enables very useful architectural patterns. In this post I want to capture my understanding so I can reference it in the future. I will keep updating the post as I acquire more knowledge or discover refinements and errors.
Kafka is a distributed, log-based messaging system. It has a set of components that work in parallel to write messages to partitions and read them back. That work has to be done consistently, meaning that all parties interacting with Kafka (producers, brokers, and consumers) must share a coherent view of the system's state.
In a Kafka system:
The replication factor gives you fault tolerance.
The number of partitions enables scalability and parallelism, not high availability directly.
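To make this concrete, here is a minimal sketch of creating a topic with both knobs set explicitly, using the Java AdminClient; the topic name, partition count, and broker address are assumptions for illustration.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions for parallelism/scalability, replication factor 3 for fault tolerance.
            NewTopic orders = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(Collections.singleton(orders)).all().get();
        }
    }
}
```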
To me, Kafka provides the following functionality. First, it helps decouple services—consumers and producers don’t talk directly, they use Kafka as a shared interface. Kafka is very performant: it can ingest large volumes of messages with low latency.
Kafka provides durable storage. Messages are written to disk and removed only based on retention policies. A byproduct of this is replayability: you can replay messages to consumers as many times as needed.
Kafka is scalable—you can increase throughput by adding partitions and consumers. It’s also fault tolerant: messages are persisted and replicated across machines. Kafka has an “ACK” mechanism that, when configured properly, offers strong delivery guarantees.
To understand how Kafka achieves all of this, we need to talk about ZooKeeper or KRaft.
Metadata refers to the structure and state of the Kafka cluster: topics, partitions, configurations, broker registrations, and leader assignments.
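As a rough illustration (Java AdminClient, assumed broker at localhost:9092), a client can peek at some of this metadata directly:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;

public class DescribeClusterSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            // Which brokers are registered, and which broker currently acts as the controller.
            System.out.println("Brokers:    " + cluster.nodes().get());
            System.out.println("Controller: " + cluster.controller().get());
        }
    }
}
```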
A topic is a logical channel where you publish messages. A partition is a subdivision of a topic; that's where messages are stored. A partition is an append-only log: messages are added at the end and never modified. Kafka spreads partitions across brokers to support fault tolerance and scalability.
A consumer group is a set of consumers working together to process a topic: Kafka divides the topic's partitions among the members, so each partition is consumed by exactly one member of the group at a time.
Machines can fail, networks can partition, and messages can arrive late or out of order. So we need a mechanism to ensure that even if something goes wrong, the cluster behaves in a predictable, coordinated way.
That’s where consensus comes in.
Kafka uses a leader-follower model for partitions. Each partition has one leader replica and a set of follower replicas.
Producers write to the partition leader. The leader appends to a write-ahead log, replicates to the in-sync replicas (ISRs), and then acknowledges the write.
Kafka guarantees consistency at the partition level: messages in a partition are strictly ordered and replicated in that order. Consumers read from the partition leader and track their position using offsets, which they can commit for fault-tolerant processing.
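A minimal consumer sketch showing offsets in practice; the group id, topic name, and broker address are assumptions for illustration.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processors");          // the consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");            // commit offsets manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Every record carries the partition and offset it was read from.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // Commit what we have processed so far; after a restart the group resumes from here.
                consumer.commitSync();
            }
        }
    }
}
```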
Kafka uses consensus (ZooKeeper or KRaft) to ensure that all brokers agree on who’s in charge of the cluster and who’s leading each partition. This coordination is what lets Kafka recover safely and continue working correctly when machines fail.
ZooKeeper is a separate system. Kafka brokers connect to any ZooKeeper node to get metadata. The ZooKeeper nodes maintain a consistent state by coordinating with each other via a consensus algorithm.
It is important to understand how the ZooKeeper nodes and the Kafka nodes (brokers) talk to each other:
All Kafka brokers connect to the ZooKeeper ensemble on startup.
Each broker watches key znodes (like /brokers and /controller) to stay informed about cluster state.
One broker is elected as the controller using ZooKeeper.
The controller broker coordinates cluster-wide actions—like assigning partition leaders and tracking broker membership—and writes those updates to ZooKeeper.
ZooKeeper doesn’t push updates. Instead, brokers use watchers that trigger when znodes change, prompting them to fetch the latest metadata.
If the controller broker fails, its ephemeral znode (/controller) disappears.
ZooKeeper detects the failure, and a new controller is elected.
The new controller takes over responsibility and updates ZooKeeper as needed.
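The ephemeral-znode-plus-watcher pattern described above is a plain ZooKeeper primitive rather than Kafka-specific code. A rough sketch with the ZooKeeper Java client (connection string and timeout are assumptions) looks like this:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ControllerWatchSketch {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper ensemble address and session timeout.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });

        Watcher controllerWatcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    // The ephemeral /controller znode is gone: the controller broker died,
                    // so the brokers can attempt to elect a new one.
                    System.out.println("/controller disappeared, controller re-election needed");
                }
            }
        };

        // Register a one-shot watch; it fires when /controller changes or disappears.
        zk.exists("/controller", controllerWatcher);

        Thread.sleep(Long.MAX_VALUE); // keep the process alive so the watch can fire
    }
}
```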
Kafka uses a group coordinator to assign partitions to consumers in a group. Each partition is assigned to only one consumer at a time.
When a consumer leaves or crashes, Kafka triggers a rebalance. The coordinator reassigns partitions among the remaining consumers.
Rebalance: A consumer leaves the group, and Kafka reassigns its partitions.
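You can observe these reassignments from a client by attaching a ConsumerRebalanceListener when subscribing. A small sketch, assuming a consumer configured like the one in the earlier offset example:

```java
import java.util.Collection;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceListenerSketch {
    // `consumer` is assumed to be configured with a group.id, as in the offset example above.
    static void subscribeWithListener(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("orders"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Called before a rebalance takes these partitions away from this consumer.
                System.out.println("Revoked: " + partitions);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Called after the group coordinator has assigned partitions to this consumer.
                System.out.println("Assigned: " + partitions);
            }
        });
    }
}
```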
Kafka's delivery guarantees are controlled by the producer's acks setting:

acks=all: the producer waits for the leader and all in-sync replicas.
acks=1: wait for the leader only.
acks=0: don't wait at all.

You set the acks option in the producer client library. Use acks=all when data loss is unacceptable. For low latency and looser durability, use acks=1 or acks=0.
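A minimal producer sketch showing where the knob lives; the topic name, key, and broker address are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait for the leader and all in-sync replicas before the write is acknowledged.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("orders", "order-42", "created");
            // send() returns a future; get() blocks until the broker acknowledges per the acks setting.
            RecordMetadata metadata = producer.send(record).get();
            System.out.printf("written to partition %d at offset %d%n",
                    metadata.partition(), metadata.offset());
        }
    }
}
```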
Yes, in some edge cases:
If the leader broker crashes and unclean.leader.election.enable=true, Kafka may elect an out-of-sync replica as leader. This can lead to data loss if the leader hadn't replicated all recent writes. If the setting is false (the default), Kafka waits until a proper in-sync replica is available.
If a broker crashes before flushing data to disk and the data hasn't yet been replicated to another broker, the message is lost.
min.insync.replicas: if set too low, a write may be considered successful even if not all replicas acknowledged it. You want this to be at least 2 for safety.
Kafka may think data was written to disk, but it wasn’t. Hardware issues or file system problems can cause this.
Use acks=all and min.insync.replicas >= 2.
Keep unclean.leader.election.enable=false.
Monitor ISR lag and disk health.
Use a replication factor of 3 or more.
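Putting those recommendations together, here is a sketch of creating a topic with the durability-oriented settings above (Java AdminClient; topic name, partition count, and broker address are assumptions):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic payments = new NewTopic("payments", 6, (short) 3) // replication factor 3
                    .configs(Map.of(
                            "min.insync.replicas", "2",               // at least 2 replicas must have the write
                            "unclean.leader.election.enable", "false" // never promote an out-of-sync replica
                    ));
            admin.createTopics(Collections.singleton(payments)).all().get();
        }
    }
}
```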