Kafka Deep Dive

Apache Kafka is the de facto standard for event streaming. Master its architecture, partitioning model, consumer groups, and exactly-once semantics for building event-driven systems.

  • Distributed: horizontally scalable across many brokers
  • Durable: persistent, replicated log with configurable retention
  • Real-Time: low-latency event delivery, typically a few milliseconds end to end

Kafka is not just a message queue—it's a distributed event log.

Kafka Architecture

Definition: Apache Kafka

Apache Kafka is a distributed event streaming platform that stores events in an immutable, append-only log. Producers write events to topics, and consumers read from topics at their own pace. Kafka provides durability through replication, ordering within partitions, and horizontal scalability through partitioning.

Figure: Producers (App A, App B) → Kafka cluster (topic "events" split into Partition 0, 1, and 2, each a leader on an append-only log; Broker 1 (controller), Broker 2, Broker 3) → Consumers (Service X, Service Y)
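The log model described above can be made concrete with a minimal in-memory sketch. It assumes a single process with no replication or retention. Each partition is a Python list that only ever grows, an event's offset is its index in that list, and every consumer tracks its own read position. The names `Topic`, `append`, and `read` are hypothetical and are not the Kafka client API.

```python
class Topic:
    """Toy topic: a fixed set of append-only partitions (no replication or retention)."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        # Each partition is an append-only list; an event's offset is its index in that list.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, event: str) -> int:
        """Append an event to the end of a partition and return its offset."""
        log = self.partitions[partition]
        log.append(event)
        return len(log) - 1

    def read(self, partition: int, offset: int, max_events: int = 10) -> list:
        """Return up to max_events events starting at offset. Reading never removes events."""
        return self.partitions[partition][offset:offset + max_events]


topic = Topic("events", num_partitions=3)
print(topic.append(0, "order-created"))   # 0
print(topic.append(0, "order-paid"))      # 1
print(topic.append(1, "user-signed-up"))  # 0: offsets are numbered per partition

# Two consumers read partition 0 independently, each from its own position.
service_x_offset, service_y_offset = 0, 1
print(topic.read(0, service_x_offset))  # ['order-created', 'order-paid']
print(topic.read(0, service_y_offset))  # ['order-paid']
```

Reading is non-destructive, so consumers progress at their own pace without affecting one another. This is the key difference from a classic queue that deletes messages once they are consumed.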

Key Concepts

| Concept | Description |
|---|---|
| Topic | Named stream of events (like a table) |
| Partition | Unit of parallelism within a topic |
| Offset | Unique sequential ID for each event in a partition |
| Broker | Kafka server that stores and serves data |
| Replica | Copy of a partition for fault tolerance |
| Consumer Group | Group of consumers that divide a topic's partitions among themselves |

Partitioning Model

Definition: Kafka Partitioning

Partitions are the fundamental unit of parallelism in Kafka. Each topic has one or more partitions, and each partition is an ordered, immutable sequence of events. With key-based partitioning, events with the same key go to the same partition as long as the partition count does not change, so order is maintained per key.

Partition Assignment

$$P(\text{event}) = \text{hash}(\text{event.key}) \bmod N_{\text{partitions}}$$

Here,

  • $P(\text{event})$ = partition assigned to the event
  • $\text{event.key}$ = the event's partition key
  • $N_{\text{partitions}}$ = total number of partitions
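The formula can be written as a short Python sketch. It assumes string keys and that every event has a key (Kafka spreads keyless events across partitions by a different rule). `partition_for` is a hypothetical helper, and `zlib.crc32` stands in for Kafka's hash. The Java client's default partitioner hashes the serialized key with murmur2, so these partition numbers will not match a real cluster.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """P(event) = hash(event.key) mod N_partitions, with CRC-32 as a stand-in hash."""
    # The hash must be stable across processes. Python's built-in hash() on str is
    # randomized per process, so it could send the same key to different partitions
    # after a restart.
    key_hash = zlib.crc32(key.encode("utf-8"))  # unsigned 32-bit value
    return key_hash % num_partitions


events = [
    ("user-42", "login"),
    ("user-7", "login"),
    ("user-42", "add-to-cart"),
    ("user-42", "checkout"),
]
for key, action in events:
    # Every "user-42" event lands in the same partition, so its order is preserved.
    print(key, action, "-> partition", partition_for(key, 6))
```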

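The number of partitions is set at topic creation. It can be increased later but never decreased. Because of the modulo in the formula above, increasing it changes the partition that many existing keys map to, which breaks per-key ordering across the change. More partitions mean more parallelism but also more open file handles, more memory usage, and potentially higher end-to-end latency. Start with a reasonable number (e.g., 6-12) and scale as needed.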
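Growing a topic has a real cost for keyed data. This sketch counts how many keys land in a different partition when a topic goes from 6 to 8 partitions. It uses the same hypothetical CRC-32 stand-in for Kafka's hash. A key that moves can have older events in one partition and newer events in another, so ordering for that key is no longer guaranteed across the change.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stable stand-in partitioner: CRC-32 of the key, modulo the partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"user-{i}" for i in range(1000)]
before = {k: partition_for(k, 6) for k in keys}  # original topic: 6 partitions
after = {k: partition_for(k, 8) for k in keys}   # after adding 2 partitions

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys map to a different partition after 6 -> 8")
```

With a uniformly distributed hash, roughly three quarters of keys would move here. A key stays put only when its hash modulo 24 falls in 0..5.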

Consumer Groups

Definition: Consumer Group

A consumer group is a set of consumers that collaboratively consume events from a topic. Each partition is assigned to exactly one consumer in the group at a time. Consumers in a group therefore never share a partition, and events within a partition are processed in order.

| Configuration | Effect |
|---|---|
| Consumers = partitions | Each consumer gets one partition |
| Consumers < partitions | Some consumers get multiple partitions |
| Consumers > partitions | Some consumers are idle |
| Rebalancing | Partitions are redistributed automatically when consumers join or leave |

Consumer Group Scaling

Topic "orders" has 6 partitions with 3 consumers in a group:

  • Initial state: each consumer handles 2 partitions.
  • Consumer 4 joins: rebalance → 2 consumers get 2 partitions, 2 consumers get 1.
  • Consumer 1 fails (starting from the initial 3 consumers): rebalance → the remaining 2 consumers handle 3 partitions each.

Because rebalancing redistributes partitions automatically, event processing scales horizontally by adding consumers, up to one consumer per partition.
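The table and the "orders" example can be replayed in a few lines. This is a simplified model that assumes a plain round-robin assignment over a single topic. Kafka's built-in assignors (range, round-robin, sticky, cooperative-sticky) may give each consumer different partitions. Real rebalances also involve the group coordinator, heartbeats, and partition revocation, none of which is modeled here. `assign` and `counts` are hypothetical helpers. The failure case starts from the initial three-consumer group, which is the reading that leaves two consumers.

```python
def assign(num_partitions: int, consumers: list) -> dict:
    """Round-robin assignment: each partition is owned by exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        owner = consumers[p % len(consumers)]
        assignment[owner].append(p)
    return assignment


def counts(assignment: dict) -> dict:
    """Number of partitions owned by each consumer."""
    return {c: len(ps) for c, ps in assignment.items()}


initial = ["consumer-1", "consumer-2", "consumer-3"]
print(counts(assign(6, initial)))
# {'consumer-1': 2, 'consumer-2': 2, 'consumer-3': 2}

# Consumer 4 joins.
print(counts(assign(6, initial + ["consumer-4"])))
# {'consumer-1': 2, 'consumer-2': 2, 'consumer-3': 1, 'consumer-4': 1}

# Consumer 1 fails (from the initial 3-consumer group).
print(counts(assign(6, ["consumer-2", "consumer-3"])))
# {'consumer-2': 3, 'consumer-3': 3}

# More consumers than partitions: consumer-7 and consumer-8 get nothing and sit idle.
print(counts(assign(6, [f"consumer-{i}" for i in range(1, 9)])))
```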

Exactly-Once Semantics

Definition: Exactly-Once in Kafka

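Exactly-once semantics in Kafka ensures that, in a read-process-write pipeline, the results of processing each input event take effect exactly once, even when producers retry or applications crash and restart. It is achieved through idempotent producers, transactional APIs, and committing consumer offsets within transactions. Extending the guarantee to a downstream system outside Kafka, such as a database, requires additional measures. Examples include storing offsets in the same database transaction as the results or using a transactional outbox.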

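| Mechanism | Purpose |
|---|---|
| Idempotent producer | Prevents duplicate writes during retries |
| Transactions | Atomic writes across multiple partitions |
| Consumer offset in transaction | Commits consumed offsets atomically with the processed output |
| Transactional outbox | Writes the database change and the outgoing event in one database transaction; the event is published afterwards |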
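The idempotent producer row works roughly as follows. The producer tags each write with a producer ID and a per-partition sequence number, and the broker drops a write whose sequence it has already accepted. The sketch below is a simplified in-memory model of that idea, not broker code. The class name `PartitionLog` is hypothetical. Real brokers track sequences per (producer ID, partition) and cache several recent batches, while this toy keeps only the last sequence per producer for one partition.

```python
class PartitionLog:
    """Toy broker-side deduplication for a single partition."""

    def __init__(self) -> None:
        self.records = []
        self.last_seq = {}  # producer_id -> last sequence number accepted

    def append(self, producer_id: int, seq: int, value: str) -> str:
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # A retry of a write that already succeeded: acknowledge it, don't re-append.
            return "duplicate"
        if seq != last + 1:
            # A gap means a write was lost in between; the model refuses to reorder.
            raise ValueError(f"out-of-order sequence {seq}, expected {last + 1}")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"


log = PartitionLog()
print(log.append(producer_id=1, seq=0, value="payment-1"))  # appended
# The acknowledgment is lost, so the producer retries with the same sequence number.
print(log.append(producer_id=1, seq=0, value="payment-1"))  # duplicate
print(log.append(producer_id=1, seq=1, value="payment-2"))  # appended
print(log.records)  # ['payment-1', 'payment-2']
```

In the Java client this behavior is controlled by the producer setting enable.idempotence, which defaults to true since Kafka 3.0.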

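Exactly-once semantics requires cooperation between the producer, the brokers, and the consumer. The producer must be idempotent and transactional, and the consuming application must commit its input offsets within the same transaction as its output. Downstream consumers must read with isolation.level=read_committed so that they skip records from aborted transactions.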
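Here is a toy model of the read-process-write transaction, assuming one transaction per input record. Output records and the next input offset are staged together and become visible only on commit. An aborted attempt leaves neither behind, so a restart reprocesses the record without duplicating output. The class and method names are hypothetical. They only loosely mirror the Java producer's beginTransaction, sendOffsetsToTransaction, commitTransaction, and abortTransaction calls, and the broker-side transaction coordinator is not modeled.

```python
class TransactionalSink:
    """Output records and the consumer's input offset commit together, or not at all."""

    def __init__(self) -> None:
        self.output = []           # committed output (what read_committed consumers see)
        self.committed_offset = 0  # next input offset for the consumer group
        self._pending_output = []
        self._pending_offset = None

    def begin(self) -> None:
        self._pending_output, self._pending_offset = [], None

    def send(self, record: str) -> None:
        self._pending_output.append(record)

    def send_offset(self, next_offset: int) -> None:
        self._pending_offset = next_offset

    def commit(self) -> None:
        # Both effects become visible in one step.
        self.output.extend(self._pending_output)
        if self._pending_offset is not None:
            self.committed_offset = self._pending_offset
        self.begin()

    def abort(self) -> None:
        # Discard staged output and offset together.
        self.begin()


def process(sink: TransactionalSink, inputs: list, crash_at=None) -> None:
    """Process inputs from the committed offset, one transaction per record."""
    for offset in range(sink.committed_offset, len(inputs)):
        sink.begin()
        sink.send(inputs[offset].upper())
        if offset == crash_at:
            # Simulated failure before commit (in Kafka the coordinator would abort it).
            sink.abort()
            return
        sink.send_offset(offset + 1)
        sink.commit()


inputs = ["a", "b", "c"]
sink = TransactionalSink()
process(sink, inputs, crash_at=1)  # fails while processing "b"
process(sink, inputs)              # restart resumes from the committed offset
print(sink.output)                 # ['A', 'B', 'C']
print(sink.committed_offset)       # 3
```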

Kafka Retention

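| Retention Type | Description |
|---|---|
| Time-based | Delete events older than N days (retention.ms; default: 7 days) |
| Size-based | Keep about N GB per partition, deleting the oldest log segments first (retention.bytes) |
| Log compaction | Keep only the latest value per key (cleanup.policy=compact) |
| Infinite | Never delete (retention.ms=-1; requires sufficient disk) |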
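Log compaction is the least intuitive row in the table. This sketch models it as a pure function over (offset, key, value) tuples, assuming a single partition and ignoring segments. `compact` is a hypothetical helper, not how the broker's log cleaner is implemented. Real compaction also leaves the active segment untouched and keeps tombstones (records with a null value) for delete.retention.ms before removing them. Here tombstoned keys are dropped immediately.

```python
def compact(records: list) -> list:
    """Keep only the latest record per key, preserving original offsets."""
    latest = {}
    for offset, key, value in records:
        latest[key] = (offset, key, value)  # later offsets overwrite earlier ones
    # A key whose latest value is None (a tombstone) is removed entirely.
    survivors = [r for r in latest.values() if r[2] is not None]
    return sorted(survivors)  # back into offset order (offsets are unique)


log = [
    (0, "user-1", "email=a@example.com"),
    (1, "user-2", "email=b@example.com"),
    (2, "user-1", "email=a2@example.com"),
    (3, "user-3", "email=c@example.com"),
    (4, "user-2", None),  # tombstone: user-2 deleted
]
for record in compact(log):
    print(record)
# (2, 'user-1', 'email=a2@example.com')
# (3, 'user-3', 'email=c@example.com')
```

Offsets are not renumbered after compaction. The compacted partition simply has gaps, and consumers still see the latest state for every surviving key.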

Practice Exercises

  1. Topic Design: Design the Kafka topics for an e-commerce order system. What topics would you create, how many partitions, and what retention policy?

  2. Consumer Design: Design a consumer group for processing payment events. How do you ensure exactly-once processing when writing to a PostgreSQL database?

  3. Partitioning Strategy: For a topic with user events, choose a partitioning strategy that ensures events for the same user are ordered but load is balanced. What happens when a user has significantly more events than others?

  4. Architecture Decision: Compare Kafka with RabbitMQ for a task queue system. What are the trade-offs in terms of ordering, throughput, and replay capability?

Key Takeaways:

  • Kafka stores events in an immutable, append-only log
  • Partitions provide parallelism; consumer groups divide partition consumption
  • Exactly-once semantics requires idempotent producers, transactions, and consumer coordination
  • Retention can be time-based, size-based, or log-compacted
  • The number of partitions caps a consumer group's parallelism; consumers beyond the partition count sit idle
  • Kafka is ideal for event sourcing, data pipelines, and real-time analytics

What to Learn Next

-> Stream Processing Real-time data processing with Flink, Spark Streaming, and Kafka Streams.

-> Redis Deep Dive Redis data structures, persistence, clustering, and use cases.

-> Event-Driven Architecture Event sourcing, CQRS, and message-driven systems.

-> Message Queues Async processing, event-driven architecture, and pub/sub patterns.

-> Data Lake Architecture Storage, processing, and governance for large-scale data.

-> Batch Processing MapReduce, Spark, and distributed batch processing.
