πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Kafka System Design Interview

Apache KafkaSystem Design⭐ Premium

Advertisement

Kafka System Design Interview

Difficulty: Senior Level | Companies: LinkedIn, Uber, Netflix, Spotify, Confluent

Content

System design interviews test your ability to architect distributed systems using Kafka. Focus on requirements, trade-offs, and practical implementation.

Design Framework

Architecture Diagram
System Design Process:
1. Requirements Clarification
   β”œβ”€β”€ Functional requirements
   β”œβ”€β”€ Non-functional requirements
   └── Constraints

2. High-Level Design
   β”œβ”€β”€ Component diagram
   β”œβ”€β”€ Data flow
   └── API design

3. Deep Dive
   β”œβ”€β”€ Kafka architecture
   β”œβ”€β”€ Schema design
   └── Partitioning strategy

4. Trade-offs
   β”œβ”€β”€ Consistency vs availability
   β”œβ”€β”€ Latency vs throughput
   └── Cost vs performance

Design Problem: E-Commerce Order System

Requirements

Architecture Diagram
Functional Requirements:
β”œβ”€β”€ Place orders
β”œβ”€β”€ Process payments
β”œβ”€β”€ Update inventory
β”œβ”€β”€ Send notifications
└── Track order status

Non-Functional Requirements:
β”œβ”€β”€ High availability (99.99%)
β”œβ”€β”€ Low latency (< 100ms)
β”œβ”€β”€ Exactly-once processing
β”œβ”€β”€ Orderly processing per user
└── Support 100K orders/day

Constraints:
β”œβ”€β”€ Existing PostgreSQL database
β”œβ”€β”€ Must integrate with payment gateway
└── Budget for infrastructure

High-Level Design

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Order Service                                   β”‚
β”‚  └── Kafka Producer                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
                    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Kafka Cluster                                   β”‚
β”‚  β”œβ”€β”€ orders topic (64 partitions)               β”‚
β”‚  β”œβ”€β”€ payments topic (32 partitions)             β”‚
β”‚  β”œβ”€β”€ inventory topic (32 partitions)            β”‚
β”‚  └── notifications topic (16 partitions)        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
                    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Consumer Services                               β”‚
β”‚  β”œβ”€β”€ Payment Service                             β”‚
β”‚  β”œβ”€β”€ Inventory Service                           β”‚
β”‚  β”œβ”€β”€ Notification Service                        β”‚
β”‚  └── Analytics Service                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Topic Design

{
  "topics": {
    "orders": {
      "partitions": 64,
      "replication_factor": 3,
      "retention_ms": 604800000,
      "cleanup_policy": "delete",
      "key_strategy": "user_id"
    },
    "payments": {
      "partitions": 32,
      "replication_factor": 3,
      "retention_ms": 2592000000,
      "cleanup_policy": "compact,delete",
      "key_strategy": "order_id"
    },
    "inventory": {
      "partitions": 32,
      "replication_factor": 3,
      "retention_ms": 86400000,
      "cleanup_policy": "delete",
      "key_strategy": "product_id"
    },
    "notifications": {
      "partitions": 16,
      "replication_factor": 3,
      "retention_ms": 86400000,
      "cleanup_policy": "delete",
      "key_strategy": "user_id"
    }
  }
}

Java Implementation

Order Producer

@Service
public class OrderService {
    
    private final KafkaTemplate<String, Order> kafkaTemplate;
    private final SchemaRegistryClient schemaRegistry;
    
    @Autowired
    public OrderService(KafkaTemplate<String, Order> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }
    
    @Transactional
    public Order placeOrder(OrderRequest request) {
        // Validate order
        validateOrder(request);
        
        // Create order
        Order order = createOrder(request);
        
        // Send to Kafka (atomic with DB transaction)
        kafkaTemplate.send("orders", order.getUserId(), order);
        
        return order;
    }
    
    private void validateOrder(OrderRequest request) {
        if (request.getItems().isEmpty()) {
            throw new InvalidOrderException("Order must have items");
        }
        if (request.getTotalAmount().compareTo(BigDecimal.ZERO) <= 0) {
            throw new InvalidOrderException("Invalid amount");
        }
    }
    
    private Order createOrder(OrderRequest request) {
        Order order = new Order();
        order.setOrderId(UUID.randomUUID().toString());
        order.setUserId(request.getUserId());
        order.setItems(request.getItems());
        order.setTotalAmount(request.getTotalAmount());
        order.setStatus(OrderStatus.PENDING);
        order.setCreatedAt(Instant.now());
        return order;
    }
}

Payment Consumer

@Service
public class PaymentConsumer {
    
    private final PaymentService paymentService;
    private final KafkaTemplate<String, Payment> kafkaTemplate;
    
    @KafkaListener(
        topics = "orders",
        groupId = "payment-service",
        containerFactory = "kafkaListenerContainerFactory"
    )
    @Transactional
    public void processOrder(Order order) {
        try {
            // Process payment
            Payment payment = paymentService.charge(order);
            
            // Send payment event
            kafkaTemplate.send("payments", order.getOrderId(), payment);
            
            // Update order status
            order.setStatus(OrderStatus.PAYMENT_PROCESSED);
            
        } catch (PaymentException e) {
            // Handle payment failure
            order.setStatus(OrderStatus.PAYMENT_FAILED);
            kafkaTemplate.send("orders", order.getUserId(), order);
        }
    }
}

Python Implementation

Order Producer

from fastapi import FastAPI, HTTPException
from kafka import KafkaProducer
from pydantic import BaseModel
from typing import List
import json
import uuid

app = FastAPI()

producer = KafkaProducer(
    bootstrap_servers=['kafka1:9092', 'kafka2:9092'],
    key_serializer=lambda k: k.encode('utf-8'),
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    acks='all',
    enable_idempotence=True,
    transactional_id='order-service-tx'
)

producer.init_transactions()

class OrderItem(BaseModel):
    product_id: str
    quantity: int
    price: float

class OrderRequest(BaseModel):
    user_id: str
    items: List[OrderItem]

@app.post("/orders")
async def create_order(request: OrderRequest):
    try:
        # Validate
        if not request.items:
            raise HTTPException(status_code=400, detail="Order must have items")
        
        # Create order
        order = {
            "order_id": str(uuid.uuid4()),
            "user_id": request.user_id,
            "items": [item.dict() for item in request.items],
            "total_amount": sum(item.price * item.quantity for item in request.items),
            "status": "PENDING"
        }
        
        # Send to Kafka
        producer.begin_transaction()
        producer.send("orders", key=order["user_id"], value=order)
        producer.commit_transaction()
        
        return {"order_id": order["order_id"], "status": "created"}
        
    except Exception as e:
        producer.abort_transaction()
        raise HTTPException(status_code=500, detail=str(e))

Payment Consumer

from kafka import KafkaConsumer, KafkaProducer
import json

consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['kafka1:9092'],
    group_id='payment-service',
    enable_auto_commit=False,
    auto_offset_reset='earliest'
)

producer = KafkaProducer(
    bootstrap_servers=['kafka1:9092'],
    key_serializer=lambda k: k.encode('utf-8'),
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    acks='all',
    enable_idempotence=True,
    transactional_id='payment-service-tx'
)

producer.init_transactions()

for message in consumer:
    order = json.loads(message.value)
    
    try:
        # Process payment
        payment = process_payment(order)
        
        # Send payment event
        producer.begin_transaction()
        producer.send("payments", key=order["order_id"], value=payment)
        
        # Update order status
        order["status"] = "PAYMENT_PROCESSED"
        producer.send("orders", key=order["user_id"], value=order)
        
        producer.commit_transaction()
        consumer.commit()
        
    except Exception as e:
        producer.abort_transaction()
        print(f"Payment failed: {e}")

Schema Design

Avro Schemas

// Order schema
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example.events",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "user_id", "type": "string"},
    {"name": "items", "type": {"type": "array", "items": "OrderItem"}},
    {"name": "total_amount", "type": "double"},
    {"name": "status", "type": "OrderStatus"},
    {"name": "created_at", "type": "long", "logicalType": "timestamp-millis"}
  ]
}

// Payment schema
{
  "type": "record",
  "name": "Payment",
  "namespace": "com.example.events",
  "fields": [
    {"name": "payment_id", "type": "string"},
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "status", "type": "PaymentStatus"},
    {"name": "processed_at", "type": "long", "logicalType": "timestamp-millis"}
  ]
}

Monitoring & Alerting

# Prometheus alerts
groups:
  - name: kafka-alerts
    rules:
      - alert: HighConsumerLag
        expr: kafka_consumergroup_current_offset - kafka_consumergroup_end_offset > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High consumer lag detected"
          
      - alert: TransactionFailed
        expr: rate(kafka_producer_transaction_failed_total[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Transaction failures detected"
          
      - alert: OrderProcessingDelayed
        expr: histogram_quantile(0.99, rate(order_processing_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Order processing latency high"

Performance Considerations

Architecture Diagram
Partitioning Strategy:
β”œβ”€β”€ Use user_id as key for ordering guarantees
β”œβ”€β”€ 64 partitions for orders (100K/day = ~70 orders/min)
β”œβ”€β”€ 32 partitions for payments (lower volume)
└── 16 partitions for notifications (lowest volume)

Replication:
β”œβ”€β”€ Replication factor: 3
β”œβ”€β”€ min.insync.replicas: 2
└── acks: all

Retention:
β”œβ”€β”€ Orders: 7 days (for debugging)
β”œβ”€β”€ Payments: 30 days (for audit)
β”œβ”€β”€ Notifications: 1 day (ephemeral)
└── Use compacted topics for state

ℹ️

Key Insight: In system design interviews, focus on the trade-offs you make and justify your decisions. There's no single "correct" answer - demonstrate your understanding of Kafka's capabilities and limitations.

Follow-Up Questions

  1. How would you handle a sudden spike in order volume?
  2. What happens if the payment service is down?
  3. How would you implement order cancellation?
  4. Explain how you would handle duplicate orders.
  5. How would you scale this system to support 1M orders/day?

Advertisement