Kafka on Kubernetes
Overview
Running Kafka on Kubernetes brings declarative, automated operations, easier scaling, and better use of cluster resources. This guide covers deploying Kafka with the Strimzi operator, managing persistent storage, topics, and users, autoscaling, rolling updates, monitoring, and backup.
Benefits of Kubernetes
- Automated Deployment: Declarative configuration
- Self-Healing: Automatic pod restarts
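- Scaling: Horizontal and vertical scaling (new brokers take on partitions only after a reassignment)
- Resource Efficiency: Better cluster utilization
- Rolling Updates: Brokers restart one at a time, so replicated topics stay available during upgrades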
Strimzi Operator Setup
Install Strimzi
# Install Strimzi using Helm
helm repo add strimzi https://strimzi.io/charts/
helm repo update
# watchAnyNamespace=true lets the operator manage Kafka resources in every namespace
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --namespace kafka \
  --create-namespace \
  --set watchAnyNamespace=true
Kafka Cluster CRD
# kafka-cluster.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-kafka-cluster
  namespace: kafka
spec:
  kafka:
    version: 3.5.1
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.5"
    storage:
      type: persistent-claim
      size: 100Gi
      class: fast-ssd
      deleteClaim: false
    resources:
      requests:
        memory: 4Gi
        cpu: 2
      limits:
        memory: 8Gi
        cpu: 4
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 20Gi
      class: fast-ssd
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
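The replication settings in this resource only protect writes if they agree with each other. Producers using acks=all need at least the minimum number of in-sync replicas, so each replication factor should exceed its min ISR (leaving room for one broker to be down) and must not exceed the broker count. Below is a minimal Python sketch that checks a config dict like the one above. check_replication is a hypothetical helper. Missing min-ISR keys fall back to Kafka's defaults: 1 for min.insync.replicas and 2 for transaction.state.log.min.isr.

```python
"""Check Kafka replication settings against the broker count (sketch)."""

# (replication-factor key, key holding the min ISR that applies to it)
RF_KEYS = [
    ("default.replication.factor", "min.insync.replicas"),
    ("offsets.topic.replication.factor", "min.insync.replicas"),
    ("transaction.state.log.replication.factor", "transaction.state.log.min.isr"),
]
# Kafka broker defaults for the min-ISR keys, used when a key is not set
ISR_DEFAULTS = {"min.insync.replicas": 1, "transaction.state.log.min.isr": 2}


def check_replication(brokers: int, config: dict) -> list:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    for rf_key, isr_key in RF_KEYS:
        if rf_key not in config:
            continue  # only check settings that are explicitly configured
        rf = config[rf_key]
        min_isr = config.get(isr_key, ISR_DEFAULTS[isr_key])
        if rf > brokers:
            problems.append(f"{rf_key}={rf} exceeds the {brokers} brokers")
        if min_isr >= rf:
            # acks=all writes stall as soon as any one replica is offline
            problems.append(f"{rf_key}={rf} leaves no headroom over min ISR {min_isr}")
    return problems


if __name__ == "__main__":
    cfg = {
        "offsets.topic.replication.factor": 3,
        "transaction.state.log.replication.factor": 3,
        "transaction.state.log.min.isr": 2,
        "default.replication.factor": 3,
        "min.insync.replicas": 2,
    }
    print(check_replication(3, cfg))  # []
    print(check_replication(3, {"default.replication.factor": 2, "min.insync.replicas": 2}))
```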
Apply Configuration
# Apply Kafka cluster
kubectl apply -f kafka-cluster.yaml
# Check cluster status
kubectl get kafka -n kafka
# Check pods
kubectl get pods -n kafka -l app.kubernetes.io/name=kafka
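To script the status check, you can inspect the JSON form of the resource (kubectl get kafka my-kafka-cluster -n kafka -o json) for a Ready condition. This is a minimal sketch. It assumes Strimzi reports readiness as a status.conditions entry with type Ready and status "True". The sample document is hand-written, not captured from a real cluster.

```python
"""Check whether a Strimzi Kafka resource reports Ready (sketch)."""
import json


def is_ready(resource: dict) -> bool:
    """True if status.conditions contains type=Ready with status="True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)


if __name__ == "__main__":
    # Hand-written sample shaped like `kubectl get kafka ... -o json` output
    sample = json.loads("""
    {"metadata": {"name": "my-kafka-cluster"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}}
    """)
    print(is_ready(sample))  # True
    print(is_ready({"metadata": {"name": "pending"}}))  # False
```

From the shell, kubectl wait kafka/my-kafka-cluster --for=condition=Ready -n kafka blocks on the same condition.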
Persistent Volumes
StorageClass Configuration
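# storageclass.yaml
# gp3 volumes are provisioned by the AWS EBS CSI driver (ebs.csi.aws.com);
# the in-tree kubernetes.io/aws-ebs provisioner is deprecated
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"       # gp3 baseline; raise for higher broker throughput
  throughput: "125"  # MiB/s, gp3 baseline
  encrypted: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer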
PersistentVolumeClaim
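# pvc.yaml
# Strimzi creates one PVC per broker from spec.kafka.storage; this manifest
# only illustrates the claim the operator generates for broker 0
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-kafka-cluster-kafka-0
  namespace: kafka
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi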
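Strimzi builds each claim name from the volume name, the cluster name, the component, and the pod ordinal. That means you can predict the claims for a three-broker cluster before deploying it. The sketch below assumes the single-volume persistent-claim layout used in this guide (JBOD storage adds a volume ID to the name). pvc_names is a hypothetical helper.

```python
"""Predict the PVC names Strimzi creates for single-volume persistent-claim storage (sketch)."""


def pvc_names(cluster: str, component: str, replicas: int) -> list:
    """Assumes the pattern data-<cluster>-<component>-<ordinal>."""
    return [f"data-{cluster}-{component}-{i}" for i in range(replicas)]


if __name__ == "__main__":
    for name in pvc_names("my-kafka-cluster", "kafka", 3):
        print(name)
    # data-my-kafka-cluster-kafka-0
    # data-my-kafka-cluster-kafka-1
    # data-my-kafka-cluster-kafka-2
```

Because deleteClaim is false, these claims survive deletion of the Kafka resource. With reclaimPolicy Retain, so do the underlying volumes.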
StatefulSet Configuration
# Concept only: recent Strimzi releases manage broker pods through
# StrimziPodSet resources rather than a StatefulSet, but the pod, port,
# and volume layout follows the same idea
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-kafka-cluster-kafka
  namespace: kafka
spec:
  serviceName: my-kafka-cluster-kafka
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: kafka
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kafka
    spec:
      containers:
        - name: kafka
          # Strimzi tags images as <operator-version>-kafka-<kafka-version>
          image: quay.io/strimzi/kafka:<strimzi-version>-kafka-3.5.1
          ports:
            - containerPort: 9092
              name: plain
            - containerPort: 9093
              name: tls
          env:
            # Pod name (e.g. my-kafka-cluster-kafka-0); the numeric broker ID
            # must come from its ordinal suffix, not from the full name
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
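A StatefulSet names its pods <statefulset-name>-<ordinal>, and that ordinal gives each broker a stable numeric ID. The full pod name is not a valid ID because Kafka expects an integer. Below is a minimal sketch of the derivation. broker_id_from_pod_name is a hypothetical helper, not part of Strimzi or Kafka.

```python
"""Derive a numeric Kafka broker ID from a StatefulSet pod name (sketch)."""


def broker_id_from_pod_name(pod_name: str) -> int:
    """Return the ordinal suffix, e.g. 'my-kafka-cluster-kafka-2' -> 2."""
    prefix, sep, ordinal = pod_name.rpartition("-")
    if not sep or not prefix or not ordinal.isdecimal():
        raise ValueError(f"not a StatefulSet pod name: {pod_name!r}")
    return int(ordinal)


if __name__ == "__main__":
    print(broker_id_from_pod_name("my-kafka-cluster-kafka-2"))  # 2
```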
Topic Management
Topic CRD
# orders-topic.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  namespace: kafka
  labels:
    strimzi.io/cluster: my-kafka-cluster
spec:
  partitions: 6
  replicas: 3
  config:
    retention.ms: 604800000  # 7 days
    cleanup.policy: delete
    compression.type: lz4
    min.insync.replicas: 2
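retention.ms is in milliseconds, so 7 days is 7 * 24 * 3600 * 1000 = 604,800,000. Together with the write rate, retention also sets how much disk each broker needs. The sketch below estimates per-broker usage for one topic and the highest ingest rate a 100Gi volume can hold. The 100 KiB/s compressed ingest rate is a hypothetical input. The estimate assumes replicas spread evenly over brokers and ignores index files, segment slack, and other topics.

```python
"""Estimate disk usage implied by a topic's retention settings (sketch)."""

MS_PER_DAY = 24 * 60 * 60 * 1000
GIB = 1024 ** 3


def retention_ms(days: float) -> int:
    return int(days * MS_PER_DAY)


def bytes_per_broker(ingest_bytes_per_s: float, retention: int, replicas: int, brokers: int) -> float:
    """Retained bytes on each broker, assuming replicas spread evenly over brokers."""
    total = ingest_bytes_per_s * (retention / 1000) * replicas
    return total / brokers


def max_ingest_bytes_per_s(volume_bytes: float, retention: int, replicas: int, brokers: int) -> float:
    """Highest steady ingest rate whose retained data fits in one broker volume."""
    return volume_bytes * brokers / ((retention / 1000) * replicas)


if __name__ == "__main__":
    week = retention_ms(7)
    print(week)  # 604800000
    # Hypothetical ingest rate of 100 KiB/s after compression
    used = bytes_per_broker(100 * 1024, week, replicas=3, brokers=3)
    print(f"{used / GIB:.1f} GiB per broker")
    limit = max_ingest_bytes_per_s(100 * GIB, week, replicas=3, brokers=3)
    print(f"{limit / 1024:.0f} KiB/s max for a 100Gi volume")
```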
Topic Operations
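# Create topic
kubectl apply -f orders-topic.yaml
# List topics
kubectl get kafkatopics -n kafka
# Increase partitions (Kafka cannot reduce a topic's partition count,
# and adding partitions changes which partition each key maps to)
kubectl patch kafkatopic orders -n kafka --type merge -p '{"spec":{"partitions":12}}'
# Delete topic (the Topic Operator also deletes the topic and its data in Kafka)
kubectl delete kafkatopic orders -n kafka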
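Kafka accepts a partition increase without trouble, but ordering for keyed records does not carry over. The default partitioner hashes the serialized key with murmur2 and takes the result modulo the partition count, so changing the count moves keys between partitions. Per-key ordering is not guaranteed across the change. The sketch below ports that hash to Python from the Java client's Utils.murmur2; treat it as an illustrative port, not a reference implementation. It then counts how many keys move when orders grows from 6 to 12 partitions. The order-N keys are hypothetical, and UTF-8 encoding matches a string key serializer.

```python
"""Show how keyed records move when a topic's partition count changes (sketch)."""

MASK = 0xFFFFFFFF  # emulate Java's 32-bit int overflow


def murmur2(data: bytes) -> int:
    """32-bit murmur2 as used by Kafka's default partitioner (ported from Java)."""
    length = len(data)
    m, r = 0x5BD1E995, 24
    h = (0x9747B28C ^ length) & MASK
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & MASK
        k ^= k >> r
        k = (k * m) & MASK
        h = (h * m) & MASK
        h ^= k
    tail = length % 4
    base = length - tail
    # Mirrors the Java switch with fall-through over the remaining 1-3 bytes
    if tail == 3:
        h ^= data[base + 2] << 16
    if tail >= 2:
        h ^= data[base + 1] << 8
    if tail >= 1:
        h ^= data[base]
        h = (h * m) & MASK
    h ^= h >> 13
    h = (h * m) & MASK
    h ^= h >> 15
    return h


def partition_for(key: str, num_partitions: int) -> int:
    # Kafka's toPositive() clears the sign bit before taking the modulo
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    keys = [f"order-{n}" for n in range(1000)]
    moved = sum(partition_for(k, 6) != partition_for(k, 12) for k in keys)
    print(f"{moved} of {len(keys)} keys change partition when going from 6 to 12")
```

Because 12 is a multiple of 6, each key either stays on partition p or moves to p + 6, and roughly half the keys move.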
User Management
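User CRD
These ACLs are enforced only when two settings are in place. The Kafka resource must enable authorization (spec.kafka.authorization with type: simple), and the listener the client uses must require TLS client authentication (authentication with type: tls). The cluster defined earlier sets neither.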
# app-user.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: app-user
  namespace: kafka
  labels:
    strimzi.io/cluster: my-kafka-cluster
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: orders
        operations:
          - Read
          - Describe
        host: "*"
      - resource:
          type: topic
          name: orders
        operations:
          - Write
        host: "*"
      - resource:
          type: group
          name: order-processor
        operations:
          - Read
        host: "*"
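The simple authorizer grants an operation when an ALLOW rule matches the resource type, resource name, operation, and host. The sketch below uses a simplified model of that check to show what app-user can do. It covers only literal resource names and ALLOW rules; it does not model prefixed patterns, DENY rules, or the Describe permission Kafka implies from Read and Write. The dictionaries mirror the acls list above. is_allowed and the default client host 10.0.0.1 are hypothetical.

```python
"""Evaluate the app-user ACLs with a simplified model of simple authorization (sketch)."""

# Mirrors spec.authorization.acls from app-user.yaml (literal names, ALLOW only)
ACLS = [
    {"type": "topic", "name": "orders", "operations": {"Read", "Describe"}, "host": "*"},
    {"type": "topic", "name": "orders", "operations": {"Write"}, "host": "*"},
    {"type": "group", "name": "order-processor", "operations": {"Read"}, "host": "*"},
]


def is_allowed(acls, resource_type: str, name: str, operation: str, host: str = "10.0.0.1") -> bool:
    """True if any rule matches; in this model, anything unmatched is denied."""
    return any(
        rule["type"] == resource_type
        and rule["name"] == name
        and operation in rule["operations"]
        and rule["host"] in ("*", host)
        for rule in acls
    )


if __name__ == "__main__":
    print(is_allowed(ACLS, "topic", "orders", "Write"))          # True
    print(is_allowed(ACLS, "group", "order-processor", "Read"))  # True
    print(is_allowed(ACLS, "topic", "payments", "Read"))         # False
```

A consumer in group order-processor needs both the topic Read rule and the group Read rule. A producer needs topic Write.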
Horizontal Pod Autoscaling
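HPA Configuration
Autoscaling brokers needs care. The Strimzi operator owns the broker count, which is set by replicas in the Kafka resource. Recent releases also run brokers in StrimziPodSets rather than a StatefulSet, so an HPA that targets the StatefulSet below has nothing to scale. A new broker also receives no partitions until a reassignment, such as a Cruise Control rebalance, moves some onto it. Treat this manifest as an illustration of the relevant metrics rather than a drop-in broker autoscaler.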
# kafka-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-broker-hpa
  namespace: kafka
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: my-kafka-cluster-kafka
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: kafka_server_BrokerTopicMetrics_MessagesInPerSec
        target:
          type: AverageValue
          averageValue: "100000"
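The HPA evaluates each metric separately using desiredReplicas = ceil(currentReplicas * currentValue / targetValue). It skips a change when the ratio is within a tolerance (0.1 by default), then acts on the largest result, clamped to minReplicas and maxReplicas. The sketch below reproduces that arithmetic for the three metrics above. The current readings are hypothetical. The real controller also applies stabilization windows and special handling for missing or unready pods.

```python
"""Reproduce the HPA replica calculation across several metrics (sketch)."""
import math

TOLERANCE = 0.1  # default of kube-controller-manager's --horizontal-pod-autoscaler-tolerance


def desired_for_metric(current_replicas: int, current: float, target: float) -> int:
    ratio = current / target
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # close enough to the target: no change
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas: int, readings, min_replicas: int, max_replicas: int) -> int:
    """readings: (current_value, target_value) pairs, one per metric."""
    proposals = [desired_for_metric(current_replicas, cur, tgt) for cur, tgt in readings]
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    readings = [
        (90, 70),            # CPU utilization %, hypothetical reading
        (60, 80),            # memory utilization %, hypothetical reading
        (150_000, 100_000),  # messages in per second per pod, hypothetical reading
    ]
    print(desired_replicas(3, readings, min_replicas=3, max_replicas=10))  # 5
```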
Custom Metrics
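# custom-metrics.yaml
# Requires an external metrics adapter (for example Prometheus Adapter) that
# publishes consumer lag as kafka_consumer_group_lag with a group label
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-consumer-hpa
  namespace: kafka
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-consumer
  minReplicas: 2
  maxReplicas: 20  # consumers beyond the topic's partition count sit idle
  metrics:
    - type: External
      external:
        metric:
          name: kafka_consumer_group_lag
          selector:
            matchLabels:
              group: order-processor
        target:
          type: AverageValue
          averageValue: "1000"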
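For an External metric with an AverageValue target, the HPA divides the group's total lag by the target. For example, 12,000 messages of lag with averageValue 1000 asks for 12 consumers. A consumer group assigns each partition to at most one member, though, so replicas beyond the partition count of orders (6 as created above) stay idle. The sketch below computes lag per partition as log-end offset minus committed offset, then applies the HPA rule and the partition limit. The offsets are hypothetical.

```python
"""Size a consumer Deployment from consumer-group lag (sketch)."""
import math

TOLERANCE = 0.1  # default HPA tolerance


def total_lag(end_offsets: dict, committed: dict) -> int:
    """Sum of (log-end offset - committed offset) over all partitions."""
    return sum(max(end_offsets[p] - committed[p], 0) for p in end_offsets)


def hpa_replicas_for_lag(lag: int, target_avg: int, current: int, min_r: int, max_r: int) -> int:
    """External metric, AverageValue target: ceil(total / target) outside the tolerance."""
    ratio = lag / (target_avg * current)
    desired = current if abs(ratio - 1.0) <= TOLERANCE else math.ceil(lag / target_avg)
    return max(min_r, min(max_r, desired))


if __name__ == "__main__":
    # Hypothetical offsets for the six partitions of "orders"
    end = {p: 50_000 for p in range(6)}
    committed = {p: 48_000 for p in range(6)}
    lag = total_lag(end, committed)
    replicas = hpa_replicas_for_lag(lag, target_avg=1000, current=2, min_r=2, max_r=20)
    active = min(replicas, len(end))  # at most one consumer per partition
    print(lag, replicas, active)  # 12000 12 6
```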
Rolling Updates
Zero-Downtime Updates
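# Excerpt of the Kafka resource: settings that keep partitions available
# while brokers restart one at a time (merge into kafka-cluster.yaml)
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-kafka-cluster
  namespace: kafka
spec:
  kafka:
    replicas: 3
    rack:
      topologyKey: topology.kubernetes.io/zone  # spread brokers and replicas across zones
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: persistent-claim
      size: 100Gi
  cruiseControl: {}  # enables partition rebalancing through KafkaRebalance resources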
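With rack set to the zone label, each broker advertises its zone as broker.rack, and Kafka spreads a partition's replicas across zones when it assigns them. The sketch below checks the property that makes rolling updates and single-zone outages safe: for every partition, losing any one broker (or any one zone) still leaves at least min.insync.replicas replicas. It assumes all replicas were in sync beforehand. The broker-to-zone map and replica assignments are hypothetical.

```python
"""Check that a replica assignment survives one broker or one zone going down (sketch)."""

# Hypothetical: one broker per zone, as rack awareness would spread them
BROKER_ZONE = {0: "zone-a", 1: "zone-b", 2: "zone-c"}


def survives(assignment: dict, min_isr: int, down: set) -> bool:
    """True if every partition keeps at least min_isr replicas outside `down`."""
    return all(
        len([b for b in replicas if b not in down]) >= min_isr
        for replicas in assignment.values()
    )


def safe_for_rolling_restart(assignment: dict, min_isr: int) -> bool:
    """Restart each broker in turn and require every partition to stay writable."""
    brokers = {b for replicas in assignment.values() for b in replicas}
    return all(survives(assignment, min_isr, {b}) for b in brokers)


def safe_for_zone_outage(assignment: dict, min_isr: int, broker_zone: dict) -> bool:
    """Take down every broker of one zone at a time."""
    return all(
        survives(assignment, min_isr, {b for b, z in broker_zone.items() if z == zone})
        for zone in set(broker_zone.values())
    )


if __name__ == "__main__":
    rf3 = {p: [p % 3, (p + 1) % 3, (p + 2) % 3] for p in range(6)}
    rf2 = {p: [p % 3, (p + 1) % 3] for p in range(6)}
    print(safe_for_rolling_restart(rf3, min_isr=2))   # True
    print(safe_for_zone_outage(rf3, 2, BROKER_ZONE))  # True
    print(safe_for_rolling_restart(rf2, min_isr=2))   # False
```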
Update Strategies
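# Trigger a rolling update of all brokers (annotate a single broker pod
# instead to restart only that broker)
kubectl annotate strimzipodset my-kafka-cluster-kafka -n kafka \
  strimzi.io/manual-rolling-update=true
# Monitor rolling update
kubectl get pods -n kafka -l app.kubernetes.io/name=kafka -w
# Check progress (brokers are not in a StatefulSet, so kubectl rollout
# status does not apply)
kubectl get strimzipodset my-kafka-cluster-kafka -n kafka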
Monitoring with Prometheus
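PodMonitor Configuration
# podmonitor.yaml
# Strimzi's example monitoring setup scrapes the pods directly on port
# tcp-prometheus; brokers expose metrics only when spec.kafka.metricsConfig
# is set in the Kafka resource
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-metrics
  namespace: kafka
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      strimzi.io/cluster: my-kafka-cluster
  namespaceSelector:
    matchNames:
      - kafka
  podMetricsEndpoints:
    - port: tcp-prometheus
      interval: 15s
      path: /metrics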
PrometheusRule
# prometheusrule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-alerts
  namespace: kafka
spec:
  groups:
    - name: kafka
      rules:
        - alert: KafkaUnderReplicatedPartitions
          expr: kafka_server_replicamanager_underreplicatedpartitions > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Under-replicated partitions detected"
            description: "{{ $value }} partitions are under-replicated"
        # Lag series come from Kafka Exporter (spec.kafkaExporter in the Kafka
        # resource), which labels them by consumergroup, topic, and partition
        - alert: KafkaConsumerLagHigh
          expr: sum by (consumergroup) (kafka_consumergroup_lag) > 10000
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Consumer lag is high"
            description: "Consumer group {{ $labels.consumergroup }} has lag of {{ $value }}"
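An alert with for: 5m fires only after its expression has been true on every evaluation for at least five minutes. Until then it is pending, and a single false evaluation resets it. The sketch below replays a series of evaluations to show that behavior. The one-minute evaluation interval and the sample values are hypothetical.

```python
"""Replay alert evaluations to show how the `for` duration works (sketch)."""


def alert_states(samples, threshold: float, for_seconds: int, interval: int):
    """Yield (time, state) where state is inactive, pending, or firing."""
    active_since = None
    for i, value in enumerate(samples):
        now = i * interval
        if value > threshold:
            if active_since is None:
                active_since = now  # condition just became true
            state = "firing" if now - active_since >= for_seconds else "pending"
        else:
            active_since = None  # any false evaluation resets the timer
            state = "inactive"
        yield now, state


if __name__ == "__main__":
    # Hypothetical under-replicated partition counts, evaluated once per minute
    counts = [0, 2, 2, 2, 2, 2, 2, 0, 1]
    for t, state in alert_states(counts, threshold=0, for_seconds=300, interval=60):
        print(f"t={t:>3}s {state}")
```

In this replay the alert turns pending at t=60s, fires at t=360s, and resets at t=420s when the count drops to 0.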
Grafana Dashboard
The metric names below assume the JMX Prometheus Exporter rules from Strimzi's example metrics configuration.
{
  "dashboard": {
    "title": "Kafka on Kubernetes",
    "panels": [
      {
        "title": "Kafka Brokers",
        "type": "stat",
        "targets": [
          {
            "expr": "count(kafka_server_replicamanager_leadercount)",
            "legendFormat": "Brokers"
          }
        ]
      },
      {
        "title": "Messages In Rate",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum(rate(kafka_server_brokertopicmetrics_messagesin_total{topic!=\"\"}[5m]))",
            "legendFormat": "Messages/sec"
          }
        ]
      }
    ]
  }
}
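The Messages In Rate panel applies rate() to a counter. rate() returns the per-second increase over the window and treats a counter reset (for example after a broker restart) as a restart from zero rather than a negative jump. The sketch below shows that core arithmetic on hypothetical samples. Prometheus also extrapolates to the window boundaries, so its results differ slightly.

```python
"""Simplified per-second rate of a Prometheus counter, with reset handling (sketch)."""


def simple_rate(samples) -> float:
    """samples: list of (timestamp_seconds, counter_value), oldest first."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted at zero, so count the new value in full
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical messages-in counter scraped every 15 s, with a restart at t=45
    samples = [(0, 1000), (15, 2500), (30, 4000), (45, 300), (60, 1800)]
    print(simple_rate(samples))  # 80.0
```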
Best Practices
Resource Management
# Resource recommendations (excerpt of the Kafka resource spec)
spec:
  kafka:
    resources:
      requests:
        memory: 4Gi
        cpu: 2
      limits:
        memory: 8Gi
        cpu: 4
  zookeeper:
    resources:
      requests:
        memory: 2Gi
        cpu: 1
      limits:
        memory: 4Gi
        cpu: 2
Backup Strategy
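#!/bin/bash
# backup_kafka.sh
# Kafka has no built-in command that snapshots topic data; protect the data
# with volume snapshots of the claims below or with MirrorMaker 2 replication
set -euo pipefail
# List the PVCs Strimzi created for the cluster (PVs are cluster-scoped and
# do not carry these labels, so query the claims instead)
kubectl get pvc -n kafka -l strimzi.io/cluster=my-kafka-cluster
# Export cluster, topic, and user definitions
kubectl get kafka my-kafka-cluster -n kafka -o yaml > kafka-cluster-backup.yaml
kubectl get kafkatopics -n kafka -o yaml > kafka-topics-backup.yaml
kubectl get kafkausers -n kafka -o yaml > kafka-users-backup.yaml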
Disaster Recovery
# MirrorMaker 2 for cross-cluster replication
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: my-mirror-maker-2
  namespace: kafka
spec:
  version: 3.5.1
  replicas: 2
  connectCluster: "target-cluster"
  clusters:
    - alias: "source-cluster"
      bootstrapServers: kafka-source:9092
    - alias: "target-cluster"
      bootstrapServers: kafka-target:9092
  mirrors:
    - sourceCluster: "source-cluster"
      targetCluster: "target-cluster"
      topicsPattern: "orders|payments|users"
      topicsExcludePattern: ".*internal"
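The include and exclude topic patterns are regular expressions, which this sketch assumes are matched against whole topic names. With MirrorMaker 2's default replication policy, the mirrored copy on the target is prefixed with the source alias, so orders becomes source-cluster.orders. The sketch below previews which source topics the mirror above would replicate and under what names. The topic list is hypothetical. Python's re module stands in for the Java regex engine that MirrorMaker 2 uses; the two agree for simple patterns like these.

```python
"""Preview which topics a MirrorMaker 2 mirror replicates, and their target names (sketch)."""
import re

TOPICS_PATTERN = r"orders|payments|users"
EXCLUDE_PATTERN = r".*internal"
SOURCE_ALIAS = "source-cluster"


def mirrored_topics(source_topics, include=TOPICS_PATTERN, exclude=EXCLUDE_PATTERN, alias=SOURCE_ALIAS):
    """Map each replicated source topic to its name on the target cluster."""
    result = {}
    for topic in source_topics:
        if re.fullmatch(include, topic) and not re.fullmatch(exclude, topic):
            # Default replication policy: <source alias>.<topic>
            result[topic] = f"{alias}.{topic}"
    return result


if __name__ == "__main__":
    topics = ["orders", "payments", "users", "orders-internal", "audit", "__consumer_offsets"]
    print(mirrored_topics(topics))
    # {'orders': 'source-cluster.orders', 'payments': 'source-cluster.payments', 'users': 'source-cluster.users'}
    print(mirrored_topics(topics, include=r"orders.*"))
    # {'orders': 'source-cluster.orders'}
```

With the exact-name include pattern, orders-internal already fails the include match. The exclude pattern only matters for broader includes such as orders.*.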
Summary
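Strimzi lets you run Kafka on Kubernetes as declarative resources for clusters, topics, and users, backed by persistent volumes. For production deployments, add lag-based autoscaling for consumers, Prometheus monitoring and alerting, and a backup and disaster-recovery plan such as MirrorMaker 2 replication.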