Azure Event Hubs: Partitions, Consumer Groups & Capture
Real-time event streaming with partitioning, capture, and at-least-once delivery (exactly-once results require idempotent or deduplicating consumers)
Event Hubs Architecture
┌─────────────────────────────────────────────────────────────────────┐
│                    AZURE EVENT HUBS ARCHITECTURE                    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  PRODUCERS                                                          │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐         │
│  │ IoT Hub   │  │ Apps      │  │ Logs      │  │ Custom    │         │
│  │ Devices   │  │ Services  │  │ Streams   │  │ Producers │         │
│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘         │
│        │              │              │              │               │
│        └──────────────┴──────┬───────┴──────────────┘               │
│                              ▼                                      │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ EVENT HUBS NAMESPACE                                          │  │
│  │                                                               │  │
│  │ EVENT HUB: sales-events                                       │  │
│  │ ┌───────────────────────────────────────────────────────────┐ │  │
│  │ │ Partition 0   │ Partition 1   │ Partition 2   │ ... │ P N │ │  │
│  │ │ Offset        │ Offset        │ Offset        │     │     │ │  │
│  │ │ Events        │ Events        │ Events        │     │     │ │  │
│  │ └───────────────────────────────────────────────────────────┘ │  │
│  │                                                               │  │
│  │ THROUGHPUT / PROCESSING / CAPACITY UNITS                      │  │
│  │ ┌───────────────────────────────────────────────────────────┐ │  │
│  │ │ TU: 1 MB/s or 1,000 events/s in, 2 MB/s out               │ │  │
│  │ │ PU (Premium), CU (Dedicated): reserved capacity           │ │  │
│  │ │ Max: 40 TU per namespace (Basic/Standard)                 │ │  │
│  │ └───────────────────────────────────────────────────────────┘ │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  CONSUMERS                                                          │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐  │  │
│  │  │ Stream     │ │ Databricks │ │ Function   │ │ Event      │  │  │
│  │  │ Analytics  │ │ (Spark)    │ │ App        │ │ Grid       │  │  │
│  │  └────────────┘ └────────────┘ └────────────┘ └────────────┘  │  │
│  │                                                               │  │
│  │ CONSUMER GROUPS:                                              │  │
│  │ ┌───────────────────────────────────────────────────────────┐ │  │
│  │ │ $Default             (built-in default group)             │ │  │
│  │ │ stream-analytics-cg  (dedicated)                          │ │  │
│  │ │ databricks-cg        (dedicated)                          │ │  │
│  │ │ function-cg          (dedicated)                          │ │  │
│  │ └───────────────────────────────────────────────────────────┘ │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  CAPTURE (Auto-archiving to ADLS/Blob):                             │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Event Hubs ──> Capture ──> ADLS Gen2 (Avro/Parquet)           │  │
│  │ Window closes at 5 min or 100 MB, whichever comes first       │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
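The throughput-unit figures in the diagram above turn into a quick sizing calculation. The sketch below assumes per-TU limits of 1 MB/s or 1,000 events/s for ingress and 2 MB/s or 4,096 events/s for egress; `required_throughput_units` is a hypothetical helper for illustration, not part of any Azure SDK, and it ignores bursts and protocol overhead.

```python
import math

# Per-TU limits, taken as assumptions for this sketch.
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000
EGRESS_MB_PER_TU = 2.0
EGRESS_EVENTS_PER_TU = 4096


def required_throughput_units(events_per_sec, avg_event_kb, consumer_groups=1):
    """Estimate the TUs needed; whichever limit is reached first decides."""
    ingress_mb = events_per_sec * avg_event_kb / 1024
    # Each consumer group reads the full stream, so egress scales with groups.
    egress_mb = ingress_mb * consumer_groups
    egress_events = events_per_sec * consumer_groups
    return max(
        1,
        math.ceil(ingress_mb / INGRESS_MB_PER_TU),
        math.ceil(events_per_sec / INGRESS_EVENTS_PER_TU),
        math.ceil(egress_mb / EGRESS_MB_PER_TU),
        math.ceil(egress_events / EGRESS_EVENTS_PER_TU),
    )


# 5,000 small events per second read by three consumer groups.
print(required_throughput_units(events_per_sec=5000, avg_event_kb=0.5, consumer_groups=3))  # 5
```

In this example the event-count limit on ingress (5,000 events/s against 1,000 per TU) is the binding constraint, not the byte rate.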
Partitioning Strategy
┌─────────────────────────────────────────────────────────────────┐
│                      PARTITIONING STRATEGY                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  KEY-BASED PARTITIONING                                         │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Event: {"deviceId": "sensor-123", "temp": 72.5}           │  │
│  │                                                           │  │
│  │ Partition Key: deviceId                                   │  │
│  │                                                           │  │
│  │ Hash(deviceId) % NumPartitions = Partition Assignment     │  │
│  │                                                           │  │
│  │ Partition 0: sensor-001, sensor-007, sensor-013...        │  │
│  │ Partition 1: sensor-002, sensor-008, sensor-014...        │  │
│  │ Partition 2: sensor-003, sensor-009, sensor-015...        │  │
│  │                                                           │  │
│  │ Guarantees: Ordering within partition                     │  │
│  │ Limitation: Hot partitions if key distribution is skewed  │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ROUND ROBIN PARTITIONING                                       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ No partition key specified                                │  │
│  │ Events distributed evenly across partitions               │  │
│  │                                                           │  │
│  │ Event 1 → Partition 0                                     │  │
│  │ Event 2 → Partition 1                                     │  │
│  │ Event 3 → Partition 2                                     │  │
│  │ Event 4 → Partition 0                                     │  │
│  │                                                           │  │
│  │ Guarantees: Even distribution                             │  │
│  │ Limitation: No ordering guarantee                         │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  PARTITION COUNT CONSIDERATIONS:                                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ • Min: 1 partition                                        │  │
│  │ • Max: 32 partitions (Basic/Standard)                     │  │
│  │ • Max: 100 partitions per event hub (Premium)             │  │
│  │ • Throughput: plan at least 1 partition per TU            │  │
│  │ • Recommendation: 2-4x expected consumer count            │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
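The two assignment strategies in the diagram can be sketched in a few lines. This is only an illustration: the hash Event Hubs applies to partition keys is internal to the service, so SHA-256 here is a stand-in, and `key_partition` and `round_robin` are hypothetical names. The service also does not promise a strict rotation for keyless events.

```python
import hashlib
from itertools import count


def key_partition(partition_key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (stand-in for the service's hash)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def round_robin(num_partitions: int):
    """Yield partition ids 0, 1, ..., N-1, 0, 1, ... for events without a key."""
    for i in count():
        yield i % num_partitions


# Key-based: both sensor-123 readings land on the same partition, so their order is kept.
events = [("sensor-123", 72.5), ("sensor-007", 68.1), ("sensor-123", 73.0)]
for device_id, temp in events:
    print(device_id, temp, "-> partition", key_partition(device_id, 4))

# Round robin: even spread, but related events end up on different partitions.
rr = round_robin(3)
print([next(rr) for _ in range(4)])  # [0, 1, 2, 0]
```

Because the mapping depends on the partition count, changing the number of partitions would move keys to different partitions, which is one reason to size the count up front.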
Event Hubs Capture Configuration
{
  "properties": {
    "captureDescription": {
      "enabled": true,
      "encoding": "Avro",
      "intervalInSeconds": 300,
      "sizeLimitInBytes": 104857600,
      "skipEmptyArchives": true,
      "destination": {
        "name": "EventHubArchive.AzureBlockBlob",
        "properties": {
          "storageAccountResourceId": "/subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/stdatalake001",
          "blobContainer": "event-hubs-capture",
          "archiveNameFormat": "{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
        }
      }
    }
  }
}
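To see what the `archiveNameFormat` setting produces, the sketch below expands its tokens for one capture window. `render_archive_name` is a hypothetical helper, and the zero-padded date parts are an assumption about how the service fills the tokens; Capture itself also appends the file extension (for example `.avro`).

```python
from datetime import datetime, timezone

ARCHIVE_NAME_FORMAT = (
    "{Namespace}/{EventHub}/{PartitionId}/"
    "{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
)


def render_archive_name(fmt, namespace, event_hub, partition_id, window_start):
    """Fill in the Capture name tokens for one window (zero-padding assumed)."""
    return fmt.format(
        Namespace=namespace,
        EventHub=event_hub,
        PartitionId=partition_id,
        Year=f"{window_start:%Y}",
        Month=f"{window_start:%m}",
        Day=f"{window_start:%d}",
        Hour=f"{window_start:%H}",
        Minute=f"{window_start:%M}",
        Second=f"{window_start:%S}",
    )


start = datetime(2024, 1, 15, 10, 30, 0, tzinfo=timezone.utc)
print(render_archive_name(ARCHIVE_NAME_FORMAT, "ns-prod", "sales-events", 0, start))
# ns-prod/sales-events/0/2024/01/15/10/30/00
```

Putting `{PartitionId}` ahead of the date parts, as this format does, groups files by partition first; placing the date parts first instead makes time-range listing cheaper for downstream jobs.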
Python Producer/Consumer
# Producer
import json

from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://ns-prod.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;...",
    eventhub_name="sales-events",
)

events = [
    {"deviceId": "sensor-001", "temperature": 72.5, "timestamp": "2024-01-15T10:30:00Z"},
    {"deviceId": "sensor-002", "temperature": 68.3, "timestamp": "2024-01-15T10:30:01Z"},
]

# The context manager closes the client even if sending fails.
with producer:
    # No partition key is set, so the service chooses the partition for this batch.
    event_batch = producer.create_batch()
    for event in events:
        # add() raises ValueError once the batch would exceed its size limit.
        event_batch.add(EventData(json.dumps(event)))
    producer.send_batch(event_batch)
# Consumer
from azure.eventhub import EventHubConsumerClient


def on_event(partition_context, event):
    print(f"Received event from partition {partition_context.partition_id}")
    print(f"Event body: {event.body_as_str()}")
    print(f"Sequence number: {event.sequence_number}")
    # Without a checkpoint store this call persists nothing; pass
    # checkpoint_store=... to the client to resume after a restart.
    partition_context.update_checkpoint(event)


client = EventHubConsumerClient.from_connection_string(
    conn_str="Endpoint=sb://ns-prod.servicebus.windows.net/...",
    consumer_group="$Default",
    eventhub_name="sales-events",
)

with client:
    # receive() blocks and calls on_event for each event on every partition.
    client.receive(
        on_event=on_event,
        starting_position="-1",  # "-1" starts from the beginning of each partition
    )
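The producer above sends one keyless batch, so it does not get the per-key ordering described under Partitioning Strategy. A minimal sketch of keyed sending follows, assuming the client's `create_batch(partition_key=...)` option and that a full batch raises `ValueError` on `add`. `send_grouped_by_key` and its `wrap` parameter are hypothetical names; `wrap` exists only so the function can be dry-run without the SDK installed.

```python
import json
from collections import defaultdict


def send_grouped_by_key(producer, events, key_field, wrap):
    """Send events in batches that each carry a single partition key.

    producer: object with create_batch(partition_key=...) and send_batch(batch)
    wrap: callable that turns a JSON string into an event object (e.g. EventData)
    Returns the number of batches sent.
    """
    # A batch can hold only one partition key, so group the events first.
    grouped = defaultdict(list)
    for event in events:
        grouped[event[key_field]].append(event)

    sent = 0
    for key, group in grouped.items():
        batch = producer.create_batch(partition_key=key)
        for event in group:
            item = wrap(json.dumps(event))
            try:
                batch.add(item)
            except ValueError:
                # Batch is full: flush it and continue in a new batch, same key.
                producer.send_batch(batch)
                sent += 1
                batch = producer.create_batch(partition_key=key)
                batch.add(item)
        # Every group has at least one event, so the last batch is never empty.
        producer.send_batch(batch)
        sent += 1
    return sent
```

With the real SDK this would be called as `send_grouped_by_key(producer, events, "deviceId", wrap=EventData)` inside the `with producer:` block. Events for one device then stay on one partition and keep their send order.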
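Pro Tip: Use partition keys with many distinct, evenly used values (for example a device or customer ID) so events spread across partitions. Avoid coarse or low-cardinality keys such as a per-minute timestamp or a region code: every event sharing the key lands on the same partition, which creates hot partitions.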
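One way to check a candidate key before adopting it is to replay a sample of key values through a hash and compare partition loads. The sketch below uses SHA-256 as a stand-in for the service's internal hash, so the numbers are indicative only; `partition_load` and `imbalance` are hypothetical helpers.

```python
import hashlib
from collections import Counter


def partition_load(keys, num_partitions):
    """Count how many events each partition would receive for these keys."""
    load = Counter({p: 0 for p in range(num_partitions)})
    for key in keys:
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        load[int.from_bytes(digest[:8], "big") % num_partitions] += 1
    return load


def imbalance(load):
    """Ratio of the busiest partition to the mean load; 1.0 is perfectly even."""
    mean = sum(load.values()) / len(load)
    return max(load.values()) / mean


# 10,000 events keyed by device ID (500 devices) versus by a per-minute timestamp.
device_keys = [f"sensor-{i % 500:03d}" for i in range(10_000)]
minute_keys = ["2024-01-15T10:30"] * 10_000

print(round(imbalance(partition_load(device_keys, 8)), 2))  # close to 1
print(imbalance(partition_load(minute_keys, 8)))  # 8.0: one partition takes everything
```

A ratio well above 1 on real traffic samples suggests picking a finer-grained key or sending without a key when ordering is not needed.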
Interview Questions
Q1: Explain the difference between Event Hubs and Event Grid. A: Event Hubs is a high-throughput event streaming service (millions of events/sec). Event Grid is a reactive event routing service (smart routing, filtering). Use Event Hubs for data ingestion pipelines; Event Grid for event-driven architectures.
Q2: How do you handle message ordering in Event Hubs? A: Event Hubs guarantees ordering within a partition. Use the same partition key for related events. For global ordering, use a single partition (limits throughput). For most use cases, partition-level ordering is sufficient.
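Q3: What is the cost impact of Event Hubs Capture? A: It depends on the tier. In the Standard tier, Capture is a paid add-on billed per throughput unit per hour; in Premium and Dedicated it is included in the price; the Basic tier does not offer Capture. In every tier, storage costs apply for the captured files in ADLS or Blob Storage. Capture reduces the need for custom ETL jobs, which can save compute costs.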