Event Hubs Deep Dive: Capture, Partitioning & Ordering
Master Event Hubs with advanced partitioning, Capture, and throughput optimization
Event Hubs Internals
NAMESPACE LEVEL
  • Throughput Units (TU): Standard tier; shared across all Event Hubs in the namespace
  • Processing Units (PU): Premium tier; Capacity Units (CU) apply to Dedicated clusters
  • Max 40 TU (Standard), 16 PU (Premium)

PARTITION LEVEL
  • Each partition: ordered append-only log
  • Offset: Position within partition
  • Sequence Number: Monotonically increasing per partition
  • Max 32 partitions per Event Hub (Standard), 100 (Premium)

CONSUMER GROUP LEVEL
  • Independent offset tracking per consumer group
  • Max 20 consumer groups per Event Hub (Standard), 100 (Premium), 1000 (Dedicated)
  • $Default: Default consumer group, created automatically with every Event Hub

CAPTURE LEVEL
  • Auto-archive to ADLS Gen2 / Blob Storage
  • Time window: 1 min - 15 min
  • Size window: 10 MB - 500 MB
  • Format: Avro (Parquet is available through the no-code Stream Analytics editor)
  • Ordering: Capture files in offset order per partition
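The relationship between partitions and consumer groups can be illustrated with a toy in-memory model. This is only a sketch: the `Partition` and `ConsumerGroup` classes are hypothetical names made up for illustration, and the model collapses offset and sequence number into a single list index, which the real service does not do.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition: an ordered, append-only log."""
    events: list = field(default_factory=list)

    def append(self, body: str) -> int:
        """Append an event and return its sequence number (its index here)."""
        self.events.append(body)
        return len(self.events) - 1


@dataclass
class ConsumerGroup:
    """Tracks its own read position per partition, independent of other groups."""
    name: str
    positions: dict = field(default_factory=dict)  # partition id -> next index to read

    def read(self, partition_id: str, partition: Partition, max_events: int = 10) -> list:
        start = self.positions.get(partition_id, 0)
        batch = partition.events[start:start + max_events]
        self.positions[partition_id] = start + len(batch)
        return batch


p0 = Partition()
for reading in ["t=70.1", "t=70.4", "t=70.9"]:
    p0.append(reading)

analytics = ConsumerGroup("analytics")
alerts = ConsumerGroup("alerts")

first_for_analytics = analytics.read("0", p0, max_events=2)  # first two events
all_for_alerts = alerts.read("0", p0)                        # unaffected by analytics
rest_for_analytics = analytics.read("0", p0)                 # resumes where it stopped
```

Both groups see the same events in the same order, but each advances through the log at its own pace.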
Capture Configuration
{
  "properties": {
    "captureDescription": {
      "enabled": true,
      "encoding": "Avro",
      "intervalInSeconds": 300,
      "sizeLimitInBytes": 104857600,
      "skipEmptyArchives": true,
      "destination": {
        "name": "EventHubArchive.AzureBlockBlob",
        "properties": {
          "storageAccountResourceId": "/subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/stdatalake001",
          "blobContainer": "event-hubs-capture",
          "archiveNameFormat": "{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
        }
      }
    }
  }
}
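To see what the archive name format and the two windows mean in practice, the sketch below renders a blob path from the template and applies the "whichever limit is hit first" rule. The function names, the sample namespace and hub names, and the zero-padding of the date parts are assumptions made for illustration; this is not the service's own code.

```python
from datetime import datetime, timezone

# Capture expects every one of these tokens to appear in the format, in any order.
REQUIRED_TOKENS = ("Namespace", "EventHub", "PartitionId",
                   "Year", "Month", "Day", "Hour", "Minute", "Second")


def render_capture_path(fmt: str, namespace: str, event_hub: str,
                        partition_id: int, ts: datetime) -> str:
    """Fill an archiveNameFormat template for one capture file."""
    missing = [t for t in REQUIRED_TOKENS if "{" + t + "}" not in fmt]
    if missing:
        raise ValueError(f"archiveNameFormat is missing tokens: {missing}")
    return fmt.format(
        Namespace=namespace, EventHub=event_hub, PartitionId=partition_id,
        # Zero-padding of the date parts is an assumption of this sketch.
        Year=f"{ts.year:04d}", Month=f"{ts.month:02d}", Day=f"{ts.day:02d}",
        Hour=f"{ts.hour:02d}", Minute=f"{ts.minute:02d}", Second=f"{ts.second:02d}",
    )


def should_flush(elapsed_seconds: float, buffered_bytes: int,
                 interval_seconds: int = 300,
                 size_limit_bytes: int = 104857600) -> bool:
    """A capture file closes when either window is reached, whichever comes first."""
    return elapsed_seconds >= interval_seconds or buffered_bytes >= size_limit_bytes


fmt = "{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
path = render_capture_path(fmt, "telemetry-ns", "sensors", 3,
                           datetime(2024, 5, 7, 9, 5, 0, tzinfo=timezone.utc))
print(path)  # telemetry-ns/sensors/3/2024/05/07/09/05/00
```

Putting the partition ID ahead of the date parts, as this format does, keeps each partition's files together, which makes per-partition ordered reads straightforward.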
Partitioning Strategy
# Key-based partitioning for ordering
import json
from azure.eventhub import EventHubProducerClient, EventData

# conn_str is a placeholder; it must include EntityPath, or pass eventhub_name=... too
producer = EventHubProducerClient.from_connection_string(conn_str)
event = EventData(json.dumps({"device_id": "sensor-001", "temp": 72.5}))
with producer:  # closes the connection on exit
    # Use device_id as partition key for ordering per device
    producer.send_batch([event], partition_key="sensor-001")
    # Round-robin for even distribution (no ordering guarantee)
    producer.send_batch([event])  # No partition key
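The effect of a partition key can be sketched offline. Event Hubs hashes the key to pick a partition, but the hash it uses is internal to the service, so the SHA-256 mapping below is a stand-in chosen only for illustration, and `assign_partition` and `partition_load` are hypothetical helper names. The sketch shows two properties that hold regardless of the hash: the same key always lands on the same partition, and a low-cardinality key concentrates load.

```python
import hashlib
from collections import Counter


def assign_partition(partition_key: str, partition_count: int) -> int:
    """Map a key to a partition deterministically (stand-in for the service's hash)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partition_load(keys, partition_count: int) -> Counter:
    """Count how many events each partition would receive for the given keys."""
    return Counter(assign_partition(k, partition_count) for k in keys)


# High-cardinality key: one key per device spreads events across partitions.
device_keys = [f"sensor-{i:03d}" for i in range(1000)]
device_load = partition_load(device_keys, 8)

# Low-cardinality key: two regions can occupy at most two partitions.
region_keys = ["eu-west"] * 900 + ["us-east"] * 100
region_load = partition_load(region_keys, 8)

# Ratio of the busiest partition to the ideal even share (1.0 means perfectly even).
device_skew = max(device_load.values()) / (len(device_keys) / 8)
region_skew = max(region_load.values()) / (len(region_keys) / 8)
```

Running a check like this against a sample of real keys before choosing a partition key is a cheap way to spot a likely hot partition.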
Throughput Monitoring
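# Monitor Event Hub metrics
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

# subscription_id, rg, ns, and hub are placeholders for your own resource names
monitor_client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)
metrics = monitor_client.metrics.list(
    # Event Hubs metrics are emitted at the namespace level; filter by EntityName
    resource_uri=f"/subscriptions/{subscription_id}/resourceGroups/{rg}/providers/Microsoft.EventHub/namespaces/{ns}",
    metricnames="IncomingBytes,OutgoingBytes,IncomingMessages",
    interval="PT1M",
    aggregation="Total",
    filter=f"EntityName eq '{hub}'",
)
for metric in metrics.value:
    for series in metric.timeseries:
        if series.data:  # skip series with no data points
            print(f"{metric.name.value}: {series.data[-1].total}")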
Pro Tip: Monitor TU/CU utilization. If consistently above 80%, scale up. Use partition keys that evenly distribute events to avoid hot partitions.
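The 80% rule can be turned into a small calculation. The sketch below assumes the Standard-tier ingress allowance of 1 MB/s or 1,000 events/s per TU, whichever is reached first, and treats 1 MB as 1024 * 1024 bytes; the function names and the sample numbers are illustrative assumptions, and "consistently" is interpreted here as every recent sample exceeding the threshold.

```python
TU_INGRESS_BYTES_PER_SEC = 1024 * 1024  # assumed: 1 MB/s ingress per TU
TU_INGRESS_EVENTS_PER_SEC = 1000        # 1,000 events/s ingress per TU


def tu_utilization(incoming_bytes: float, incoming_messages: float,
                   window_seconds: float, throughput_units: int) -> float:
    """Return ingress utilization as a fraction of the purchased TUs.

    Throttling starts when either the byte or the event limit is hit,
    so the higher of the two ratios is the one that matters.
    """
    byte_ratio = incoming_bytes / window_seconds / (throughput_units * TU_INGRESS_BYTES_PER_SEC)
    event_ratio = incoming_messages / window_seconds / (throughput_units * TU_INGRESS_EVENTS_PER_SEC)
    return max(byte_ratio, event_ratio)


def should_scale_up(samples, threshold: float = 0.8) -> bool:
    """True when every sample in the window exceeds the threshold."""
    samples = list(samples)
    return bool(samples) and all(u > threshold for u in samples)


# One-minute totals for a namespace with 2 TUs: 110 MB and 30,000 events.
utilization = tu_utilization(110 * 1024 * 1024, 30_000, 60, 2)
print(f"{utilization:.0%}")  # 92%
```

The inputs correspond to per-minute totals of IncomingBytes and IncomingMessages; a single spike above 80% would not trigger `should_scale_up`, only a sustained run.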
Interview Questions
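Q1: How do you scale Event Hubs for higher throughput? A: 1) Increase TUs (Standard), PUs (Premium), or CUs (Dedicated), 2) Add partitions (the count is fixed at creation on Standard; Premium and Dedicated allow increasing it later), 3) Choose partition keys that spread events evenly, 4) Scale out consumers, ideally one active reader per partition per consumer group, 5) Use Premium tier for dedicated resources. Monitor utilization to determine which approach is needed.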
Q2: What is the difference between a consumer group and a partition? A: A partition is a physical ordering unit within an Event Hub: an append-only log in which events keep their order. A consumer group is a logical view of the whole Event Hub with its own read position per partition. Multiple consumer groups can read the same partition independently.
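Q3: How do you replay events from a specific point in time? A: Event Hubs keeps no server-side offset to reset; the consumer chooses where to start. Start the receiver at an explicit position (offset, sequence number, or enqueued time), overriding or clearing any stored checkpoint. This works only for events still inside the retention period. For older events, read the appropriate Capture file in ADLS.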
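For the Capture route in Q3, finding "the appropriate file" comes down to parsing timestamps out of blob paths. The sketch below assumes the files follow the archive name format shown under Capture Configuration, with a file extension on the last segment; the helper names and sample paths are made up for illustration. Events near a boundary may sit in the neighbouring file, so in practice it is safer to widen the range by one capture window.

```python
from datetime import datetime, timezone


def capture_timestamp(blob_path: str) -> datetime:
    """Parse the UTC timestamp from a path laid out as
    {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}[.ext]."""
    head, _, last = blob_path.rpartition("/")
    second = last.split(".", 1)[0]  # drop the extension from the last segment only
    parts = head.split("/") + [second]
    year, month, day, hour, minute, sec = (int(p) for p in parts[-6:])
    return datetime(year, month, day, hour, minute, sec, tzinfo=timezone.utc)


def files_for_replay(blob_paths, partition_id: str, start: datetime, end: datetime) -> list:
    """Return one partition's capture files with start <= timestamp < end, oldest first."""
    selected = []
    for path in blob_paths:
        if path.split("/")[2] != partition_id:
            continue
        ts = capture_timestamp(path)
        if start <= ts < end:
            selected.append((ts, path))
    return [path for _, path in sorted(selected)]


paths = [
    "ns/hub/0/2024/05/07/09/10/00.avro",
    "ns/hub/0/2024/05/07/09/00/00.avro",
    "ns/hub/1/2024/05/07/09/05/00.avro",
    "ns/hub/0/2024/05/07/09/05/00.avro",
]
replay = files_for_replay(
    paths, "0",
    datetime(2024, 5, 7, 9, 5, tzinfo=timezone.utc),
    datetime(2024, 5, 7, 9, 15, tzinfo=timezone.utc),
)
```

Reading the selected files in the returned order preserves the per-partition ordering that Capture maintains.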