Azure Cosmos DB: Global Distribution, Consistency & HTAP
Multi-model database with global distribution, turnkey replication, and guaranteed single-digit millisecond latency
Cosmos DB Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β AZURE COSMOS DB ARCHITECTURE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β GLOBAL DISTRIBUTION β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Region: East US Region: West Europe Region: SE A β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββ β β
β β β ββββββββββββ β β ββββββββββββ β β ββββββββ β β β
β β β β Replica ββββββββββΌββ€ Replica ββββββββββΌββ€Replicβ β β β
β β β β (RW) β β β β (RO) β β β β(RO) β β β β
β β β ββββββββββββ β β ββββββββββββ β β ββββββββ β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββ β β
β β β β
β β Conflict Resolution: β β
β β β’ Last Writer Wins (LWW) β β
β β β’ Custom (Stored Procedure) β β
β β β’ Custom (Merge Procedure) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β PARTITIONING β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Partition Key: /deviceId β β
β β β β
β β Logical Partition 1 Logical Partition 2 LP 3 β β
β β ββββββββββββββββββββ ββββββββββββββββββββ ββββββββββββ β β
β β β deviceId: A β β deviceId: B β β deviceId:β β β
β β β βββββββββββββββββ β βββββββββββββββββ β C β β β
β β β βDoc 1 ββDoc 2 ββ β βDoc 3 ββDoc 4 ββ β βββββββββ β β
β β β βββββββββββββββββ β βββββββββββββββββ β βDoc 5 ββ β β
β β ββββββββββββββββββββ ββββββββββββββββββββ β βββββββββ β β
β β ββββββββββββ β β
β β Max Physical Partition: 10 GB β β
β β Auto-partitioning (split/merge) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β CONSISTENCY LEVELS: β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Strong ββββ> Bounded Staleness ββββ> Session ββββ> Eventualβ β
β β (Highest) (Default) (Lowest) β β
β β β β
β β Consistency βββββββββββββββββββββββββββββββββββββββββΊ Perf β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Consistency Levels
| Level | Guarantees | Latency | Throughput | Use Case |
|---|---|---|---|---|
| Strong | Linearizable reads | Highest | Lowest | Financial transactions |
| Bounded Staleness | Reads lag by K versions or T seconds | High | Low | Order tracking |
| Session | Read your writes | Medium | Medium | User profiles (default) |
| Consistent Prefix | No out-of-order reads | Medium | Medium | Social media feeds |
| Eventual | Eventually consistent | Lowest | Highest | Counters, non-critical |
Change Feed for Data Engineering
# Change Feed processor for real-time data sync
from azure.cosmos import CosmosClient, PartitionKey
from azure.cosmos.change_feed.change_feed_processor import ChangeFeedProcessor
import json
client = CosmosClient(
"https://cosmos-prod.documents.azure.com:443/",
{"masterKey": "your-key"}
)
database = client.get_database_client("telemetry")
container = database.get_container_client("events")
# Read change feed
import time
from datetime import datetime
start_time = datetime.utcnow()
continuation_token = None
while True:
response = container.query_items_change_feed(
start_time=start_time,
continuation_token=continuation_token
)
for event in response:
print(f"New/Modified: {event['id']}")
print(f"Data: {json.dumps(event)}")
# Process event (e.g., write to ADLS)
process_to_data_lake(event)
if response.continuation_token:
continuation_token = response.continuation_token
start_time = datetime.utcnow()
time.sleep(5) # Poll interval
def process_to_data_lake(event):
from azure.storage.filedatalake import DataLakeServiceClient
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
datalake = DataLakeServiceClient(
account_url="https://stdatalake001.dfs.core.windows.net",
credential=credential
)
file_client = datalake.get_file_client(
"curated",
f"events/{event['date']}/{event['id']}.json"
)
file_client.upload_data(
json.dumps(event),
overwrite=True
)
HTAP (Hybrid Transactional/Analytical Processing)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β COSMOS DB HTAP ARCHITECTURE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β TRANSACTIONAL WORKLOAD (OLTP) β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β App ββ> Cosmos DB (Point Reads, Range Queries) β β
β β Latency: <10ms β β
β β Consistency: Session β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β ANALYTICAL WORKLOAD (OLAP) β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Analytics ββ> Cosmos DB Analytical Store (Columnar) β β
β β Latency: Seconds to minutes β β
β β Format: Parquet (columnar) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β SYNAPSE LINK (HTAP): β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Cosmos DB βββββ Synapse Link ββββ> Synapse Analytics β β
β β β β
β β β’ Real-time analytics on operational data β β
β β β’ No ETL pipeline required β β
β β β’ Consistent view of transactional data β β
β β β’ Serverless SQL pool queries analytical store β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Python SDK Example
from azure.cosmos import CosmosClient, PartitionKey, exceptions
client = CosmosClient(
"https://cosmos-prod.documents.azure.com:443/",
DefaultAzureCredential()
)
database = client.create_database_if_not_exists(
id="analytics",
offer_throughput=4000
)
container = database.create_container(
id="events",
partition_key=PartitionKey(path="/deviceId"),
default_ttl=2592000, # 30 days
offer_throughput=1000
)
# Upsert with TTL
container.upsert_item({
"id": "event-001",
"deviceId": "sensor-123",
"temperature": 72.5,
"timestamp": "2024-01-15T10:30:00Z",
"ttl": 86400 # 24 hours (override default)
})
# Query with partition key (efficient)
items = container.query_items(
query="SELECT * FROM c WHERE c.deviceId = @deviceId",
parameters=[
{"name": "@deviceId", "value": "sensor-123"}
],
enable_cross_partition_query=False
)
# Cross-partition query (expensive)
all_items = container.query_items(
query="SELECT * FROM c WHERE c.temperature > 70",
enable_cross_partition_query=True
)
β οΈ
Important: Cross-partition queries are expensive and slow. Always include the partition key in queries when possible. Use composite indexes for frequently queried non-partition-key fields.
Interview Questions
Q1: Explain the trade-off between consistency levels in Cosmos DB. A: Strong consistency provides the highest consistency but lowest throughput and highest latency. Eventual consistency provides the lowest latency and highest throughput but eventual consistency. Session consistency (default) provides read-your-writes guarantee within a session.
Q2: How do you choose a partition key for Cosmos DB? A: Choose a key with high cardinality (many distinct values) and even distribution. Avoid keys that create hot partitions (e.g., timestamps). Example: /deviceId for IoT data, /userId for user profiles.
Q3: What is the cost model for Cosmos DB? A: Cosmos DB charges for: 1) Request Units (RU/s) per second, 2) Storage (GB/month), 3) Network egress. RU consumption depends on operation type, item size, and index properties. Use Azure Cosmos DB Capacity Calculator to estimate costs.