Saga Pattern (Distributed Transactions)
Difficulty: Senior Level | Companies: AWS, Google, Microsoft, Netflix, Uber
Distributed Transaction Problem
Microservices can't use traditional ACID transactions across service boundaries. Sagas coordinate multi-step processes with compensating actions for rollback.
โน๏ธ
A saga is a sequence of local transactions. If one fails, compensating transactions undo the previous steps.
Saga Flow
Normal Flow:
โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ
โ Create โโโโโถโ Reserve โโโโโถโ Process โโโโโถโ Complete โ
โ Order โ โ Inventoryโ โ Payment โ โ Order โ
โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ
โ
Failure at Payment: โผ
โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ
โ Create โโโโโถโ Reserve โโโโโถโ Process โโโโโถโ Compensateโ
โ Order โ โ Inventoryโ โ Payment โ โ (Cancel) โ
โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ
โ โ โ
โผ โผ โผ
Release Stock Refund Payment Cancel Order
Pattern 1: Choreography-Based Saga
Each service publishes events that trigger the next step.
// Order Service - publishes OrderCreated
export class OrderService {
async createOrder(input: CreateOrderInput): Promise<Order> {
const order = await this.db.orders.create({
data: {
customerId: input.customerId,
items: input.items,
status: 'PENDING',
},
});
// Publish event - triggers next step
await this.eventBus.publish({
source: 'order-service',
detailType: 'OrderCreated',
detail: {
orderId: order.id,
customerId: input.customerId,
items: input.items,
},
});
return order;
}
// Compensating action - triggered by OrderFailed
async cancelOrder(orderId: string): Promise<void> {
await this.db.orders.update({
where: { id: orderId },
data: { status: 'CANCELLED' },
});
}
}
// Inventory Service - listens for OrderCreated
export class InventoryService {
async handleOrderCreated(event: any): Promise<void> {
const { orderId, items } = event.detail;
try {
// Reserve inventory
const reservations = await Promise.all(
items.map(item =>
this.db.inventory.update({
where: { productId: item.productId },
data: { reserved: { increment: item.quantity } },
})
)
);
// Publish success event
await this.eventBus.publish({
source: 'inventory-service',
detailType: 'InventoryReserved',
detail: { orderId, reservations },
});
} catch (error) {
// Publish failure - triggers compensation
await this.eventBus.publish({
source: 'inventory-service',
detailType: 'InventoryReservationFailed',
detail: { orderId, reason: error.message },
});
}
}
async handleOrderCancelled(event: any): Promise<void> {
const { orderId } = event.detail;
// Release reserved inventory
await this.releaseInventory(orderId);
}
}
โ ๏ธ
Choreography becomes hard to manage after 4-5 services. The flow is distributed and difficult to trace. Consider orchestration for complex sagas.
Pattern 2: Orchestration-Based Saga
A central orchestrator coordinates the entire saga.
// Saga orchestrator
export class OrderSagaOrchestrator {
constructor(
private eventBus: EventBridge,
private stepFunctions: SFN,
) {}
async execute(orderData: OrderData): Promise<SagaResult> {
const sagaId = uuidv4();
// Define saga steps
const steps: SagaStep[] = [
{
name: 'CreateOrder',
action: this.createOrder.bind(this),
compensate: this.cancelOrder.bind(this),
},
{
name: 'ReserveInventory',
action: this.reserveInventory.bind(this),
compensate: this.releaseInventory.bind(this),
},
{
name: 'ProcessPayment',
action: this.processPayment.bind(this),
compensate: this.refundPayment.bind(this),
},
{
name: 'ConfirmOrder',
action: this.confirmOrder.bind(this),
compensate: this.cancelOrder.bind(this),
},
];
// Execute saga
const completedSteps: SagaStep[] = [];
for (const step of steps) {
try {
const result = await step.action(orderData, sagaId);
completedSteps.push(step);
// Publish progress event
await this.eventBus.publish({
source: 'order-saga',
detailType: 'SagaStepCompleted',
detail: { sagaId, step: step.name, result },
});
} catch (error) {
// Compensation: undo completed steps in reverse
await this.compensate(completedSteps.reverse(), sagaId, error);
throw new SagaFailedError(sagaId, step.name, error);
}
}
return { sagaId, status: 'COMPLETED' };
}
private async compensate(
steps: SagaStep[],
sagaId: string,
error: Error,
): Promise<void> {
for (const step of steps) {
try {
await step.compensate(sagaId);
await this.eventBus.publish({
source: 'order-saga',
detailType: 'SagaCompensationCompleted',
detail: { sagaId, step: step.name },
});
} catch (compensationError) {
// Log and alert - manual intervention may be needed
console.error(
`Compensation failed for ${step.name}:`,
compensationError
);
await this.alertOpsTeam(sagaId, step.name, compensationError);
}
}
}
}
Pattern 3: Step Functions Saga
Use AWS Step Functions for saga orchestration.
{
"Comment": "Order Processing Saga",
"StartAt": "CreateOrder",
"States": {
"CreateOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123:function:create-order",
"ResultPath": "$.orderResult",
"Retry": [
{
"ErrorEquals": ["States.TaskFailed"],
"MaxAttempts": 2,
"BackoffRate": 2
}
],
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"Next": "SagaFailed",
"ResultPath": "$.error"
}
],
"Next": "ReserveInventory"
},
"ReserveInventory": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123:function:reserve-inventory",
"ResultPath": "$.inventoryResult",
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"Next": "CancelOrder",
"ResultPath": "$.error"
}
],
"Next": "ProcessPayment"
},
"ProcessPayment": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123:function:process-payment",
"ResultPath": "$.paymentResult",
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"Next": "ReleaseInventory",
"ResultPath": "$.error"
}
],
"Next": "ConfirmOrder"
},
"ConfirmOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123:function:confirm-order",
"End": true
},
"ReleaseInventory": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123:function:release-inventory",
"Next": "CancelOrder"
},
"CancelOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123:function:cancel-order",
"Next": "SagaFailed"
},
"SagaFailed": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123:function:handle-saga-failure",
"End": true
}
}
}
โน๏ธ
Step Functions provide visual saga flow, automatic retries, and built-in error handling. Ideal for complex sagas with many steps.
Pattern 4: Saga State Machine with DynamoDB
Persist saga state for durability and replay.
# Saga state persistence
import boto3
import json
from enum import Enum
from datetime import datetime
class SagaStatus(Enum):
STARTED = "STARTED"
IN_PROGRESS = "IN_PROGRESS"
COMPENSATING = "COMPENSATING"
COMPLETED = "COMPLETED"
FAILED = "FAILED"
class SagaStateManager:
def __init__(self):
self.dynamodb = boto3.resource('dynamodb')
self.table = self.dynamodb.Table('saga-state')
async def create_saga(self, saga_id: str, saga_type: str, input_data: dict):
await self.table.put_item(Item={
'sagaId': saga_id,
'sagaType': saga_type,
'status': SagaStatus.STARTED.value,
'inputData': input_data,
'currentStep': 0,
'completedSteps': [],
'createdAt': datetime.utcnow().isoformat(),
'updatedAt': datetime.utcnow().isoformat(),
})
async def update_step(self, saga_id: str, step_name: str, status: str, result: dict):
await self.table.update_item(
Key={'sagaId': saga_id},
UpdateExpression="""
SET currentStep = currentStep + 1,
completedSteps = list_append(completedSteps, :step),
#status = :status,
updatedAt = :now
""",
ExpressionAttributeNames={'#status': 'status'},
ExpressionAttributeValues={
':step': [{'name': step_name, 'status': status, 'result': result}],
':status': status,
':now': datetime.utcnow().isoformat(),
},
)
async def get_saga(self, saga_id: str):
response = await self.table.get_item(Key={'sagaId': saga_id})
return response.get('Item')
async def mark_compensating(self, saga_id: str):
await self.table.update_item(
Key={'sagaId': saga_id},
UpdateExpression="SET #status = :status, updatedAt = :now",
ExpressionAttributeNames={'#status': 'status'},
ExpressionAttributeValues={
':status': SagaStatus.COMPENSATING.value,
':now': datetime.utcnow().isoformat(),
},
)
Saga Compensation Examples
// Compensation functions for each step
export class OrderCompensation {
async cancelOrder(orderId: string): Promise<void> {
await db.orders.update({
where: { id: orderId },
data: { status: 'CANCELLED', cancelledAt: new Date() },
});
}
async releaseInventory(orderId: string): Promise<void> {
const order = await db.orders.findUnique({ where: { id: orderId } });
for (const item of order.items) {
await db.inventory.update({
where: { productId: item.productId },
data: { reserved: { decrement: item.quantity } },
});
}
}
async refundPayment(paymentId: string): Promise<void> {
await paymentGateway.refund({
paymentId,
reason: 'saga_compensation',
});
}
async sendCompensationNotification(orderId: string, reason: string): Promise<void> {
await notificationService.send({
userId: await this.getOrderUserId(orderId),
template: 'order-cancelled',
data: { orderId, reason },
});
}
}
โ ๏ธ
Compensating transactions must be idempotent. Network failures may cause them to be called multiple times.
Saga Pattern Comparison
| Aspect | Choreography | Orchestration |
|---|---|---|
| Coupling | Low | Medium |
| Complexity | High (distributed) | Low (centralized) |
| Visibility | Hard to trace | Easy to monitor |
| Flexibility | High | Medium |
| Best for | Simple, 3-4 services | Complex, 5+ services |
Follow-Up Questions
- How do you handle timeouts in a saga when a service is temporarily unavailable?
- What strategies would you use to test a saga with compensating transactions?
- How do you handle partial compensation when some compensating actions fail?