๐ŸŽ‰ 75% of content is free forever โ€” Unlock Premium from $10/mo โ†’
CW
Search coursesโ€ฆ
๐Ÿ’ผ Servicesโ„น๏ธ Aboutโœ‰๏ธ ContactView Pricing Plansfrom $10

Saga Pattern (Distributed Transactions)

Cloud ArchitectureTransaction Managementโญ Premium

Advertisement

Saga Pattern (Distributed Transactions)

Difficulty: Senior Level | Companies: AWS, Google, Microsoft, Netflix, Uber

Distributed Transaction Problem

Microservices can't use traditional ACID transactions across service boundaries. Sagas coordinate multi-step processes with compensating actions for rollback.

โ„น๏ธ

A saga is a sequence of local transactions. If one fails, compensating transactions undo the previous steps.

Saga Flow

Architecture Diagram
Normal Flow:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Create  โ”‚โ”€โ”€โ”€โ–ถโ”‚  Reserve โ”‚โ”€โ”€โ”€โ–ถโ”‚ Process  โ”‚โ”€โ”€โ”€โ–ถโ”‚ Complete โ”‚
โ”‚  Order   โ”‚    โ”‚ Inventoryโ”‚    โ”‚ Payment  โ”‚    โ”‚ Order    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                      โ”‚
Failure at Payment:                                   โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Create  โ”‚โ”€โ”€โ”€โ–ถโ”‚  Reserve โ”‚โ”€โ”€โ”€โ–ถโ”‚ Process  โ”‚โ”€โ”€โ”€โ–ถโ”‚ Compensateโ”‚
โ”‚  Order   โ”‚    โ”‚ Inventoryโ”‚    โ”‚ Payment  โ”‚    โ”‚ (Cancel) โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                       โ”‚              โ”‚               โ”‚
                       โ–ผ              โ–ผ               โ–ผ
                  Release Stock  Refund Payment  Cancel Order

Pattern 1: Choreography-Based Saga

Each service publishes events that trigger the next step.

// Order Service - publishes OrderCreated
export class OrderService {
  async createOrder(input: CreateOrderInput): Promise<Order> {
    const order = await this.db.orders.create({
      data: {
        customerId: input.customerId,
        items: input.items,
        status: 'PENDING',
      },
    });

    // Publish event - triggers next step
    await this.eventBus.publish({
      source: 'order-service',
      detailType: 'OrderCreated',
      detail: {
        orderId: order.id,
        customerId: input.customerId,
        items: input.items,
      },
    });

    return order;
  }

  // Compensating action - triggered by OrderFailed
  async cancelOrder(orderId: string): Promise<void> {
    await this.db.orders.update({
      where: { id: orderId },
      data: { status: 'CANCELLED' },
    });
  }
}

// Inventory Service - listens for OrderCreated
export class InventoryService {
  async handleOrderCreated(event: any): Promise<void> {
    const { orderId, items } = event.detail;

    try {
      // Reserve inventory
      const reservations = await Promise.all(
        items.map(item =>
          this.db.inventory.update({
            where: { productId: item.productId },
            data: { reserved: { increment: item.quantity } },
          })
        )
      );

      // Publish success event
      await this.eventBus.publish({
        source: 'inventory-service',
        detailType: 'InventoryReserved',
        detail: { orderId, reservations },
      });
    } catch (error) {
      // Publish failure - triggers compensation
      await this.eventBus.publish({
        source: 'inventory-service',
        detailType: 'InventoryReservationFailed',
        detail: { orderId, reason: error.message },
      });
    }
  }

  async handleOrderCancelled(event: any): Promise<void> {
    const { orderId } = event.detail;
    // Release reserved inventory
    await this.releaseInventory(orderId);
  }
}

โš ๏ธ

Choreography becomes hard to manage after 4-5 services. The flow is distributed and difficult to trace. Consider orchestration for complex sagas.

Pattern 2: Orchestration-Based Saga

A central orchestrator coordinates the entire saga.

// Saga orchestrator
export class OrderSagaOrchestrator {
  constructor(
    private eventBus: EventBridge,
    private stepFunctions: SFN,
  ) {}

  async execute(orderData: OrderData): Promise<SagaResult> {
    const sagaId = uuidv4();
    
    // Define saga steps
    const steps: SagaStep[] = [
      {
        name: 'CreateOrder',
        action: this.createOrder.bind(this),
        compensate: this.cancelOrder.bind(this),
      },
      {
        name: 'ReserveInventory',
        action: this.reserveInventory.bind(this),
        compensate: this.releaseInventory.bind(this),
      },
      {
        name: 'ProcessPayment',
        action: this.processPayment.bind(this),
        compensate: this.refundPayment.bind(this),
      },
      {
        name: 'ConfirmOrder',
        action: this.confirmOrder.bind(this),
        compensate: this.cancelOrder.bind(this),
      },
    ];

    // Execute saga
    const completedSteps: SagaStep[] = [];
    
    for (const step of steps) {
      try {
        const result = await step.action(orderData, sagaId);
        completedSteps.push(step);
        
        // Publish progress event
        await this.eventBus.publish({
          source: 'order-saga',
          detailType: 'SagaStepCompleted',
          detail: { sagaId, step: step.name, result },
        });
      } catch (error) {
        // Compensation: undo completed steps in reverse
        await this.compensate(completedSteps.reverse(), sagaId, error);
        throw new SagaFailedError(sagaId, step.name, error);
      }
    }

    return { sagaId, status: 'COMPLETED' };
  }

  private async compensate(
    steps: SagaStep[],
    sagaId: string,
    error: Error,
  ): Promise<void> {
    for (const step of steps) {
      try {
        await step.compensate(sagaId);
        await this.eventBus.publish({
          source: 'order-saga',
          detailType: 'SagaCompensationCompleted',
          detail: { sagaId, step: step.name },
        });
      } catch (compensationError) {
        // Log and alert - manual intervention may be needed
        console.error(
          `Compensation failed for ${step.name}:`,
          compensationError
        );
        await this.alertOpsTeam(sagaId, step.name, compensationError);
      }
    }
  }
}

Pattern 3: Step Functions Saga

Use AWS Step Functions for saga orchestration.

{
  "Comment": "Order Processing Saga",
  "StartAt": "CreateOrder",
  "States": {
    "CreateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:create-order",
      "ResultPath": "$.orderResult",
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "MaxAttempts": 2,
          "BackoffRate": 2
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "SagaFailed",
          "ResultPath": "$.error"
        }
      ],
      "Next": "ReserveInventory"
    },
    "ReserveInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:reserve-inventory",
      "ResultPath": "$.inventoryResult",
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "CancelOrder",
          "ResultPath": "$.error"
        }
      ],
      "Next": "ProcessPayment"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:process-payment",
      "ResultPath": "$.paymentResult",
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "ReleaseInventory",
          "ResultPath": "$.error"
        }
      ],
      "Next": "ConfirmOrder"
    },
    "ConfirmOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:confirm-order",
      "End": true
    },
    "ReleaseInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:release-inventory",
      "Next": "CancelOrder"
    },
    "CancelOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:cancel-order",
      "Next": "SagaFailed"
    },
    "SagaFailed": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:handle-saga-failure",
      "End": true
    }
  }
}

โ„น๏ธ

Step Functions provide visual saga flow, automatic retries, and built-in error handling. Ideal for complex sagas with many steps.

Pattern 4: Saga State Machine with DynamoDB

Persist saga state for durability and replay.

# Saga state persistence
import boto3
import json
from enum import Enum
from datetime import datetime

class SagaStatus(Enum):
    STARTED = "STARTED"
    IN_PROGRESS = "IN_PROGRESS"
    COMPENSATING = "COMPENSATING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

class SagaStateManager:
    def __init__(self):
        self.dynamodb = boto3.resource('dynamodb')
        self.table = self.dynamodb.Table('saga-state')
    
    async def create_saga(self, saga_id: str, saga_type: str, input_data: dict):
        await self.table.put_item(Item={
            'sagaId': saga_id,
            'sagaType': saga_type,
            'status': SagaStatus.STARTED.value,
            'inputData': input_data,
            'currentStep': 0,
            'completedSteps': [],
            'createdAt': datetime.utcnow().isoformat(),
            'updatedAt': datetime.utcnow().isoformat(),
        })
    
    async def update_step(self, saga_id: str, step_name: str, status: str, result: dict):
        await self.table.update_item(
            Key={'sagaId': saga_id},
            UpdateExpression="""
                SET currentStep = currentStep + 1,
                    completedSteps = list_append(completedSteps, :step),
                    #status = :status,
                    updatedAt = :now
            """,
            ExpressionAttributeNames={'#status': 'status'},
            ExpressionAttributeValues={
                ':step': [{'name': step_name, 'status': status, 'result': result}],
                ':status': status,
                ':now': datetime.utcnow().isoformat(),
            },
        )
    
    async def get_saga(self, saga_id: str):
        response = await self.table.get_item(Key={'sagaId': saga_id})
        return response.get('Item')
    
    async def mark_compensating(self, saga_id: str):
        await self.table.update_item(
            Key={'sagaId': saga_id},
            UpdateExpression="SET #status = :status, updatedAt = :now",
            ExpressionAttributeNames={'#status': 'status'},
            ExpressionAttributeValues={
                ':status': SagaStatus.COMPENSATING.value,
                ':now': datetime.utcnow().isoformat(),
            },
        )

Saga Compensation Examples

// Compensation functions for each step
export class OrderCompensation {
  async cancelOrder(orderId: string): Promise<void> {
    await db.orders.update({
      where: { id: orderId },
      data: { status: 'CANCELLED', cancelledAt: new Date() },
    });
  }

  async releaseInventory(orderId: string): Promise<void> {
    const order = await db.orders.findUnique({ where: { id: orderId } });
    for (const item of order.items) {
      await db.inventory.update({
        where: { productId: item.productId },
        data: { reserved: { decrement: item.quantity } },
      });
    }
  }

  async refundPayment(paymentId: string): Promise<void> {
    await paymentGateway.refund({
      paymentId,
      reason: 'saga_compensation',
    });
  }

  async sendCompensationNotification(orderId: string, reason: string): Promise<void> {
    await notificationService.send({
      userId: await this.getOrderUserId(orderId),
      template: 'order-cancelled',
      data: { orderId, reason },
    });
  }
}

โš ๏ธ

Compensating transactions must be idempotent. Network failures may cause them to be called multiple times.

Saga Pattern Comparison

AspectChoreographyOrchestration
CouplingLowMedium
ComplexityHigh (distributed)Low (centralized)
VisibilityHard to traceEasy to monitor
FlexibilityHighMedium
Best forSimple, 3-4 servicesComplex, 5+ services

Follow-Up Questions

  1. How do you handle timeouts in a saga when a service is temporarily unavailable?
  2. What strategies would you use to test a saga with compensating transactions?
  3. How do you handle partial compensation when some compensating actions fail?

Advertisement