Serverless Architecture Patterns

Difficulty: Senior Level | Companies: AWS, Google, Microsoft, Netflix, Uber

Serverless Mental Model

Serverless does not mean "no servers"; it means server management is abstracted away. You pay per invocation and for execution time, scaling is automatic, and you focus on business logic.

ℹ️

Serverless is ideal for event-driven, spiky, or asynchronous workloads. It's less suitable for long-running, stateful, or steady-state workloads.

Pattern 1: Fan-Out/Fan-In with Step Functions

Process items in parallel and aggregate results.

// Step Functions state machine definition
const parallelProcessingStateMachine = {
  Comment: 'Fan-out/fan-in for parallel data processing',
  StartAt: 'ReceiveBatch',
  States: {
    ReceiveBatch: {
      Type: 'Task',
      Resource: 'arn:aws:lambda:us-east-1:123456789:function:receive-batch',
      Next: 'DistributeWork',
    },
    DistributeWork: {
      Type: 'Map',
      ItemsPath: '$.items',
      MaxConcurrency: 100,
      Iterator: {
        StartAt: 'ProcessItem',
        States: {
          ProcessItem: {
            Type: 'Task',
            Resource: 'arn:aws:lambda:us-east-1:123456789:function:process-item',
            Retry: [
              {
                ErrorEquals: ['States.TaskFailed'],
                IntervalSeconds: 2,
                MaxAttempts: 3,
                BackoffRate: 2,
              },
            ],
            End: true,
          },
        },
      },
      Next: 'AggregateResults',
    },
    AggregateResults: {
      Type: 'Task',
      Resource: 'arn:aws:lambda:us-east-1:123456789:function:aggregate',
      Next: 'SendNotification',
    },
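    SendNotification: {
      Type: 'Task',
      // Publish through the Step Functions SNS service integration;
      // the topic ARN belongs in Parameters, not in Resource
      Resource: 'arn:aws:states:::sns:publish',
      Parameters: {
        TopicArn: 'arn:aws:sns:us-east-1:123456789:processing-complete',
        'Message.$': 'States.JsonToString($)',
      },
      End: true,
    },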
  },
};
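
The fan-in step receives the Map state's output, which is an array holding one result per processed item. A minimal sketch of what the aggregate function might do, assuming each process-item invocation returns a dict with hypothetical `id`, `status`, and `value` fields (none of these names come from the state machine above):

```python
from typing import Any


def aggregate_handler(event: list[dict[str, Any]], context: Any = None) -> dict[str, Any]:
    """Fan-in step: summarize the per-item results produced by the Map state."""
    # Hypothetical result shape: {"id": str, "status": "ok" | "error", "value": number}
    succeeded = [r for r in event if r.get("status") == "ok"]
    failed = [r for r in event if r.get("status") != "ok"]
    return {
        "processed": len(event),
        "succeeded": len(succeeded),
        "failed_ids": [r.get("id") for r in failed],
        "total_value": sum(r.get("value", 0) for r in succeeded),
    }
```

Note that an item which raises and exhausts its retries fails the whole Map state unless a Catch is configured, so this sketch only sees failures that the item function reports in its return value.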

Pattern 2: Lambda with Provisioned Concurrency

Avoid cold starts for latency-sensitive applications.

# SAM template for provisioned concurrency
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs20.x
      CodeUri: src/
      MemorySize: 1024
      Timeout: 10
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 10
      AutoPublishAlias: live
      DeploymentPreference:
        Type: Canary10Percent5Minutes
        Alarms:
          # FunctionErrorAlarm is assumed to be a CloudWatch alarm defined elsewhere in this template
          - !Ref FunctionErrorAlarm
      Environment:
        Variables:
          NODE_ENV: production
      Events:
        Api:
          Type: Api
          Properties:
            Path: /{proxy+}
            Method: ANY

  # Auto-scaling for provisioned concurrency
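  ScalableTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    # The alias must exist first; SAM names it <FunctionLogicalId>Alias<AliasName>
    DependsOn: ApiFunctionAliaslive
    Properties: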
      MaxCapacity: 100
      MinCapacity: 10
      ResourceId: !Sub function:${ApiFunction}:live
      ScalableDimension: lambda:function:ProvisionedConcurrency
      ServiceNamespace: lambda
      ScheduledActions:
        - ScheduledActionName: ScaleUpMorning
          Schedule: "cron(0 8 ? * MON-FRI *)"
          ScalableTargetAction:
            MinCapacity: 50
        - ScheduledActionName: ScaleDownEvening
          Schedule: "cron(0 20 ? * MON-FRI *)"
          ScalableTargetAction:
            MinCapacity: 10

⚠️

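Provisioned concurrency is billed for as long as it is configured, whether or not it serves traffic, so idle capacity can cost roughly 40% more than on-demand. Use it only for latency-critical paths, and scale it automatically on a schedule or on utilization metrics.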
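
Whether provisioned concurrency costs more than on-demand depends on how busy the provisioned capacity is. The sketch below estimates the break-even utilization. The three per-GB-second rates are assumptions used for illustration only; check current pricing for your region and architecture before relying on them. Request charges are the same in both modes, so they are left out.

```python
# Assumed illustrative rates in USD per GB-second (not taken from the text above)
ON_DEMAND_RATE = 0.0000166667             # on-demand duration
PROVISIONED_RATE = 0.0000041667           # configured provisioned concurrency, billed even when idle
PROVISIONED_DURATION_RATE = 0.0000097222  # duration while running on provisioned capacity


def cost_ratio(utilization: float) -> float:
    """Provisioned cost divided by on-demand cost at a given utilization (0 < utilization <= 1)."""
    provisioned = PROVISIONED_RATE + utilization * PROVISIONED_DURATION_RATE
    on_demand = utilization * ON_DEMAND_RATE
    return provisioned / on_demand


def break_even_utilization() -> float:
    """Utilization at which both pricing modes cost the same."""
    return PROVISIONED_RATE / (ON_DEMAND_RATE - PROVISIONED_DURATION_RATE)
```

Under these assumed rates the break-even lands near 60% utilization: below it, provisioned capacity costs more than on-demand, and above it, it costs less. This is why scheduled or metric-based scaling matters.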

Pattern 3: Lambda Layers for Shared Dependencies

Share common code across multiple Lambda functions.

# requirements-layer/
# requirements.txt
boto3==1.28.0
requests==2.31.0
pydantic==2.0.0

# Build the layer
# pip install -r requirements.txt -t python/lib/python3.11/site-packages/
# zip -r layer.zip python/

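# Lambda function using the layer
# Layers are extracted to /opt, and the Python 3.11 runtime already has
# /opt/python/lib/python3.11/site-packages on sys.path, so no path changes are needed
import math
from typing import List

import boto3
import requests
from pydantic import BaseModel

class OrderItem(BaseModel):
    product_id: str
    quantity: int
    price: float

class OrderRequest(BaseModel):
    customer_id: str
    items: List[OrderItem]
    total: float

def handler(event, context):
    # Pydantic validates field types here and raises ValidationError on bad input
    order = OrderRequest(**event)

    # Business rule check; compare with a tolerance because exact float
    # equality can reject totals that are correct
    expected = sum(i.quantity * i.price for i in order.items)
    if not math.isclose(order.total, expected, abs_tol=0.005):
        raise ValueError("Total mismatch")

    # External API call
    response = requests.post(
        'https://api.example.com/orders',
        json=order.model_dump(),  # Pydantic v2 replacement for the deprecated .dict()
        timeout=5
    )

    return {
        'statusCode': response.status_code,
        'body': response.json()
    }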
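
Order totals built from floating-point prices rarely survive an exact equality check. A minimal sketch of why a tolerance helps, using a hypothetical `totals_match` helper and an assumed half-cent tolerance:

```python
import math


def totals_match(total: float, items: list[tuple[int, float]], tolerance: float = 0.005) -> bool:
    """Return True when total equals the sum of quantity * price within the tolerance."""
    computed = sum(quantity * price for quantity, price in items)
    return math.isclose(total, computed, abs_tol=tolerance)


# Three items at 0.1 each do not sum to exactly 0.3 in binary floating point
exact_equal = (0.1 * 3 == 0.3)
tolerant_equal = totals_match(0.3, [(3, 0.1)])
```

Here `exact_equal` is False while `tolerant_equal` is True. If exact currency arithmetic matters, storing amounts as integer cents or `decimal.Decimal` is another option.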

Pattern 4: Async Processing with SQS + Lambda

Decouple producers from consumers with SQS.

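// SQS queue configuration with DLQ (CloudFormation-style AWS::SQS::Queue properties)
const sqsConfig = {
  QueueName: 'order-processing-queue',
  DelaySeconds: 0,
  MessageRetentionPeriod: 1209600, // 14 days
  VisibilityTimeout: 300, // 5 minutes; keep this at least 6x the Lambda function timeout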
  RedrivePolicy: {
    deadLetterTargetArn: 'arn:aws:sqs:us-east-1:123456789:dlq-order-processing',
    maxReceiveCount: 3,
  },
  Tags: [
    { Key: 'Environment', Value: 'production' },
    { Key: 'Service', Value: 'order-processing' },
  ],
};

// Lambda handler with batch processing
exports.handler = async (event) => {
  const batchItemFailures = [];
  
  for (const record of event.Records) {
    try {
      const order = JSON.parse(record.body);
      await processOrder(order);
    } catch (error) {
      console.error(`Failed to process ${record.messageId}:`, error);
      batchItemFailures.push({
        itemIdentifier: record.messageId,
      });
    }
  }
  
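  // Report only the failed messages so SQS retries those and deletes the rest.
  // Requires ReportBatchItemFailures on the event source mapping.
  return { batchItemFailures };
};

// `db` is assumed to be a data-access client initialized elsewhere
async function processOrder(order) {
  // SQS delivers at least once, so this update must be safe to repeat
  await db.orders.update({
    where: { id: order.id },
    data: { status: 'processed', processedAt: new Date() },
  });
}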
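
The consumer above is only half of the decoupling; the producer side enqueues orders and moves on. A sketch of a batching producer follows, assuming an injected client that exposes boto3's `send_message_batch` call and orders that carry an `id` field. The `send_orders` and `batched` helpers are hypothetical names.

```python
import json
from typing import Any, Iterator

MAX_BATCH = 10  # SQS accepts at most 10 entries per SendMessageBatch call


def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_orders(sqs_client: Any, queue_url: str, orders: list[dict]) -> list[str]:
    """Send orders in batches and return the ids of orders SQS reported as failed."""
    failed_ids = []
    for chunk in batched(orders, MAX_BATCH):
        # Entry ids only need to be unique within one batch, so the index is enough
        entries = [
            {"Id": str(index), "MessageBody": json.dumps(order)}
            for index, order in enumerate(chunk)
        ]
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        for failure in response.get("Failed", []):
            failed_ids.append(chunk[int(failure["Id"])]["id"])
    return failed_ids
```

In real use the client would come from `boto3.client("sqs")`; passing it in keeps the function easy to test with a stub. A batch call can succeed overall while individual entries fail, which is why the `Failed` list is inspected.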

Pattern 5: Lambda Cold Start Optimization

Reduce cold start times with these techniques.

# Optimized Lambda for fast cold starts
import json

import boto3
import requests

# 1. Global scope initialization: runs once per execution environment
#    and is reused across warm invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('orders')

# 2. Connection reuse: a shared session keeps its connection pool between invocations
session = requests.Session()

def handler(event, context):
    # 3. Lazy import for rarely-used modules (ml_model is a placeholder module)
    if event.get('needs_ml'):
        import ml_model  # Only import when needed
        result = ml_model.predict(event['data'])

    # 4. Keep payload small: fetch only the attributes the response needs
    order_id = event['pathParameters']['id']

    response = table.get_item(
        Key={'id': order_id},
        # status and total are DynamoDB reserved words, so they must be aliased
        ProjectionExpression='id, #s, #t',
        ExpressionAttributeNames={'#s': 'status', '#t': 'total'},
    )

    return {
        'statusCode': 200,
        'headers': {
            'Content-Type': 'application/json',
            'Cache-Control': 'public, max-age=60',
        },
        # default=str serializes the Decimal values boto3 returns for numbers
        'body': json.dumps(response.get('Item', {}), default=str),
    }

ℹ️

Cold starts typically add 100 ms to 2 s of latency, depending on runtime, package size, and initialization work. Java and .NET tend to have the longest cold starts; Python and Node.js are usually the fastest.
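
To see how often cold starts actually occur, a handler can flag them itself, because module-level code runs once per execution environment. A minimal sketch; the returned field names are assumptions chosen for illustration:

```python
import time

# Module scope runs once per execution environment, i.e. on a cold start
_INIT_TIME = time.monotonic()
_is_cold_start = True


def handler(event, context=None):
    """Report whether this invocation is the first one in its execution environment."""
    global _is_cold_start
    cold = _is_cold_start
    _is_cold_start = False  # every later invocation in this environment is warm
    return {
        "cold_start": cold,
        "seconds_since_init": round(time.monotonic() - _INIT_TIME, 3),
    }
```

Logging the `cold_start` flag per request makes it possible to measure the cold-start rate before deciding whether provisioned concurrency is worth paying for.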

Cost Comparison

Pattern	Monthly Cost (1M requests)	Cold Start	Best For
On-demand Lambda	$0.20 + compute	100-500ms	Spiky workloads
Provisioned Concurrency	$0.20 + $15 provisioned	<10ms	Latency-sensitive
Lambda + SQS	$0.20 + $0.40/1M msgs	100-500ms	Async processing
Step Functions	$0.025/1K transitions	N/A	Complex workflows
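
The "$0.20 + compute" entry for on-demand Lambda can be made concrete with a small estimate. The sketch below uses the $0.20 per million requests from the table; the per-GB-second compute rate is an assumption for illustration and should be checked against current pricing. The free tier is ignored.

```python
def on_demand_monthly_cost(
    requests: int,
    avg_duration_ms: float,
    memory_mb: int,
    request_price_per_million: float = 0.20,  # from the table above
    gb_second_price: float = 0.0000166667,    # assumed illustrative compute rate
) -> float:
    """Estimate monthly on-demand Lambda cost in USD as request charge plus compute charge."""
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_cost = requests / 1_000_000 * request_price_per_million
    return request_cost + gb_seconds * gb_second_price


# 1M requests at 100 ms average duration with 1024 MB of memory
estimate = on_demand_monthly_cost(1_000_000, 100, 1024)
```

With these inputs the compute portion is 100,000 GB-seconds, so the estimate comes to about $1.87 per month under the assumed rate. Doubling memory or duration doubles the compute portion while the request charge stays fixed.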

Follow-Up Questions

How do you handle state management in a serverless workflow that requires human approval steps?
What are the trade-offs between Lambda, Fargate, and EC2 for a batch processing workload?
How would you implement circuit breaker patterns in a serverless microservices architecture?