Multi-Region Active-Active Deployment Patterns


Difficulty: Principal/Staff Level | Companies: Netflix, Amazon, Google, Cloudflare, Akamai

Interview Question

"Design a multi-region active-active deployment for a global e-commerce platform serving 500M+ users. How do you handle data consistency, conflict resolution, and failover?"

โ„น๏ธKey Concepts

This question tests your understanding of global distributed systems, consistency models, and disaster recovery at planetary scale.

Complete Multi-Region Architecture

Global Architecture Overview

Architecture Diagram
┌──────────────────────────────────────────────────────────────────────────┐
│                    MULTI-REGION ACTIVE-ACTIVE ARCHITECTURE               │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────── GLOBAL LAYER ─────────────────────────┐      │
│  │                                                                │      │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐             │      │
│  │  │   Route53   │  │  CloudFront │  │   Global    │             │      │
│  │  │  (DNS)      │  │  (CDN)      │  │ Accelerator │             │      │
│  │  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘             │      │
│  │         │                │                │                    │      │
│  └─────────┼────────────────┼────────────────┼────────────────────┘      │
│            │                │                │                           │
│  ┌─────────▼────────────────▼────────────────▼────────────────────┐      │
│  │                    REGION ROUTING                              │      │
│  │                                                                │      │
│  │  ┌─────────────────┐      ┌─────────────────┐                  │      │
│  │  │   US-EAST-1     │      │   EU-WEST-1     │                  │      │
│  │  │   (Primary)     │      │   (Secondary)   │                  │      │
│  │  └────────┬────────┘      └────────┬────────┘                  │      │
│  │           │                        │                           │      │
│  │  ┌────────▼────────┐      ┌────────▼────────┐                  │      │
│  │  │   AP-SOUTH-1    │      │   AP-EAST-1     │                  │      │
│  │  │   (Tertiary)    │      │   (Quaternary)  │                  │      │
│  │  └─────────────────┘      └─────────────────┘                  │      │
│  │                                                                │      │
│  └────────────────────────────────────────────────────────────────┘      │
│                                                                          │
│  ┌─────────────────────── DATA REPLICATION ───────────────────────┐      │
│  │  DynamoDB Global Tables │ Aurora Global Database               │      │
│  │  Redis Global Datastore │ S3 Cross-Region Replication          │      │
│  └────────────────────────────────────────────────────────────────┘      │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Mathematical Foundation: Consistency Models

Consistency-Availability Trade-off (CAP Theorem):

  • Consistency (C): All nodes see the same data at the same time
  • Availability (A): Every request gets a response
  • Partition tolerance (P): System works despite network failures

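For multi-region: partitions between regions cannot be ruled out, so P is a given. During a partition, each operation must give up either consistency (CP) or availability (AP).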

Eventual Consistency Window:

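  • Cross-region replication lag: L = 100ms (typical mean; varies with region pair and load)
  • Consistency probability, modeling lag as exponentially distributed with mean L (a simplifying assumption): P(t) = 1 - e^(-t/L)
  • For t = 200ms: P = 1 - e^(-200/100) = 1 - e^(-2) ≈ 0.865 = 86.5%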
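
The arithmetic above can be checked, and inverted, with a few lines of Python. This is a minimal sketch that assumes the same exponential lag model with a 100ms mean; the function names are illustrative and not part of any AWS SDK.

```python
import math

def consistency_probability(t_ms: float, mean_lag_ms: float = 100.0) -> float:
    """P(a replica has applied the write within t_ms), assuming exponential lag."""
    return 1.0 - math.exp(-t_ms / mean_lag_ms)

def wait_for_probability(p: float, mean_lag_ms: float = 100.0) -> float:
    """Invert the model: how long (ms) to wait to reach probability p."""
    return -mean_lag_ms * math.log(1.0 - p)

print(round(consistency_probability(200), 3))  # 0.865
print(round(wait_for_probability(0.99)))       # 461
```

Under this model, waiting twice the mean lag still leaves roughly one read in seven stale, and reaching 99% takes about 4.6 times the mean lag.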

Conflict Resolution Math:

  • Last-write-wins (LWW): Simple but can lose data
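  • Vector clocks: O(n) storage per item for n writing replicas; they detect concurrent writes but leave resolution to the application
  • CRDTs: Merge without conflict by construction; merge cost depends on the type (O(1) for an LWW register, O(n) for a counter across n replicas)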
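
To make the CRDT option concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class and the region names used as replica IDs are illustrative assumptions, not part of any library.

```python
from typing import Dict

class GCounter:
    """Grow-only counter CRDT: one slot per replica, merge = element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever advances its own slot
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of delivery order or duplicates
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())

us = GCounter("us-east-1")
eu = GCounter("eu-west-1")
us.increment(3)   # three page views counted in the US
eu.increment(2)   # two counted concurrently in the EU
us.merge(eu)
eu.merge(us)
print(us.value(), eu.value())  # 5 5
```

Neither increment is lost, and merging the same state twice changes nothing, which is what lets replicas exchange state without coordination.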

Global Load Balancing with Route53

# Route53 health checks and failover
resource "aws_route53_health_check" "us_east_1" {
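  # The endpoint is a DNS name (it is also used as an alias target below),
  # so it belongs in fqdn rather than ip_address
  fqdn              = var.us_east_1_endpoint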
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 10

  tags = {
    Name = "us-east-1-health-check"
  }
}

resource "aws_route53_health_check" "eu_west_1" {
  # DNS name, so fqdn rather than ip_address
  fqdn              = var.eu_west_1_endpoint
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 10
}

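# Latency-based routing
# Note: Route53 requires all records with the same name and type to use the
# same routing policy, so the latency, failover, and geolocation records in
# this section are alternatives for app.example.com, not one combined config.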
resource "aws_route53_record" "app_latency" {
  for_each = {
    us-east-1    = var.us_east_1_endpoint
    eu-west-1    = var.eu_west_1_endpoint
    ap-south-1   = var.ap_south_1_endpoint
  }

  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  alias {
    name                   = each.value
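    # Data sources are referenced as data.TYPE.NAME; "by_region" is an assumed
    # aws_elb_hosted_zone_id data block declared with for_each over the regions
    zone_id                = data.aws_elb_hosted_zone_id.by_region[each.key].id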
    evaluate_target_health = true
  }

  latency_routing_policy {
    region = each.key
  }

  set_identifier = each.key
}

# Failover routing
resource "aws_route53_record" "app_failover_primary" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  alias {
    name                   = var.us_east_1_endpoint
    # "by_region" is an assumed for_each data block keyed by region
    zone_id                = data.aws_elb_hosted_zone_id.by_region["us-east-1"].id
    evaluate_target_health = true
  }

  failover_routing_policy {
    type = "PRIMARY"
  }

  set_identifier = "primary"
  health_check_id = aws_route53_health_check.us_east_1.id
}

resource "aws_route53_record" "app_failover_secondary" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  alias {
    name                   = var.eu_west_1_endpoint
    # "by_region" is an assumed for_each data block keyed by region
    zone_id                = data.aws_elb_hosted_zone_id.by_region["eu-west-1"].id
    evaluate_target_health = true
  }

  failover_routing_policy {
    type = "SECONDARY"
  }

  set_identifier = "secondary"
  health_check_id = aws_route53_health_check.eu_west_1.id
}

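# Geolocation routing for compliance
# Note: only NA, EU, and AS are mapped; queries from other continents get no
# answer unless a default location record is added.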
resource "aws_route53_record" "app_geo" {
  for_each = {
    "us-east-1" = {
      continent = "NA"
      endpoint  = var.us_east_1_endpoint
    }
    "eu-west-1" = {
      continent = "EU"
      endpoint  = var.eu_west_1_endpoint
    }
    "ap-south-1" = {
      continent = "AS"
      endpoint  = var.ap_south_1_endpoint
    }
  }

  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  alias {
    name                   = each.value.endpoint
    # "by_region" is an assumed for_each data block keyed by region
    zone_id                = data.aws_elb_hosted_zone_id.by_region[each.key].id
    evaluate_target_health = true
  }

  geolocation_routing_policy {
    continent = each.value.continent
  }

  set_identifier = each.key
}
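
The effect of latency-based records combined with evaluate_target_health can be pictured as "the lowest-latency region among the healthy ones". The sketch below is a simplified model of that idea, not Route53's actual resolution algorithm; the function name and latency figures are made up for illustration.

```python
from typing import Dict, Optional

def pick_region(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the healthy region with the lowest latency, or None if none is healthy."""
    candidates = {r: ms for r, ms in latency_ms.items() if healthy.get(r, False)}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

# Hypothetical latencies as seen by a client on the US east coast
latency = {"us-east-1": 20.0, "eu-west-1": 95.0, "ap-south-1": 210.0}

all_up = {"us-east-1": True, "eu-west-1": True, "ap-south-1": True}
print(pick_region(latency, all_up))        # us-east-1

us_down = {"us-east-1": False, "eu-west-1": True, "ap-south-1": True}
print(pick_region(latency, us_down))       # eu-west-1
```

The second call shows the active-active property: when one region fails its health check, traffic moves to the next-closest region without any record being rewritten.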

DynamoDB Global Tables

# DynamoDB Global Tables configuration
import boto3
from typing import Dict, Any, List
from datetime import datetime
import json

class GlobalTableManager:
    """Manager for DynamoDB Global Tables"""

    def __init__(self, table_name: str):
        self.dynamodb = boto3.client('dynamodb')
        self.table_name = table_name

    def create_global_table(self, replica_regions=('eu-west-1', 'ap-south-1')):
        """Create the table in us-east-1, then add replicas (Global Tables version 2019.11.21)"""
        # The legacy create_global_table call only accepts GlobalTableName and a
        # list of region names, so table settings go on create_table instead.
        client = boto3.client('dynamodb', region_name='us-east-1')
        response = client.create_table(
            TableName=self.table_name,
            # Assumption: a single string partition key named PK, matching the
            # attribute_not_exists(PK) condition used in put_item_global.
            AttributeDefinitions=[{'AttributeName': 'PK', 'AttributeType': 'S'}],
            KeySchema=[{'AttributeName': 'PK', 'KeyType': 'HASH'}],
            # On-demand billing: capacity units and auto scaling settings do not
            # apply in this mode. With PROVISIONED billing they would be set via
            # ProvisionedThroughput and Application Auto Scaling instead.
            BillingMode='PAY_PER_REQUEST',
            # Replication requires streams with new and old images
            StreamSpecification={
                'StreamEnabled': True,
                'StreamViewType': 'NEW_AND_OLD_IMAGES'
            },
            # Omitting KMSMasterKeyId uses the AWS managed key alias/aws/dynamodb
            SSESpecification={
                'Enabled': True,
                'SSEType': 'KMS'
            }
        )
        waiter = client.get_waiter('table_exists')
        waiter.wait(TableName=self.table_name)

        # Add one replica at a time, waiting for the table to return to ACTIVE
        # (simplified polling; production code should also handle failures)
        for region in replica_regions:
            client.update_table(
                TableName=self.table_name,
                ReplicaUpdates=[{'Create': {'RegionName': region}}]
            )
            waiter.wait(TableName=self.table_name)
        return response

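    def put_item_global(self, item: Dict[str, Any], region: str = 'us-east-1'):
        """Put item guarded by a timestamp version (last-write-wins)"""
        dynamodb = boto3.resource('dynamodb', region_name=region)
        table = dynamodb.Table(self.table_name)

        # Copy so the caller's dict is not mutated, then stamp a version.
        # datetime.now().timestamp() yields the true epoch time; calling
        # .timestamp() on utcnow() would be off by the local UTC offset.
        # Wall-clock versions remain subject to clock skew between writers.
        item = dict(item)
        item['version'] = int(datetime.now().timestamp() * 1000)
        item['last_updated_region'] = region

        # The condition is evaluated against the replica in `region` only;
        # Global Tables reconciles concurrent cross-region writes itself
        # using last-writer-wins.
        response = table.put_item(
            Item=item,
            ConditionExpression='attribute_not_exists(PK) OR #v < :version',
            ExpressionAttributeNames={'#v': 'version'},
            ExpressionAttributeValues={
                ':version': item['version']
            }
        )
        return response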

    def get_item_global(self, item_key: Dict[str, Any]) -> Dict[str, Any]:
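        """Get item with a strongly consistent read in us-east-1"""
        # ConsistentRead only reflects writes made in the region being read;
        # writes accepted in other regions may still be replicating.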
        dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
        table = dynamodb.Table(self.table_name)

        response = table.get_item(
            Key=item_key,
            ConsistentRead=True
        )
        return response.get('Item')

    def query_global(self, index_name: str, key_condition: str, limit: int = 100):
        """Query global table"""
        dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
        table = dynamodb.Table(self.table_name)

        response = table.query(
            IndexName=index_name,
            KeyConditionExpression=key_condition,
            Limit=limit,
            ScanIndexForward=False
        )
        return response.get('Items', [])

# Conflict resolution using vector clocks
class VectorClock:
    """Vector clock for conflict detection"""

    def __init__(self):
        self.clock: Dict[str, int] = {}

    def increment(self, node_id: str):
        """Increment clock for node"""
        self.clock[node_id] = self.clock.get(node_id, 0) + 1

    def merge(self, other: 'VectorClock'):
        """Merge two vector clocks"""
        for node_id, timestamp in other.clock.items():
            self.clock[node_id] = max(self.clock.get(node_id, 0), timestamp)

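    def happens_before(self, other: 'VectorClock') -> bool:
        """Check if this clock strictly happens before another"""
        # Every component must be <= the other's (missing entries count as 0)
        if any(count > other.clock.get(node_id, 0)
               for node_id, count in self.clock.items()):
            return False
        # At least one component must be strictly smaller, so equal clocks
        # do not count as happening before each other
        return any(count > self.clock.get(node_id, 0)
                   for node_id, count in other.clock.items())

    def concurrent_with(self, other: 'VectorClock') -> bool:
        """Check if clocks are concurrent (neither ordered nor equal)"""
        if self.happens_before(other) or other.happens_before(self):
            return False
        # Clocks that match on every node are the same version, not concurrent
        nodes = set(self.clock) | set(other.clock)
        return any(self.clock.get(n, 0) != other.clock.get(n, 0) for n in nodes)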

    def to_dict(self) -> Dict[str, int]:
        return self.clock.copy()

    @classmethod
    def from_dict(cls, clock_dict: Dict[str, int]) -> 'VectorClock':
        vc = cls()
        vc.clock = clock_dict.copy()
        return vc

โš ๏ธConflict Resolution

Choose your conflict resolution strategy carefully. Last-write-wins is simple but can lose data. Vector clocks add complexity but preserve causality.
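
The trade-off is easiest to see side by side. The following is a minimal, self-contained sketch: lww_merge and compare are illustrative helpers written for this example (they are not DynamoDB APIs), and the timestamps and cart values are made up.

```python
from typing import Dict, Tuple

def lww_merge(a: Tuple[int, str], b: Tuple[int, str]) -> Tuple[int, str]:
    """Last-write-wins on (timestamp_ms, value): the higher timestamp survives."""
    return a if a[0] >= b[0] else b

def compare(a: Dict[str, int], b: Dict[str, int]) -> str:
    """Classify two vector clocks as 'equal', 'before', 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

# Two regions update the same cart without seeing each other's write
us_write = (1_700_000_000_000, "cart=[book]")
eu_write = (1_700_000_000_005, "cart=[lamp]")
print(lww_merge(us_write, eu_write)[1])  # cart=[lamp]  (the book is silently dropped)

us_clock = {"us-east-1": 1}
eu_clock = {"eu-west-1": 1}
print(compare(us_clock, eu_clock))                            # concurrent
print(compare(us_clock, {"us-east-1": 1, "eu-west-1": 1}))    # before
```

With LWW the earlier write disappears without any signal. The vector clocks flag the two writes as concurrent, so the application can merge the carts instead of discarding one.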

Cross-Region Replication

# Cross-region data synchronization
import boto3
import json
from typing import Dict, Any, List
from datetime import datetime
import hashlib

class CrossRegionReplicator:
    """Handles cross-region data replication"""

    def __init__(self, regions: List[str]):
        self.regions = regions
        self.kinesis = boto3.client('kinesis')

    def replicate_event(self, event: Dict[str, Any], source_region: str):
        """Replicate event to all regions"""
        # Create unique event ID for deduplication
        event_id = hashlib.md5(
            json.dumps(event, sort_keys=True).encode()
        ).hexdigest()

        for region in self.regions:
            if region != source_region:
                self._send_to_region(region, event, event_id, source_region)

    def _send_to_region(self, region: str, event: Dict[str, Any], event_id: str,
                        source_region: str):
        """Send event to specific region via Kinesis"""
        kinesis = boto3.client('kinesis', region_name=region)
        now = datetime.utcnow().isoformat()

        # Add metadata for replication
        event_with_metadata = {
            'original_event': event,
            'replication_metadata': {
                'event_id': event_id,
                'source_region': source_region,  # where the event originated
                'target_region': region,         # where this copy is being sent
                'replication_timestamp': now,
                'replication_id': hashlib.md5(
                    f"{event_id}{region}{now}".encode()
                ).hexdigest()
            }
        }

        kinesis.put_record(
            StreamName='cross-region-replication-stream',
            Data=json.dumps(event_with_metadata),
            PartitionKey=event_id
        )

class GlobalDataStore:
    """Global data store with eventual consistency"""

    def __init__(self, primary_region: str, replica_regions: List[str]):
        self.primary_region = primary_region
        self.replica_regions = replica_regions
        self.dynamodb = boto3.resource('dynamodb')

    def write(self, table_name: str, item: Dict[str, Any], region: str = None):
        """Write to primary region"""
        if region is None:
            region = self.primary_region

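        # Table() takes only the table name, so build a resource for the target region
        table = boto3.resource('dynamodb', region_name=region).Table(table_name)
        response = table.put_item(Item=item)

        # Trigger replication (only needed for standalone per-region tables;
        # a DynamoDB Global Table replicates on its own)
        self._replicate(table_name, item, region)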

        return response

    def read(self, table_name: str, key: Dict[str, Any], consistent: bool = False):
        """Read from nearest region"""
        # In production, determine nearest region based on latency
        region = self._get_nearest_region()

        # Table() takes only the table name, so build a resource for the chosen region
        table = boto3.resource('dynamodb', region_name=region).Table(table_name)
        response = table.get_item(
            Key=key,
            ConsistentRead=consistent
        )
        return response.get('Item')

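    def _replicate(self, table_name: str, item: Dict[str, Any], source_region: str):
        """Replicate to every other region"""
        # Include the primary so a write accepted in a replica region also
        # reaches it; only the region that took the write is skipped
        for region in [self.primary_region, *self.replica_regions]:
            if region != source_region:
                self._replicate_async(table_name, item, region)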

    def _replicate_async(self, table_name: str, item: Dict[str, Any], region: str):
        """Async replication to region"""
        # In production, use async processing
        # Table() takes only the table name, so build a resource for the target region
        table = boto3.resource('dynamodb', region_name=region).Table(table_name)
        table.put_item(Item=item)

    def _get_nearest_region(self) -> str:
        """Get nearest region based on latency"""
        # Simplified - in production, use latency measurements
        return self.primary_region
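
Because Kinesis delivers records at least once, whatever consumes the replication stream has to tolerate duplicates. The sketch below shows one way to use the event_id from the replication metadata for that. IdempotentApplier, the in-memory seen set, and the key/value event shape are assumptions made for illustration; a real consumer would keep the seen IDs in a durable store.

```python
import hashlib
import json
from typing import Any, Dict, Set

def event_id(event: Dict[str, Any]) -> str:
    """Content hash used as the deduplication key (same scheme as replicate_event)."""
    return hashlib.md5(json.dumps(event, sort_keys=True).encode()).hexdigest()

class IdempotentApplier:
    """Applies replicated events at most once per event_id."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()       # assumption: in-memory for the sketch
        self.state: Dict[str, Any] = {}

    def apply(self, record: Dict[str, Any]) -> bool:
        eid = record["replication_metadata"]["event_id"]
        if eid in self.seen:
            return False                  # duplicate delivery, ignore
        event = record["original_event"]
        self.state[event["key"]] = event["value"]  # hypothetical event shape
        self.seen.add(eid)
        return True

event = {"key": "order#42", "value": "PAID"}
record = {
    "original_event": event,
    "replication_metadata": {"event_id": event_id(event)},
}
applier = IdempotentApplier()
print(applier.apply(record))  # True
print(applier.apply(record))  # False
```

One consequence of hashing the event content is that two genuinely distinct events with identical payloads collapse into one, so events usually carry a unique field such as a request ID.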

Failover Automation

# CloudWatch alarms for failover
resource "aws_cloudwatch_metric_alarm" "us_east_1_5xx" {
  alarm_name          = "us-east-1-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Sum"
  threshold           = 100
  alarm_description   = "5xx errors in us-east-1"

  dimensions = {
    LoadBalancer = aws_lb.us_east_1.arn_suffix
  }

  alarm_actions = [aws_sns_topic.failover.arn]
  ok_actions    = [aws_sns_topic.failover_recovery.arn]
}

# Lambda function for automatic failover
resource "aws_lambda_function" "failover_handler" {
  filename         = "lambda/failover.zip"
  function_name    = "failover-handler"
  role            = aws_iam_role.failover_lambda.arn
  handler         = "index.handler"
  runtime         = "python3.9"
  timeout         = 30
  memory_size     = 128

  environment {
    variables = {
      PRIMARY_REGION   = "us-east-1"
      SECONDARY_REGION = "eu-west-1"
      ROUTE53_ZONE_ID  = aws_route53_zone.main.zone_id
    }
  }
}

# SNS topic for failover notifications
resource "aws_sns_topic" "failover" {
  name = "failover-notifications"
}

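# EventBridge rule for failover events
resource "aws_cloudwatch_event_rule" "failover_event" {
  name        = "failover-event-rule"
  description = "Capture alarm state changes that drive failover"

  # The handler reads detail.alarmName and detail.state.value, which is the
  # shape of a CloudWatch Alarm State Change event.
  event_pattern = jsonencode({
    source      = ["aws.cloudwatch"]
    detail-type = ["CloudWatch Alarm State Change"]
    detail = {
      alarmName = [aws_cloudwatch_metric_alarm.us_east_1_5xx.alarm_name]
    }
  })
}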

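resource "aws_cloudwatch_event_target" "failover_lambda" {
  rule      = aws_cloudwatch_event_rule.failover_event.name
  target_id = "FailoverLambda"
  arn       = aws_lambda_function.failover_handler.arn
}

# EventBridge needs explicit permission to invoke the function
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.failover_handler.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.failover_event.arn
}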

# Failover automation handler
import boto3
import json
from typing import Dict, Any

route53 = boto3.client('route53')
cloudfront = boto3.client('cloudfront')

class FailoverAutomation:
    """Automated failover management"""

    def __init__(self):
        self.route53 = boto3.client('route53')
        self.cloudfront = boto3.client('cloudfront')

    def handle_failover(self, event: Dict[str, Any]) -> Dict[str, Any]:
        """Handle failover event"""
        # Parse the CloudWatch alarm state change event
        alarm_name = event['detail']['alarmName']
        alarm_state = event['detail']['state']['value']

        if alarm_state == 'ALARM':
            return self._trigger_failover(alarm_name)
        elif alarm_state == 'OK':
            return self._trigger_recovery(alarm_name)
        # INSUFFICIENT_DATA (or anything else): take no action
        return {'action': 'none', 'status': 'ignored', 'alarm_state': alarm_state}

    def _trigger_failover(self, alarm_name: str) -> Dict[str, Any]:
        """Trigger failover to secondary region"""
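        # Update Route53 records. Failover records with health checks already
        # shift traffic on their own; this call forces the switch.
        # The zone ID and IP below are placeholders; the Lambda environment
        # provides the real zone ID as ROUTE53_ZONE_ID.
        self.route53.change_resource_record_sets(
            HostedZoneId='Z1234567890',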
            ChangeBatch={
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': 'app.example.com',
                            'Type': 'A',
                            'SetIdentifier': 'primary',
                            'Failover': 'SECONDARY',  # Change to secondary
                            'TTL': 60,
                            'ResourceRecords': [
                                {'Value': '203.0.113.10'}  # Secondary region IP
                            ]
                        }
                    }
                ]
            }
        )

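        # Update CloudFront origin. update_distribution needs the complete
        # current config plus its ETag, so fetch it first and edit in place.
        # The distribution ID and domain name are placeholders.
        current = self.cloudfront.get_distribution_config(Id='E1234567890')
        config = current['DistributionConfig']
        # Repoint the first origin at the secondary region, keeping its origin
        # Id so existing cache behaviors still reference it
        config['Origins']['Items'][0]['DomainName'] = 'secondary.example.com'
        self.cloudfront.update_distribution(
            Id='E1234567890',
            IfMatch=current['ETag'],
            DistributionConfig=config
        )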

        return {
            'action': 'failover',
            'status': 'completed',
            'new_primary': 'eu-west-1'
        }

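    def _trigger_recovery(self, alarm_name: str) -> Dict[str, Any]:
        """Trigger recovery to primary region"""
        # Similar logic to failover, but reverse
        return {
            'action': 'recovery',
            'status': 'completed',
            'new_primary': 'us-east-1'
        }

def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point; matches the "index.handler" setting in the Terraform config"""
    return FailoverAutomation().handle_failover(event)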

Multi-Region Benefits

Active-active deployments provide high availability, low latency globally, and disaster recovery. The key is balancing consistency with availability.

Summary

Component | Purpose | Configuration
Route53 | Global DNS | Latency/failover routing
DynamoDB Global Tables | Multi-region database | Replication, conflict resolution
CloudFront | Global CDN | Edge caching, origin failover
Cross-Region Replication | Data sync | Kinesis, async replication
Failover Automation | Recovery | CloudWatch, Lambda, EventBridge