πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

AWS Data Migration & Modernization

Interview Q&AData Migration & Modernization⭐ Premium

Advertisement

AWS Data Migration & Modernization

AWS Data Migration & Modernization

Module 63 β€” Database Migration Service, Schema Conversion, Snow Family, and Migration Strategies

Migration Overview

ℹ️

AWS provides a comprehensive migration portfolio including DMS for database migration, SCT for schema conversion, and Snow Family for offline large-scale data transfers.

Migration Phases

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    MIGRATION FRAMEWORK                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ ASSESS   │──▢│  MOBILIZE│──▢│ MIGRATE  │──▢│ OPTIMIZE β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚       β”‚              β”‚              β”‚              β”‚           β”‚
β”‚  Inventory     Planning        Execution     Modernization   β”‚
β”‚  Analysis      Business Case   Validation    Cost Reduction  β”‚
β”‚  Dependencies  Training        Cutover       Performance     β”‚
β”‚  Compliance    Tooling         Testing       Architecture    β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

AWS Database Migration Service (DMS)

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      AWS DMS ARCHITECTURE                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                     β”‚
β”‚  SOURCE                    DMS                      TARGET         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Oracle   │────▢│ Replication     │────▢│ Aurora PostgreSQLβ”‚   β”‚
β”‚  β”‚          β”‚     β”‚ Instance        β”‚      β”‚                 β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚                 β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ SQL      │────▢│ β”‚Full Load    β”‚ │────▢│ S3 + Glue       β”‚   β”‚
β”‚  β”‚ Server   β”‚     β”‚ β”‚CDC          β”‚ β”‚      β”‚                 β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚ β”‚Terrorist    β”‚ β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ MySQL    │────▢│                 │────▢│ DynamoDB        β”‚   β”‚
β”‚  β”‚          β”‚     β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚      β”‚                 β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚ β”‚Monitoring   β”‚ β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                   β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚                             β”‚
β”‚                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                             β”‚
β”‚                                                                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

DMS Replication Types

Replication TypeDescriptionUse CaseDowntime
Full LoadOne-time bulk copyInitial migrationNone
CDC (Change Data Capture)Ongoing replicationLive migrationMinimal
Full Load + CDCBulk copy + ongoingMost commonMinimal
HomogeneousSame engine migrationOracle→OracleNone
HeterogeneousCross-engine migrationOracle→PostgreSQLVaries

DMS Task Configuration

-- DMS Task Configuration (AWS Console / API)
-- Migration Type: full-load-and-cdc

-- Source Endpoint Settings (Oracle)
{
  "databaseName": "ORCL",
  "port": 1521,
  "additionalConnectionAttributes": "addSupplementalLogging=Y",
  "oracleServerVersion": "19"
}

-- Target Endpoint Settings (PostgreSQL)
{
  "databaseName": "targetdb",
  "port": 5432,
  "maxFileSize": 1048576,
  "executeTimeout": 60
}

-- Table Mappings
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "schema-table",
      "object-locator": {
        "schema-name": "HR",
        "table-name": "EMPLOYEES"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "rename-schema",
      "rule-action": "rename",
      "rule-target": "schema",
      "object-locator": {
        "schema-name": "HR"
      },
      "value": "human_resources"
    }
  ]
}

DMS Monitoring

-- CloudWatch Metrics for DMS
-- ReplicationLag: Time behind source
-- CDCLatencySource: CDC lag at source
-- CDCLatencyTarget: CDC lag at target

-- CloudWatch Alarms
aws cloudwatch put-metric-alarm \
  --alarm-name "DMS-ReplicationLag-High" \
  --metric-name "ReplicationLag" \
  --namespace "AWS/DMS" \
  --statistic Average \
  --period 300 \
  --threshold 300 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=ReplicationInstanceIdentifier,Value=dms-instance-01 \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts

Schema Conversion Tool (SCT)

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    SCT MIGRATION WORKFLOW                           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                     β”‚
β”‚  SOURCE DB              SCT                    TARGET DB            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Oracle   │────▢│ Assessment      │────▢│ Aurora MySQL     β”‚   β”‚
β”‚  β”‚ 12c      β”‚     β”‚ Report          β”‚      β”‚                 β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚                 β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                   β”‚ β€’ Schema Conv   β”‚                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚ β€’ Code Conv     β”‚      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ SQL      │────▢│ β€’ Risk Analysis │────▢│ Aurora PostgreSQLβ”‚   β”‚
β”‚  β”‚ Server   β”‚     β”‚ β€’ Effort Est    β”‚      β”‚                 β”‚   β”‚
β”‚  β”‚ 2019     β”‚     β”‚                 β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                             β”‚
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚ ASSESSMENT RESULTS                                       β”‚     β”‚
β”‚  β”‚                                                          β”‚     β”‚
β”‚  β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”‚     β”‚
β”‚  β”‚ β”‚ 2,450       β”‚ β”‚ 89%         β”‚ β”‚ 12.5        β”‚        β”‚     β”‚
β”‚  β”‚ β”‚ Objects     β”‚ β”‚ Automated   β”‚ β”‚ Hours Manualβ”‚        β”‚     β”‚
β”‚  β”‚ β”‚ Assessed    β”‚ β”‚ Conversion  β”‚ β”‚ Work        β”‚        β”‚     β”‚
β”‚  β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β”‚     β”‚
β”‚  β”‚                                                          β”‚     β”‚
β”‚  β”‚ Action Items:                                            β”‚     β”‚
β”‚  β”‚ β€’ 340 Stored procedures need rewriting                   β”‚     β”‚
β”‚  β”‚ β€’ 89 Views require manual review                         β”‚     β”‚
β”‚  β”‚ β€’ 12 Triggers need PL/pgSQL conversion                   β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚                                                                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

SCT Application Collection Agent (ACC)

ComponentPurposeDeployment
SCT ConsoleMain tool interfaceDesktop application
ACCCollects app queriesInstalled on app server
Assessment ReportMigration analysisGenerated by SCT
Conversion ProjectSchema conversionCreated in SCT

AWS Snow Family

Snow Family Comparison

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      SNOW FAMILY OVERVIEW                           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”‚
β”‚  β”‚  SNOWBALL β”‚  β”‚SNOWBALL   β”‚  β”‚SNOWBALL   β”‚  β”‚SNOWCONE   β”‚      β”‚
β”‚  β”‚  EDGE     β”‚  β”‚EDGE V2    β”‚  β”‚EKS        β”‚  β”‚           β”‚      β”‚
β”‚  β”‚           β”‚  β”‚(Compute)  β”‚  β”‚           β”‚  β”‚           β”‚      β”‚
β”‚  β”‚ 80TB-210TBβ”‚  β”‚42TB       β”‚  β”‚Container  β”‚  β”‚8TB-42TB   β”‚      β”‚
β”‚  β”‚ Storage   β”‚  β”‚Storage    β”‚  β”‚Compute    β”‚  β”‚Storage    β”‚      β”‚
β”‚  β”‚ + Compute β”‚  β”‚+ Compute  β”‚  β”‚Kubernetes β”‚  β”‚           β”‚      β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚
β”‚       β”‚              β”‚              β”‚              β”‚               β”‚
β”‚  Data Transfer  Data Transfer  Edge Computing  Remote/IoT        β”‚
β”‚  (80TB+)        + Analytics    + Containers    Data Collection   β”‚
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                                    β”‚
β”‚  β”‚ SNOWMOBILEβ”‚                                                    β”‚
β”‚  β”‚           β”‚                                                    β”‚
β”‚  β”‚ 100 PB    β”‚                                                    β”‚
β”‚  β”‚ Exabyte-  β”‚                                                    β”‚
β”‚  β”‚ Scale     β”‚                                                    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                                    β”‚
β”‚       β”‚                                                            β”‚
β”‚  Extreme Data Migration                                           β”‚
β”‚  (Data Center Moves)                                              β”‚
β”‚                                                                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Snowball Edge Usage

# Snowball Edge Data Transfer Example
import boto3

def transfer_to_snowball():
    """Transfer data using Snowball Edge"""
    
    s3 = boto3.client('s3')
    
    # Create manifest
    manifest = {
        "manifest": {
            "version": "2016-11-28",
            "entries": [
                {
                    "ETag": "d41d8cd98f00b204e9800998ecf8427e",
                    "Length": 1234567,
                    "Key": "dataset-2024.parquet"
                }
            ]
        }
    }
    
    # Data transfer workflow
    workflow = {
        "steps": [
            "1. Create import job in Snowball console",
            "2. Snowball device ships to location",
            "3. Connect device to local network",
            "4. Use AWS CLI to copy data to device",
            "5. Ship device back to AWS",
            "6. Data loads to S3 automatically"
        ]
    }
    
    return workflow

# AWS CLI commands for Snowball
# aws snowball create-job --job-type IMPORT --resources ...
# aws s3 cp /local/data s3://bucket/ --endpoint-url https://snowball-ip:8443

Migration Patterns

Online vs Offline Migration

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    MIGRATION STRATEGY MATRIX                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                     β”‚
β”‚  DATA SIZE          LATENCY TOLERANCE    STRATEGY                   β”‚
β”‚  ──────────         ─────────────────    ────────                   β”‚
β”‚                                                                     β”‚
β”‚  < 10 GB            Hours               AWS DMS (Full Load)        β”‚
β”‚                                                                     β”‚
β”‚  10 GB - 1 TB       Minutes             DMS Full Load + CDC        β”‚
β”‚                                                                     β”‚
β”‚  1 TB - 10 TB       Minutes             DMS with compression       β”‚
β”‚                                         Parallel threads           β”‚
β”‚                                                                     β”‚
β”‚  10 TB - 1 PB       Hours               Snowball Edge              β”‚
β”‚                                         + DMS CDC                  β”‚
β”‚                                                                     β”‚
β”‚  > 1 PB             Days                Snowmobile                 β”‚
β”‚                                         + DMS CDC                  β”‚
β”‚                                                                     β”‚
β”‚  Any size           Zero downtime       DMS CDC (ongoing)          β”‚
β”‚                                         + app changeover           β”‚
β”‚                                                                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Migration with DMS and S3

# Complete migration pipeline with DMS and S3
import boto3
import json

class MigrationPipeline:
    def __init__(self, region='us-east-1'):
        self.dms = boto3.client('dms', region_name=region)
        self.s3 = boto3.client('s3', region_name=region)
        self.glue = boto3.client('glue', region_name=region)
    
    def create_replication_instance(self):
        """Create DMS replication instance"""
        response = self.dms.create_replication_instance(
            ReplicationInstanceIdentifier='migration-instance-01',
            ReplicationInstanceClass='dms.r5.2xlarge',
            AllocatedStorage=500,
            MultiAZ=True,
            EngineVersion='3.5.1',
            AutoMinorVersionUpgrade=True,
            PubliclyAccessible=False,
            VpcSecurityGroupIds=['sg-12345678'],
            ReplicationSubnetGroupIdentifier='my-subnet-group'
        )
        return response['ReplicationInstance']['ReplicationInstanceArn']
    
    def create_source_endpoint(self, database_type, connection_info):
        """Create source endpoint"""
        response = self.dms.create_endpoint(
            EndpointIdentifier=f'source-{database_type}',
            EndpointType='source',
            EngineName=database_type,
            ServerName=connection_info['host'],
            Port=connection_info['port'],
            DatabaseName=connection_info['database'],
            Username=connection_info['username'],
            Password=connection_info['password'],
            S3Settings={
                'BucketArn': 'arn:aws:s3:::my-bucket',
                'BucketFolder': 'source-data'
            }
        )
        return response['Endpoint']['EndpointArn']
    
    def create_target_endpoint(self, target_type, connection_info):
        """Create target endpoint"""
        response = self.dms.create_endpoint(
            EndpointIdentifier=f'target-{target_type}',
            EndpointType='target',
            EngineName=target_type,
            ServerName=connection_info['host'],
            Port=connection_info['port'],
            DatabaseName=connection_info['database'],
            Username=connection_info['username'],
            Password=connection_info['password']
        )
        return response['Endpoint']['EndpointArn']
    
    def create_migration_task(self, source_arn, target_arn, replication_arn):
        """Create migration task with CDC"""
        response = self.dms.create_replication_task(
            ReplicationTaskIdentifier='full-load-cdc-task',
            SourceEndpointArn=source_arn,
            TargetEndpointArn=target_arn,
            ReplicationInstanceArn=replication_arn,
            MigrationType='full-load-and-cdc',
            TableMappings=json.dumps({
                "rules": [
                    {
                        "rule-type": "selection",
                        "rule-id": "1",
                        "rule-name": "all-tables",
                        "object-locator": {
                            "schema-name": "public",
                            "table-name": "%"
                        },
                        "rule-action": "include"
                    }
                ]
            })
        )
        return response['ReplicationTask']['ReplicationTaskArn']

# Usage
pipeline = MigrationPipeline()
replication_arn = pipeline.create_replication_instance()
source_arn = pipeline.create_source_endpoint('oracle', {
    'host': 'source-db.example.com',
    'port': 1521,
    'database': 'ORCL',
    'username': 'admin',
    'password': 'secret'
})
target_arn = pipeline.create_target_endpoint('aurora-postgresql', {
    'host': 'target-cluster.cluster-xxx.us-east-1.rds.amazonaws.com',
    'port': 5432,
    'database': 'targetdb',
    'username': 'admin',
    'password': 'secret'
})
task_arn = pipeline.create_migration_task(source_arn, target_arn, replication_arn)

Interview Questions & Answers

Q1: What is the difference between AWS DMS and native database replication tools?

Answer: AWS DMS is a managed service that handles provisioning, patching, and monitoring. Native tools like Oracle GoldenGate or SQL Server replication offer more features but require expertise. DMS supports heterogeneous migrations and integrates with AWS services, while native tools may have better performance for specific engines.

Q2: How do you handle schema changes during ongoing CDC replication?

Answer: DMS can handle some DDL changes automatically. For complex changes, create a DMS task to apply changes manually, pause CDC, apply schema changes to both source and target, then resume. Use DMS schema conversion for automatic handling where possible.

Q3: Explain the Snow Family devices and when to use each.

Answer: Snowcone (8TB) for edge computing and IoT, Snowball Edge Storage (80TB) for data migration, Snowball Edge Compute (42TB) for analytics, Snowball EKS for Kubernetes at edge, Snowmobile (100PB) for exabyte-scale migrations. Choose based on data size, compute needs, and connectivity.

Q4: How do you validate data integrity after migration?

Answer: Use DMS validation feature for row counts and checksums, AWS Glue DataBrew for profiling, custom SQL comparisons, and AWS Schema Conversion Tool assessment reports. Create validation scripts comparing source and target with sampling and full validation options.

Q5: What are the best practices for large-scale database migration?

Answer: 1) Use DMS with full load + CDC, 2) Enable parallel load, 3) Optimize batch size, 4) Use compression, 5) Monitor replication lag, 6) Implement validation, 7) Have rollback plan, 8) Test thoroughly, 9) Use multiple replication instances, 10) Schedule during low-traffic periods.

Q6: How do you minimize downtime during database migration?

Answer: Use CDC for near-zero downtime, maintain dual-write during cutover, implement blue-green deployment pattern, test failover procedures, use DMS continuous replication, and coordinate cutover during maintenance windows.

Q7: Explain DMS task settings for optimal performance.

Answer: Configure parallel load for multiple threads, set batch apply enabled for CDC, use compression for large tables, adjust memory limits, enable logging for troubleshooting, and tune commit rate based on target capacity.

Q8: How do you handle data transformation during migration?

Answer: Use DMS transformation rules for column mapping, renaming, and filtering. For complex transformations, use AWS Glue ETL jobs before or after DMS, or use Lambda functions with DMSδΊ‹δ»Ά.

Q9: What is the Schema Conversion Tool and when to use it?

Answer: SCT converts database schemas from one engine to another, assessing compatibility and generating conversion reports. Use for heterogeneous migrations (Oracle→PostgreSQL, SQL Server→Aurora MySQL) to identify incompatible objects and estimate effort.

Q10: How do you migrate from on-premises to AWS with Snow Family?

Answer: 1) Create job in Snowball console, 2) Receive and connect device, 3) Use AWS CLI to copy data, 4) Ship back to AWS, 5) Data loads to S3, 6) Use DMS for ongoing sync if needed, 7) Validate and cut over.

Q11: Describe a complete migration architecture for a 50TB Oracle database.

Answer: Use Snowball Edge for initial bulk transfer (50TB), DMS CDC for ongoing replication, SCT for schema conversion, Aurora PostgreSQL as target. Architecture: Oracle β†’ SCT β†’ Snowball β†’ S3 β†’ DMS β†’ Aurora PostgreSQL, with validation and monitoring throughout.

Q12: How do you handle LOB (Large Object) data during migration?

Answer: DMS handles LOBs automatically with settings for LOB mode (full, limited, none). For very large LOBs, use limited mode with max LOB size, or migrate LOBs separately using S3 and reference them in target.

Q13: What are the cost considerations for AWS DMS?

Answer: Costs include replication instance hours, data transfer, storage, and CloudWatch metrics. Use serverless for variable workloads, reserved instances for steady state, and optimize task duration to reduce costs.

Q14: How do you migrate from SQL Server to Aurora PostgreSQL?

Answer: 1) Use SCT to assess and convert schema, 2) Apply conversion suggestions, 3) Create DMS task for full load + CDC, 4) Test application with Aurora, 5) Validate data, 6) Cutover, 7) Decommission SQL Server.

Q15: Explain DMS endpoints and connection attributes.

Answer: Endpoints define source/target connection details. Connection attributes customize behavior: addSupplementalLogging for Oracle, initStatement for SQL Server, and各种 settings for performance tuning and compatibility.

Q16: How do you migrate schemas with stored procedures and functions?

Answer: SCT converts stored procedures from source to target language (PL/SQL to PL/pgSQL). Manual review needed for complex logic. Use assessment report to identify issues and create conversion projects for batch processing.

Q17: What is the DMS Serverless feature?

Answer: DMS Serverless automatically scales replication capacity based on workload. No instance provisioning needed, pay per data processed. Ideal for variable workloads and migrations where capacity planning is difficult.

Q18: How do you handle network connectivity for DMS?

Answer: Use VPC for private connectivity, VPN for on-premises, Direct Connect for dedicated links. Configure security groups to allow DMS traffic, use SSL for encryption, and ensure proper DNS resolution.

Q19: Describe best practices for DMS monitoring.

Answer: Monitor CloudWatch metrics (ReplicationLag, CDCLatency), set alarms for high lag, enable DMS logging to CloudWatch, use AWS CloudTrail for API calls, and create dashboards for migration progress.

Q20: How do you migrate from MySQL to Aurora MySQL?

Answer: For homogeneous migration, use DMS with minimal transformation. Enable parallel load, tune batch settings, and use CDC for ongoing sync. Aurora MySQL compatibility ensures smooth migration.

Q21: What is the AWS Migration Acceleration Program (MAP)?

Answer: MAP provides tools, training, and credits for cloud migration. Includes AWS Migration Hub for tracking, DMS and SCT for database migration, Snow Family for data transfer, and consulting credits.

Q22: How do you handle encryption during migration?

Answer: DMS supports SSL/TLS for data in transit, KMS for encryption at rest. Configure endpoints with SSL certificates, use KMS keys for stored data, and ensure compliance with security policies.

Q23: Explain DMS task error handling and logging.

Answer: DMS logs errors to CloudWatch, provides task tables for error details. Configure error handling rules, set max errors limit, enable logging for troubleshooting, and use AWS CloudTrail for API-level monitoring.

Q24: How do you migrate a data warehouse to Redshift?

Answer: Use SCT to convert schema from source warehouse, DMS for data migration, or use Redshift COPY command from S3. For large migrations, use Snowball to stage data in S3 then COPY.

Q25: What are the limitations of AWS DMS?

Answer: Limitations include: some DDL not replicated, certain data types need special handling, LOB limitations, performance depends on instance size, and heterogeneous migrations may need manual work.

Key Takeaways

⚠️

Always test migrations thoroughly in non-production environments before executing production migrations. Have rollback plans and validate data integrity.

  1. Choose right migration tool: DMS for databases, Snow Family for offline large-scale
  2. Plan for schema conversion: Use SCT for heterogeneous migrations
  3. Minimize downtime: Use CDC for ongoing replication during cutover
  4. Validate data: Implement comprehensive validation checks
  5. Monitor and optimize: Use CloudWatch metrics to track progress
  6. Cost optimization: Use DMS Serverless for variable workloads
  7. Security: Encrypt data in transit and at rest
  8. Testing: Thorough testing before production migration

Advertisement