AWS Data Migration & Modernization
AWS Data Migration & Modernization
Module 63 β Database Migration Service, Schema Conversion, Snow Family, and Migration Strategies
Migration Overview
βΉοΈ
AWS provides a comprehensive migration portfolio including DMS for database migration, SCT for schema conversion, and Snow Family for offline large-scale data transfers.
Migration Phases
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β MIGRATION FRAMEWORK β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββ β
β β ASSESS ββββΆβ MOBILIZEββββΆβ MIGRATE ββββΆβ OPTIMIZE β β
β ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββ β
β β β β β β
β Inventory Planning Execution Modernization β
β Analysis Business Case Validation Cost Reduction β
β Dependencies Training Cutover Performance β
β Compliance Tooling Testing Architecture β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
AWS Database Migration Service (DMS)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β AWS DMS ARCHITECTURE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β SOURCE DMS TARGET β
β ββββββββββββ βββββββββββββββββββ βββββββββββββββββββ β
β β Oracle ββββββΆβ Replication ββββββΆβ Aurora PostgreSQLβ β
β β β β Instance β β β β
β ββββββββββββ β β βββββββββββββββββββ β
β ββββββββββββ β βββββββββββββββ β βββββββββββββββββββ β
β β SQL ββββββΆβ βFull Load β ββββββΆβ S3 + Glue β β
β β Server β β βCDC β β β β β
β ββββββββββββ β βTerrorist β β βββββββββββββββββββ β
β ββββββββββββ β βββββββββββββββ β βββββββββββββββββββ β
β β MySQL ββββββΆβ ββββββΆβ DynamoDB β β
β β β β βββββββββββββββ β β β β
β ββββββββββββ β βMonitoring β β βββββββββββββββββββ β
β β βββββββββββββββ β β
β βββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
DMS Replication Types
| Replication Type | Description | Use Case | Downtime |
|---|---|---|---|
| Full Load | One-time bulk copy | Initial migration | None |
| CDC (Change Data Capture) | Ongoing replication | Live migration | Minimal |
| Full Load + CDC | Bulk copy + ongoing | Most common | Minimal |
| Homogeneous | Same engine migration | OracleβOracle | None |
| Heterogeneous | Cross-engine migration | OracleβPostgreSQL | Varies |
DMS Task Configuration
-- DMS Task Configuration (AWS Console / API)
-- Migration Type: full-load-and-cdc
-- Source Endpoint Settings (Oracle)
{
"databaseName": "ORCL",
"port": 1521,
"additionalConnectionAttributes": "addSupplementalLogging=Y",
"oracleServerVersion": "19"
}
-- Target Endpoint Settings (PostgreSQL)
{
"databaseName": "targetdb",
"port": 5432,
"maxFileSize": 1048576,
"executeTimeout": 60
}
-- Table Mappings
{
"rules": [
{
"rule-type": "selection",
"rule-id": "1",
"rule-name": "schema-table",
"object-locator": {
"schema-name": "HR",
"table-name": "EMPLOYEES"
},
"rule-action": "include"
},
{
"rule-type": "transformation",
"rule-id": "2",
"rule-name": "rename-schema",
"rule-action": "rename",
"rule-target": "schema",
"object-locator": {
"schema-name": "HR"
},
"value": "human_resources"
}
]
}
DMS Monitoring
-- CloudWatch Metrics for DMS
-- ReplicationLag: Time behind source
-- CDCLatencySource: CDC lag at source
-- CDCLatencyTarget: CDC lag at target
-- CloudWatch Alarms
aws cloudwatch put-metric-alarm \
--alarm-name "DMS-ReplicationLag-High" \
--metric-name "ReplicationLag" \
--namespace "AWS/DMS" \
--statistic Average \
--period 300 \
--threshold 300 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=ReplicationInstanceIdentifier,Value=dms-instance-01 \
--evaluation-periods 2 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
Schema Conversion Tool (SCT)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SCT MIGRATION WORKFLOW β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β SOURCE DB SCT TARGET DB β
β ββββββββββββ βββββββββββββββββββ βββββββββββββββββββ β
β β Oracle ββββββΆβ Assessment ββββββΆβ Aurora MySQL β β
β β 12c β β Report β β β β
β ββββββββββββ β β βββββββββββββββββββ β
β β β’ Schema Conv β β
β ββββββββββββ β β’ Code Conv β βββββββββββββββββββ β
β β SQL ββββββΆβ β’ Risk Analysis ββββββΆβ Aurora PostgreSQLβ β
β β Server β β β’ Effort Est β β β β
β β 2019 β β β βββββββββββββββββββ β
β ββββββββββββ βββββββββββββββββββ β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β ASSESSMENT RESULTS β β
β β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β β β 2,450 β β 89% β β 12.5 β β β
β β β Objects β β Automated β β Hours Manualβ β β
β β β Assessed β β Conversion β β Work β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β β β β
β β Action Items: β β
β β β’ 340 Stored procedures need rewriting β β
β β β’ 89 Views require manual review β β
β β β’ 12 Triggers need PL/pgSQL conversion β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
SCT Application Collection Agent (ACC)
| Component | Purpose | Deployment |
|---|---|---|
| SCT Console | Main tool interface | Desktop application |
| ACC | Collects app queries | Installed on app server |
| Assessment Report | Migration analysis | Generated by SCT |
| Conversion Project | Schema conversion | Created in SCT |
AWS Snow Family
Snow Family Comparison
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SNOW FAMILY OVERVIEW β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββ βββββββββββββ βββββββββββββ βββββββββββββ β
β β SNOWBALL β βSNOWBALL β βSNOWBALL β βSNOWCONE β β
β β EDGE β βEDGE V2 β βEKS β β β β
β β β β(Compute) β β β β β β
β β 80TB-210TBβ β42TB β βContainer β β8TB-42TB β β
β β Storage β βStorage β βCompute β βStorage β β
β β + Compute β β+ Compute β βKubernetes β β β β
β βββββββββββββ βββββββββββββ βββββββββββββ βββββββββββββ β
β β β β β β
β Data Transfer Data Transfer Edge Computing Remote/IoT β
β (80TB+) + Analytics + Containers Data Collection β
β β
β βββββββββββββ β
β β SNOWMOBILEβ β
β β β β
β β 100 PB β β
β β Exabyte- β β
β β Scale β β
β βββββββββββββ β
β β β
β Extreme Data Migration β
β (Data Center Moves) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Snowball Edge Usage
# Snowball Edge Data Transfer Example
import boto3
def transfer_to_snowball():
"""Transfer data using Snowball Edge"""
s3 = boto3.client('s3')
# Create manifest
manifest = {
"manifest": {
"version": "2016-11-28",
"entries": [
{
"ETag": "d41d8cd98f00b204e9800998ecf8427e",
"Length": 1234567,
"Key": "dataset-2024.parquet"
}
]
}
}
# Data transfer workflow
workflow = {
"steps": [
"1. Create import job in Snowball console",
"2. Snowball device ships to location",
"3. Connect device to local network",
"4. Use AWS CLI to copy data to device",
"5. Ship device back to AWS",
"6. Data loads to S3 automatically"
]
}
return workflow
# AWS CLI commands for Snowball
# aws snowball create-job --job-type IMPORT --resources ...
# aws s3 cp /local/data s3://bucket/ --endpoint-url https://snowball-ip:8443
Migration Patterns
Online vs Offline Migration
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β MIGRATION STRATEGY MATRIX β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β DATA SIZE LATENCY TOLERANCE STRATEGY β
β ββββββββββ βββββββββββββββββ ββββββββ β
β β
β < 10 GB Hours AWS DMS (Full Load) β
β β
β 10 GB - 1 TB Minutes DMS Full Load + CDC β
β β
β 1 TB - 10 TB Minutes DMS with compression β
β Parallel threads β
β β
β 10 TB - 1 PB Hours Snowball Edge β
β + DMS CDC β
β β
β > 1 PB Days Snowmobile β
β + DMS CDC β
β β
β Any size Zero downtime DMS CDC (ongoing) β
β + app changeover β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Migration with DMS and S3
# Complete migration pipeline with DMS and S3
import boto3
import json
class MigrationPipeline:
def __init__(self, region='us-east-1'):
self.dms = boto3.client('dms', region_name=region)
self.s3 = boto3.client('s3', region_name=region)
self.glue = boto3.client('glue', region_name=region)
def create_replication_instance(self):
"""Create DMS replication instance"""
response = self.dms.create_replication_instance(
ReplicationInstanceIdentifier='migration-instance-01',
ReplicationInstanceClass='dms.r5.2xlarge',
AllocatedStorage=500,
MultiAZ=True,
EngineVersion='3.5.1',
AutoMinorVersionUpgrade=True,
PubliclyAccessible=False,
VpcSecurityGroupIds=['sg-12345678'],
ReplicationSubnetGroupIdentifier='my-subnet-group'
)
return response['ReplicationInstance']['ReplicationInstanceArn']
def create_source_endpoint(self, database_type, connection_info):
"""Create source endpoint"""
response = self.dms.create_endpoint(
EndpointIdentifier=f'source-{database_type}',
EndpointType='source',
EngineName=database_type,
ServerName=connection_info['host'],
Port=connection_info['port'],
DatabaseName=connection_info['database'],
Username=connection_info['username'],
Password=connection_info['password'],
S3Settings={
'BucketArn': 'arn:aws:s3:::my-bucket',
'BucketFolder': 'source-data'
}
)
return response['Endpoint']['EndpointArn']
def create_target_endpoint(self, target_type, connection_info):
"""Create target endpoint"""
response = self.dms.create_endpoint(
EndpointIdentifier=f'target-{target_type}',
EndpointType='target',
EngineName=target_type,
ServerName=connection_info['host'],
Port=connection_info['port'],
DatabaseName=connection_info['database'],
Username=connection_info['username'],
Password=connection_info['password']
)
return response['Endpoint']['EndpointArn']
def create_migration_task(self, source_arn, target_arn, replication_arn):
"""Create migration task with CDC"""
response = self.dms.create_replication_task(
ReplicationTaskIdentifier='full-load-cdc-task',
SourceEndpointArn=source_arn,
TargetEndpointArn=target_arn,
ReplicationInstanceArn=replication_arn,
MigrationType='full-load-and-cdc',
TableMappings=json.dumps({
"rules": [
{
"rule-type": "selection",
"rule-id": "1",
"rule-name": "all-tables",
"object-locator": {
"schema-name": "public",
"table-name": "%"
},
"rule-action": "include"
}
]
})
)
return response['ReplicationTask']['ReplicationTaskArn']
# Usage
pipeline = MigrationPipeline()
replication_arn = pipeline.create_replication_instance()
source_arn = pipeline.create_source_endpoint('oracle', {
'host': 'source-db.example.com',
'port': 1521,
'database': 'ORCL',
'username': 'admin',
'password': 'secret'
})
target_arn = pipeline.create_target_endpoint('aurora-postgresql', {
'host': 'target-cluster.cluster-xxx.us-east-1.rds.amazonaws.com',
'port': 5432,
'database': 'targetdb',
'username': 'admin',
'password': 'secret'
})
task_arn = pipeline.create_migration_task(source_arn, target_arn, replication_arn)
Interview Questions & Answers
Q1: What is the difference between AWS DMS and native database replication tools?
Answer: AWS DMS is a managed service that handles provisioning, patching, and monitoring. Native tools like Oracle GoldenGate or SQL Server replication offer more features but require expertise. DMS supports heterogeneous migrations and integrates with AWS services, while native tools may have better performance for specific engines.
Q2: How do you handle schema changes during ongoing CDC replication?
Answer: DMS can handle some DDL changes automatically. For complex changes, create a DMS task to apply changes manually, pause CDC, apply schema changes to both source and target, then resume. Use DMS schema conversion for automatic handling where possible.
Q3: Explain the Snow Family devices and when to use each.
Answer: Snowcone (8TB) for edge computing and IoT, Snowball Edge Storage (80TB) for data migration, Snowball Edge Compute (42TB) for analytics, Snowball EKS for Kubernetes at edge, Snowmobile (100PB) for exabyte-scale migrations. Choose based on data size, compute needs, and connectivity.
Q4: How do you validate data integrity after migration?
Answer: Use DMS validation feature for row counts and checksums, AWS Glue DataBrew for profiling, custom SQL comparisons, and AWS Schema Conversion Tool assessment reports. Create validation scripts comparing source and target with sampling and full validation options.
Q5: What are the best practices for large-scale database migration?
Answer: 1) Use DMS with full load + CDC, 2) Enable parallel load, 3) Optimize batch size, 4) Use compression, 5) Monitor replication lag, 6) Implement validation, 7) Have rollback plan, 8) Test thoroughly, 9) Use multiple replication instances, 10) Schedule during low-traffic periods.
Q6: How do you minimize downtime during database migration?
Answer: Use CDC for near-zero downtime, maintain dual-write during cutover, implement blue-green deployment pattern, test failover procedures, use DMS continuous replication, and coordinate cutover during maintenance windows.
Q7: Explain DMS task settings for optimal performance.
Answer: Configure parallel load for multiple threads, set batch apply enabled for CDC, use compression for large tables, adjust memory limits, enable logging for troubleshooting, and tune commit rate based on target capacity.
Q8: How do you handle data transformation during migration?
Answer: Use DMS transformation rules for column mapping, renaming, and filtering. For complex transformations, use AWS Glue ETL jobs before or after DMS, or use Lambda functions with DMSδΊδ»Ά.
Q9: What is the Schema Conversion Tool and when to use it?
Answer: SCT converts database schemas from one engine to another, assessing compatibility and generating conversion reports. Use for heterogeneous migrations (OracleβPostgreSQL, SQL ServerβAurora MySQL) to identify incompatible objects and estimate effort.
Q10: How do you migrate from on-premises to AWS with Snow Family?
Answer: 1) Create job in Snowball console, 2) Receive and connect device, 3) Use AWS CLI to copy data, 4) Ship back to AWS, 5) Data loads to S3, 6) Use DMS for ongoing sync if needed, 7) Validate and cut over.
Q11: Describe a complete migration architecture for a 50TB Oracle database.
Answer: Use Snowball Edge for initial bulk transfer (50TB), DMS CDC for ongoing replication, SCT for schema conversion, Aurora PostgreSQL as target. Architecture: Oracle β SCT β Snowball β S3 β DMS β Aurora PostgreSQL, with validation and monitoring throughout.
Q12: How do you handle LOB (Large Object) data during migration?
Answer: DMS handles LOBs automatically with settings for LOB mode (full, limited, none). For very large LOBs, use limited mode with max LOB size, or migrate LOBs separately using S3 and reference them in target.
Q13: What are the cost considerations for AWS DMS?
Answer: Costs include replication instance hours, data transfer, storage, and CloudWatch metrics. Use serverless for variable workloads, reserved instances for steady state, and optimize task duration to reduce costs.
Q14: How do you migrate from SQL Server to Aurora PostgreSQL?
Answer: 1) Use SCT to assess and convert schema, 2) Apply conversion suggestions, 3) Create DMS task for full load + CDC, 4) Test application with Aurora, 5) Validate data, 6) Cutover, 7) Decommission SQL Server.
Q15: Explain DMS endpoints and connection attributes.
Answer: Endpoints define source/target connection details. Connection attributes customize behavior: addSupplementalLogging for Oracle, initStatement for SQL Server, andεη§ settings for performance tuning and compatibility.
Q16: How do you migrate schemas with stored procedures and functions?
Answer: SCT converts stored procedures from source to target language (PL/SQL to PL/pgSQL). Manual review needed for complex logic. Use assessment report to identify issues and create conversion projects for batch processing.
Q17: What is the DMS Serverless feature?
Answer: DMS Serverless automatically scales replication capacity based on workload. No instance provisioning needed, pay per data processed. Ideal for variable workloads and migrations where capacity planning is difficult.
Q18: How do you handle network connectivity for DMS?
Answer: Use VPC for private connectivity, VPN for on-premises, Direct Connect for dedicated links. Configure security groups to allow DMS traffic, use SSL for encryption, and ensure proper DNS resolution.
Q19: Describe best practices for DMS monitoring.
Answer: Monitor CloudWatch metrics (ReplicationLag, CDCLatency), set alarms for high lag, enable DMS logging to CloudWatch, use AWS CloudTrail for API calls, and create dashboards for migration progress.
Q20: How do you migrate from MySQL to Aurora MySQL?
Answer: For homogeneous migration, use DMS with minimal transformation. Enable parallel load, tune batch settings, and use CDC for ongoing sync. Aurora MySQL compatibility ensures smooth migration.
Q21: What is the AWS Migration Acceleration Program (MAP)?
Answer: MAP provides tools, training, and credits for cloud migration. Includes AWS Migration Hub for tracking, DMS and SCT for database migration, Snow Family for data transfer, and consulting credits.
Q22: How do you handle encryption during migration?
Answer: DMS supports SSL/TLS for data in transit, KMS for encryption at rest. Configure endpoints with SSL certificates, use KMS keys for stored data, and ensure compliance with security policies.
Q23: Explain DMS task error handling and logging.
Answer: DMS logs errors to CloudWatch, provides task tables for error details. Configure error handling rules, set max errors limit, enable logging for troubleshooting, and use AWS CloudTrail for API-level monitoring.
Q24: How do you migrate a data warehouse to Redshift?
Answer: Use SCT to convert schema from source warehouse, DMS for data migration, or use Redshift COPY command from S3. For large migrations, use Snowball to stage data in S3 then COPY.
Q25: What are the limitations of AWS DMS?
Answer: Limitations include: some DDL not replicated, certain data types need special handling, LOB limitations, performance depends on instance size, and heterogeneous migrations may need manual work.
Key Takeaways
β οΈ
Always test migrations thoroughly in non-production environments before executing production migrations. Have rollback plans and validate data integrity.
- Choose right migration tool: DMS for databases, Snow Family for offline large-scale
- Plan for schema conversion: Use SCT for heterogeneous migrations
- Minimize downtime: Use CDC for ongoing replication during cutover
- Validate data: Implement comprehensive validation checks
- Monitor and optimize: Use CloudWatch metrics to track progress
- Cost optimization: Use DMS Serverless for variable workloads
- Security: Encrypt data in transit and at rest
- Testing: Thorough testing before production migration