π¨ MSK Connect
Master MSK Connect with Kafka Connect patterns and connectors.
Module: AWS Data Engineering β’ Topic 43 of 65 β’ Premium Content
MSK Connect Architecture
Architecture Diagram
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β MSK CONNECT ARCHITECTURE β
β β
β Source Connector β MSK Topic β Sink Connector β Destination β
β β
β Source Connectors: β
β β’ Debezium (CDC from databases) β
β β’ JDBC Source (poll-based) β
β β’ S3 Source (file ingestion) β
β β’ DynamoDB Streams β
β β
β Sink Connectors: β
β β’ S3 Sink (data lake ingestion) β
β β’ Redshift Sink β
β β’ Elasticsearch Sink β
β β’ HTTP Sink β
β β
β Single Message Transforms (SMTs): β
β β’ RegexRouter (topic routing) β
β β’ ExtractNewRecordState (flatten) β
β β’ InsertField (add metadata) β
β β’ ReplaceField (rename/remove) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Connector Examples
{
"name": "s3-sink-connector",
"config": {
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"tasks.max": "3",
"topics.regex": "raw-(.*)",
"s3.bucket.name": "data-lake-raw",
"flush.size": "1000",
"rotate.interval.ms": "300000",
"format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
"parquet.codec": "snappy",
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "raw-(.*)",
"transforms.route.replacement": "processed-$1"
}
}
Interview Q&A
Q1: MSK Connect vs. self-managed Kafka Connect?
Answer: MSK Connect is managed, auto-scaling, no infrastructure. Self-managed gives more control over worker configuration.
Q2: What are SMTs?
Answer: Single Message Transforms modify records in-flight without a custom converter. Used for routing, field manipulation, and data enrichment.
Q3: How does MSK Connect scale?
Answer: Auto-scaling based on connector throughput. Configure min/max capacity for cost control.
Summary
- MSK Connect: Managed Kafka Connect on AWS
- Source Connectors: Debezium, JDBC, S3, DynamoDB
- Sink Connectors: S3, Redshift, Elasticsearch, HTTP
- SMTs: In-flight record transformation without custom code
- Scaling: Auto-scaling based on connector throughput