Athena Federated Query
Master Athena Federated Query with Lambda connectors for cross-source queries.
Module: AWS Data Engineering • Topic 44 of 65 • Premium Content
Federated Query Architecture
Architecture Diagram
------------------------------------------------------------
              ATHENA FEDERATED ARCHITECTURE

  Athena Query → Lambda Connector → External Data Source

  Supported Connectors:
    • MySQL, PostgreSQL, SQL Server
    • DynamoDB
    • Redshift
    • ElastiCache (Redis)
    • CloudWatch Logs
    • DocumentDB

  Query Example:
    SELECT * FROM mysql_catalog.db.table
    JOIN athena_db.local_table ON ...
------------------------------------------------------------
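A cross-source join like the query example can also be submitted from code. The following is a minimal sketch, assuming a boto3 Athena client is passed in as `client`. The helper names `build_join_query` and `submit_query`, the table and column names, and the S3 output location are hypothetical placeholders, not part of the original material.

```python
from typing import List


def build_join_query(federated_table: str, local_table: str,
                     key: str, columns: List[str]) -> str:
    """Build a join between a federated table (alias f) and a local table (alias l).

    Table names are expected to be fully qualified, for example
    "mysql_catalog.sales.orders" (hypothetical) and "athena_db.customers".
    """
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list} "
        f"FROM {federated_table} AS f "
        f"JOIN {local_table} AS l ON f.{key} = l.{key}"
    )


def submit_query(client, sql: str, output_location: str) -> str:
    """Start the query and return its execution ID.

    `client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    response = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Example call (needs AWS credentials and a registered connector catalog):
#   import boto3
#   sql = build_join_query("mysql_catalog.sales.orders", "athena_db.customers",
#                          "customer_id", ["f.order_id", "l.name"])
#   query_id = submit_query(boto3.client("athena"), sql,
#                           "s3://example-bucket/athena-results/")
```

Listing columns explicitly instead of using `SELECT *` keeps the amount of data pulled through the connector small.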
Interview Q&A
Q1: What is federated query?
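Answer: A federated query lets Athena run SQL against external data sources in place, without first loading the data into S3. Athena invokes a Lambda connector that translates between Athena and the source, so external tables can be joined with regular Athena tables.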
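Before a connector can be queried by catalog name, it is typically registered with Athena as a data catalog of type `LAMBDA`. The sketch below assumes a boto3 Athena client and an already deployed connector function. The helper name `register_lambda_catalog`, the catalog name, and the ARN are hypothetical.

```python
def register_lambda_catalog(client, catalog_name: str, connector_lambda_arn: str) -> str:
    """Register a deployed Lambda connector as an Athena data catalog.

    `client` is assumed to be a boto3 Athena client. The "function" parameter
    is used when one Lambda function serves both metadata and record requests.
    """
    client.create_data_catalog(
        Name=catalog_name,
        Type="LAMBDA",
        Description="Federated catalog backed by a Lambda connector",
        Parameters={"function": connector_lambda_arn},
    )
    return catalog_name


# Example call (hypothetical ARN; needs AWS credentials):
#   import boto3
#   register_lambda_catalog(
#       boto3.client("athena"),
#       "mysql_catalog",
#       "arn:aws:lambda:us-east-1:123456789012:function:mysql-connector",
#   )
```

After registration, tables behind the connector can be referenced as `mysql_catalog.db.table` in queries.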
Q2: What are the limitations?
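Answer: Athena's query timeout (30 minutes by default), Lambda resource limits on the connector such as memory, no write operations (federated sources are queried with SELECT only), and performance that depends on the source system and how much data it has to return.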
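Because a slow source can keep a federated query running until the timeout, callers often add their own shorter deadline. This is a minimal sketch, assuming a boto3 Athena client is passed in. `wait_for_query` and the `"TIMED_OUT"` return value are names invented for this example and are not Athena states.

```python
import time

# Terminal states reported by Athena for a query execution.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(client, query_id, max_wait_seconds=1800, poll_seconds=5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll a query until it finishes or a client-side deadline passes.

    Returns the final Athena state. If the deadline passes first, the query
    is cancelled and the label "TIMED_OUT" (our own, not an Athena state)
    is returned. `sleep` and `clock` are injectable to make testing easy.
    """
    deadline = clock() + max_wait_seconds
    while True:
        response = client.get_query_execution(QueryExecutionId=query_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            # Stop the query so it does not keep consuming source resources.
            client.stop_query_execution(QueryExecutionId=query_id)
            return "TIMED_OUT"
        sleep(poll_seconds)
```

Cancelling on the client-side deadline keeps a struggling source database from being scanned for the full service timeout.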
Q3: How do you optimize federated queries?
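Answer: Write filters that the connector can push down to the source, take advantage of partitioning so the connector reads only relevant data, select only the columns and rows you need, and cache the results of frequently run queries.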
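The effect of predicate pushdown can be illustrated with a toy model. This is not the connector SDK and makes no AWS calls. `ToySource` and both query functions are hypothetical stand-ins that only count how many rows leave the source in each case.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class ToySource:
    """Stands in for an external database and counts the rows it sends out."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows = list(rows)
        self.rows_sent = 0

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        # With a predicate, filtering happens inside the source (pushdown).
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_sent += 1
                yield row


def query_without_pushdown(source: ToySource, predicate: Predicate) -> List[Row]:
    """The source returns every row; the query engine filters afterwards."""
    return [row for row in source.scan() if predicate(row)]


def query_with_pushdown(source: ToySource, predicate: Predicate) -> List[Row]:
    """The filter is handed to the source, so only matching rows are sent."""
    return list(source.scan(predicate))
```

Both functions return the same rows, but with 1,000 rows of which 100 match the filter, the first path transfers all 1,000 rows while the second transfers only the 100 matches.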
Summary
- Architecture: Athena → Lambda Connector → External Source
- Connectors: MySQL, PostgreSQL, DynamoDB, Redshift, and more
- Benefits: Query without loading, cross-source joins
- Limitations: 30-min timeout, SELECT only, Lambda limits
- Optimization: Push predicates, limit data returned