π Kinesis Analytics
Master Kinesis Analytics SQL applications and Apache Flink integration.
Module: AWS Data Engineering β’ Topic 42 of 65 β’ Premium Content
Kinesis Analytics Architecture
Architecture Diagram
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β KINESIS ANALYTICS ARCHITECTURE β
β β
β Source (Kinesis/Firehose) β Application (SQL/Flink) β Sink (Lambda/S3) β
β β
β SQL Application: β
β β’ Standard SQL for stream processing β
β β’ Windowed aggregations (tumbling, sliding, session) β
β β’ Real-time dashboards β
β β’ Anomaly detection β
β β
β Flink Application: β
β β’ Apache Flink runtime β
β β’ Java/Scala/Python β
β β’ State management β
β β’ Exactly-once processing β
β β’ Complex event processing β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
SQL Application Example
-- Create input stream
CREATE OR REPLACE STREAM "sensor_stream" (
sensor_id VARCHAR(16),
temperature DOUBLE,
event_time TIMESTAMP
);
-- Windowed aggregation (5-minute tumbling window)
CREATE OR REPLACE STREAM "aggregated_stream" (
sensor_id VARCHAR(16),
avg_temperature DOUBLE,
max_temperature DOUBLE,
reading_count INTEGER,
window_start TIMESTAMP
);
CREATE OR REPLACE PUMP "aggregate_pump" AS
INSERT INTO "aggregated_stream"
SELECT
sensor_id,
AVG(temperature) AS avg_temperature,
MAX(temperature) AS max_temperature,
COUNT(*) AS reading_count,
STEP("sensor_stream".ROWTIME BY INTERVAL '5' MINUTE) AS window_start
FROM "sensor_stream"
GROUP BY sensor_id, STEP("sensor_stream".ROWTIME BY INTERVAL '5' MINUTE);
Interview Q&A
Q1: SQL vs Flink applications?
Answer: SQL for simple transformations, aggregations. Flink for complex stateful processing, custom logic, and exactly-once semantics.
Q2: What window types are supported?
Answer: Tumbling (fixed), Sliding (overlapping), Session (activity-based). Each serves different aggregation patterns.
Q3: How does auto-scaling work?
Answer: Kinesis Analytics scales capacity units (ACUs) based on incoming data volume. Minimum 1 ACU, up to 128 ACUs.
Summary
- SQL Applications: Standard SQL for stream processing
- Flink: Complex stateful processing with Java/Scala/Python
- Windows: Tumbling, Sliding, Session for aggregations
- Auto-scaling: ACUs scale based on data volume
- Sinks: Lambda, S3, Kinesis Data Streams, Redshift