Azure Stream Analytics: Windowing, Temporal Joins & Inputs
Real-time stream processing with SQL-like queries, windowing functions, and anomaly detection
Stream Analytics Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β STREAM ANALYTICS ARCHITECTURE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β INPUTS QUERY OUTPUTS β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β Event Hubs ββββββ>β βββββ>β Cosmos DB β β
β β (Streaming) β β β β (Real-time) β β
β ββββββββββββββββ β β ββββββββββββββββ β
β β β β
β ββββββββββββββββ β Stream β ββββββββββββββββ β
β β IoT Hub ββββββ>β Analytics βββββ>β ADLS Gen2 β β
β β (Device Telemetry) β Job β β (Storage) β β
β ββββββββββββββββ β β ββββββββββββββββ β
β β β β
β ββββββββββββββββ β β ββββββββββββββββ β
β β Blob Storage ββββββ>β βββββ>β Power BI β β
β β (Reference) β β β β (Dashboard) β β
β ββββββββββββββββ β β ββββββββββββββββ β
β β β β
β ββββββββββββββββ β β ββββββββββββββββ β
β β SQL Database ββββββ>β βββββ>β Function App β β
β β (Lookup) β β β β (Actions) β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β
β SCALING UNITS: β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β’ Streaming Units (SU): 1 SU = 1 MB/s ingress β β
β β β’ SU ranges: 1, 3, 6, 12, 24, 48, 96, 120 (varies) β β
β β β’ Min 3 SU for partitioned queries β β
β β β’ Max 120 SU per job β β
β β β’ 6 SU = 1 vCPU, 2 GB RAM β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Windowing Functions
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β WINDOWING FUNCTIONS β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β TUMBLING WINDOW β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Events: A B C D E F G H I J β β
β β β β
β β [A B C] [D E F] [G H I] [J ...] β β
β β Window1 Window2 Window3 Window4 β β
β β β β
β β β’ Fixed size, no overlap β β
β β β’ Each event in exactly one window β β
β β β’ Example: Count events every 5 minutes β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β HOPPING WINDOW β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Events: A B C D E F G H I J β β
β β β β
β β [A B C D E] β β
β β [C D E F G] β β
β β [E F G H I] β β
β β [G H I J ...] β β
β β β β
β β β’ Fixed size with overlap β β
β β β’ Each event in multiple windows β β
β β β’ Example: Rolling 10-min average, hop every 5 min β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β SLIDING WINDOW β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Events: A B C D E F G H I J β β
β β β β
β β [A] [AB] [ABC] [ABCD] [ABCDE] ... β β
β β β β
β β β’ Window defined by time range β β
β β β’ Updates with every new event β β
β β β’ Example: Distinct count over last 5 minutes β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β SESSION WINDOW β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Events: A B C D E F G H I J β β
β β β β
β β [A B C D E] [F G H I] [J ...] β β
β β (gap > 5min) (gap > 5min) β β
β β β β
β β β’ Dynamic size based on event activity β β
β β β’ Closes when gap exceeds timeout β β
β β β’ Example: User session analysis β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Stream Analytics Query Examples
-- Tumbling Window: Count events every 5 minutes
SELECT
System.Timestamp() AS window_end,
COUNT(*) AS event_count,
AVG(temperature) AS avg_temperature,
MAX(temperature) AS max_temperature,
MIN(temperature) AS min_temperature
INTO [cosmos-output]
FROM [eventhub-input] PARTITION BY device_id
GROUP BY
TumblingWindow(minute, 5),
device_id;
-- Hopping Window: Rolling 10-minute average
SELECT
System.Timestamp() AS window_end,
device_id,
AVG(temperature) AS rolling_avg_temp
INTO [powerbi-output]
FROM [eventhub-input]
GROUP BY
HoppingWindow(minute, 10, 5), -- 10-min window, 5-min hop
device_id;
-- Sliding Window: Detect anomalies
SELECT
System.Timestamp() AS window_end,
device_id,
AVG(temperature) AS avg_temp,
STDEV(temperature) AS std_temp,
CASE
WHEN AVG(temperature) > (SELECT AVG(temperature) + 3 * STDEV(temperature)
FROM [eventhub-input]
WHERE device_id = Input.device_id
TIMESTAMP BY event_time
AND DATEDIFF(minute, System.Timestamp(), event_time) BETWEEN -60 AND 0)
THEN 'ANOMALY'
ELSE 'NORMAL'
END AS status
INTO [alert-output]
FROM [eventhub-input] PARTITION BY device_id
GROUP BY
TumblingWindow(minute, 1),
device_id;
-- Temporal Join: Enrich with reference data
SELECT
e.device_id,
e.temperature,
e.event_time,
r.device_location,
r.device_type,
r.threshold
INTO [enriched-output]
FROM [eventhub-input] e
JOIN [reference-input] r
ON e.device_id = r.device_id
AND DATEDIFF(minute, e, r) BETWEEN -5 AND 5;
-- Session Window: User activity
SELECT
device_id,
MIN(event_time) AS session_start,
MAX(event_time) AS session_end,
COUNT(*) AS events_in_session
INTO [session-output]
FROM [eventhub-input]
GROUP BY
SessionWindow(minute, 10), -- 10-min timeout
device_id;
Anomaly Detection
-- Built-in anomaly detection
SELECT
device_id,
System.Timestamp() AS window_end,
AVG(temperature) AS avg_temp,
AnomalyDetection_SpikeAndDip(
temperature, 95, 100,
DATEDIFF(minute, System.Timestamp(), event_time)
) AS spike_result
INTO [anomaly-output]
FROM [eventhub-input]
GROUP BY
TumblingWindow(minute, 5),
device_id;
-- CNTK-based anomaly detection
SELECT
device_id,
System.Timestamp() AS window_end,
AnomalyDetection_ChangePoint(
temperature, 95, 200,
DATEDIFF(minute, System.Timestamp(), event_time)
) AS change_point_result
INTO [changepoint-output]
FROM [eventhub-input]
GROUP BY
TumblingWindow(minute, 10),
device_id;
βΉοΈ
Pro Tip: Use PARTITION BY to parallelize queries across device/entity partitions. Each partition processes independently, improving throughput and reducing latency.
Interview Questions
Q1: Explain the difference between Tumbling and Hopping windows. A: Tumbling windows are fixed-size, non-overlapping (each event in one window). Hopping windows are fixed-size with overlap (each event in multiple windows). Use tumbling for periodic aggregations; hopping for rolling calculations.
Q2: How do you handle late-arriving data in Stream Analytics? A: Use the Late Arrival Policy setting to define how long to accept late events (up to 5 days). Late events update window results retroactively. Use Watermark Delay to control when windows close.
Q3: When would you use Stream Analytics vs Azure Functions for real-time processing? A: Stream Analytics for SQL-like windowed aggregations, temporal joins, and complex event processing. Azure Functions for custom logic, API calls, and simple transformations. Stream Analytics is better for continuous analytics; Functions for event-driven actions.