Real-Time Processing Interview Q&A

25 interview questions on real-time streaming and event-driven architectures

Question 1: What is the difference between real-time, near-real-time, and batch?

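Answer: Real-time: sub-second latency, processing each event as it arrives (e.g., Event Hubs + Functions). Near-real-time: seconds to minutes, typically windowed aggregation (e.g., Stream Analytics). Batch: minutes to hours on a schedule (e.g., ADF). Choose the simplest option that meets the latency requirement, since lower latency usually costs more to build and run.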

Question 2: How do you handle backpressure in streaming?

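Answer: Event Hubs buffers events durably (retention of up to 7 days on the Standard tier, longer on Premium and Dedicated), so slow consumers fall behind instead of losing data. Then add capacity by scaling Stream Analytics streaming units (SUs) or consumer instances (parallelism is bounded by the partition count), apply consumer-side rate limiting, and use circuit breakers to stop calling an overloaded downstream system.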
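A minimal sketch of consumer-side rate limiting, assuming a token-bucket policy; the class, its parameters, and the injectable clock are illustrative and not part of any Azure SDK. When `try_acquire` returns False, the consumer stops pulling, and unread events simply wait in the Event Hubs partition.

```python
import time
from typing import Callable


class TokenBucket:
    """Consumer-side rate limiter: allows bursts of up to `capacity` events,
    then sustains `rate` events per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        # Credit tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n: float = 1.0) -> bool:
        """Consume n tokens and return True if available; otherwise return False
        so the caller pauses reading and events stay buffered upstream."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Usage with a fake clock: two events pass immediately, the third must wait.
now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2.0, clock=lambda: now[0])
results = [bucket.try_acquire() for _ in range(3)]  # [True, True, False]
now[0] = 0.5  # half a second later, 2.0 * 0.5 = 1 token has been refilled
later = bucket.try_acquire()  # True
```

The bucket size controls how large a burst the consumer absorbs; the rate should match what the downstream system can sustain.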

Question 3: What is the benefit of Event Hubs Capture?

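Answer: Capture automatically archives the event stream to Azure Blob Storage or ADLS as Avro files, so batch analytics can run on the same data as real-time processing without a separate archiving consumer. It is billed per throughput unit on the Standard tier and included in Premium and Dedicated. This enables a lambda architecture: a hot path for streaming and a cold path over the archived files.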

Question 4: How do you implement exactly-once processing?

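Answer: Combine at-least-once delivery with a sink that makes reprocessing harmless: checkpoint after processing, make writes idempotent (deduplicate on a unique event key), or commit output and checkpoint together in one transaction so a crash cannot apply an event twice. In practice this gives effectively-once results rather than exactly-once delivery.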

Question 5: What is the difference between event-driven and request-driven?

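Answer: Event-driven: producers emit events and consumers react as they arrive (Event Hubs, Event Grid, Functions triggers). Request-driven: the consumer initiates work by calling or polling a source on demand or on a schedule (REST APIs, scheduled ADF pipelines). Use event-driven for low-latency reactions; request-driven suits batch and synchronous query workloads.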

Question 6: How do you handle late-arriving data?

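Answer: In Stream Analytics, configure the event ordering policy: a late-arrival tolerance window and an out-of-order tolerance window. Events outside them are either dropped or have their timestamps adjusted, depending on the configured action. Monitor the watermark delay metric to see how far processing trails event time. In Databricks Structured Streaming, withWatermark keeps window state open for a set lateness threshold, so late events within that threshold still update their window results.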
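A minimal sketch of watermark-based late-data handling for a tumbling-window count, modeled on the general watermark idea rather than on Stream Analytics or Spark internals; the class and parameter names are hypothetical and event times are plain integers.

```python
from collections import defaultdict


class WatermarkedTumblingCount:
    """Counts events per tumbling window of `size` time units (event time).
    The watermark trails the maximum event time seen by `lateness` units; a
    window is emitted once the watermark passes its end, and events for
    already-emitted windows are counted as dropped."""

    def __init__(self, size: int, lateness: int) -> None:
        self.size = size
        self.lateness = lateness
        self.max_event_time = float("-inf")
        self.open: defaultdict[int, int] = defaultdict(int)  # window start -> count
        self.dropped = 0

    def watermark(self) -> float:
        return self.max_event_time - self.lateness

    def add(self, event_time: int) -> list[tuple[int, int]]:
        """Ingest one event and return any (window_start, count) pairs it finalizes."""
        start = event_time - event_time % self.size
        if start + self.size <= self.watermark():
            self.dropped += 1  # too late: its window has already been emitted
            return []
        self.open[start] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        return self._emit_closed()

    def _emit_closed(self) -> list[tuple[int, int]]:
        wm = self.watermark()
        closed = sorted(s for s in self.open if s + self.size <= wm)
        return [(s, self.open.pop(s)) for s in closed]
```

With size=10 and lateness=5, feeding event times 1, 12, 8, 16, 3: the late event at 8 is still counted, 16 advances the watermark to 11 and emits window 0 with a count of 2, and 3 is dropped. A larger lateness accepts more late data at the cost of holding window state (and delaying results) longer.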

Question 7: What is the benefit of Databricks Structured Streaming?

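Answer: Spark-based streaming that reuses the DataFrame API, so the same code can run in batch or streaming mode. With replayable sources (such as Event Hubs or Kafka), checkpointing, and Delta Lake as a transactional sink, it provides exactly-once output.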

Question 8: How do you handle schema changes in streaming?

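Answer: Register schemas in a schema registry (Azure Schema Registry, part of Event Hubs) and enforce a compatibility mode so new versions do not break existing consumers; build consumers that tolerate additive changes (new optional fields with defaults); validate events at ingestion and route invalid ones to a dead-letter store.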
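Schema registries typically check a new schema version against a compatibility mode before accepting it. The sketch below is a hypothetical, simplified backward-compatibility check for Avro-like record schemas represented as dicts; it ignores type promotions, unions, and nested records, and is not the registry's actual algorithm.

```python
from typing import Any

Schema = dict[str, Any]  # {"fields": [{"name": ..., "type": ..., "default"?: ...}]}


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Simplified BACKWARD check: consumers using `new` can read data written
    with `old`. Fields added in `new` need a default; shared fields must keep
    the same type. Removing a field is allowed because the reader ignores it."""
    old_fields = {f["name"]: f for f in old["fields"]}
    for field in new["fields"]:
        previous = old_fields.get(field["name"])
        if previous is None:
            if "default" not in field:
                return False  # old data has no value for this new field
        elif previous["type"] != field["type"]:
            return False  # type change (promotions not modeled here)
    return True


v1 = {"fields": [{"name": "id", "type": "string"},
                 {"name": "temp", "type": "double"}]}
v2 = {"fields": v1["fields"] + [{"name": "unit", "type": "string", "default": "C"}]}
```

Here `is_backward_compatible(v1, v2)` holds because the new `unit` field has a default; adding it without one would be rejected.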

Question 9: What is the difference between at-least-once and exactly-once?

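Answer: At-least-once: no data loss, but duplicates are possible after a failure, because events processed since the last checkpoint are delivered again (the usual Event Hubs consumer pattern). Exactly-once: each event affects the results exactly once. In practice this is achieved with at-least-once delivery plus idempotent or transactional processing, which absorbs the duplicates.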

Question 10: How do you implement windowed aggregations?

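Answer: Stream Analytics windowing functions (Tumbling, Hopping, Sliding, Session, and Snapshot), the window() function in Databricks Structured Streaming, or custom windowing logic. Tumbling windows are fixed-size and non-overlapping; hopping windows have a fixed size but advance by a smaller hop, so they overlap; sliding windows produce output only when an event enters or exits the window; session windows group events separated by less than a timeout gap.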
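A sketch of how an event timestamp maps to tumbling and hopping windows, assuming integer timestamps and windows aligned to time zero; the function names are illustrative. Sliding and session windows depend on the surrounding events, so they are not covered here.

```python
def tumbling_window(ts: int, size: int) -> tuple[int, int]:
    """Fixed, non-overlapping [start, end) windows: each event belongs to exactly one."""
    start = ts - ts % size
    return (start, start + size)


def hopping_windows(ts: int, size: int, hop: int) -> list[tuple[int, int]]:
    """Windows of length `size` starting every `hop` units; an event belongs to
    every window with start <= ts < start + size."""
    last_start = ts // hop * hop                 # latest window start at or before ts
    first_start = (ts - size) // hop * hop + hop  # earliest start whose window still covers ts
    return [(s, s + size) for s in range(first_start, last_start + 1, hop)]
```

For example, with size=10 and hop=5, timestamp 7 falls in [0, 10) and [5, 15). When hop equals size, a hopping window degenerates into a tumbling window.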

Question 11: What is the benefit of Event Grid integration?

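Answer: Event Grid is a push-based publish/subscribe service for discrete events (e.g., a blob created in a storage account). It filters events by type and subject, fans each event out to multiple subscribers (Functions, webhooks, queues, Event Hubs), retries failed deliveries, and can dead-letter undeliverable events, so services react to changes without polling.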
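Event Grid subscriptions can filter on event type and on subject prefix and suffix. A hypothetical in-memory model of that matching and fan-out (the class and field names are illustrative, not the Event Grid SDK), using the subject format of Blob Storage events:

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """Simplified model of one event subscription and its filters."""
    endpoint: str
    included_event_types: list[str] = field(default_factory=list)  # empty = all types
    subject_begins_with: str = ""
    subject_ends_with: str = ""

    def matches(self, event: dict) -> bool:
        if self.included_event_types and event["eventType"] not in self.included_event_types:
            return False
        subject = event["subject"]
        return (subject.startswith(self.subject_begins_with)
                and subject.endswith(self.subject_ends_with))


def route(event: dict, subscriptions: list[Subscription]) -> list[str]:
    """Fan out: return every endpoint whose filter matches the event."""
    return [s.endpoint for s in subscriptions if s.matches(event)]


subscriptions = [
    Subscription("csv-ingest", ["Microsoft.Storage.BlobCreated"],
                 "/blobServices/default/containers/raw/", ".csv"),
    Subscription("audit"),  # no filters: receives everything
    Subscription("cleanup", ["Microsoft.Storage.BlobDeleted"]),
]
event = {"eventType": "Microsoft.Storage.BlobCreated",
         "subject": "/blobServices/default/containers/raw/blobs/2024/sales.csv"}
```

Routing this event reaches `csv-ingest` and `audit` but not `cleanup`; filtering at the broker keeps subscribers from receiving events they would only discard.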

Question 12: How do you handle message ordering?

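Answer: Events with the same partition key go to the same Event Hubs partition, and order is guaranteed only within a partition. Choose a key such as a device or order ID so related events stay in sequence. Global ordering requires a single partition, which caps throughput at what one partition and one consumer can handle.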
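A sketch showing why a partition key preserves per-key order: every event with the same key lands in the same partition, and each partition is an append-only sequence. The SHA-256 hash is an assumption for illustration; Event Hubs uses its own internal hashing, so real partition numbers will differ. Function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Deterministically map a partition key to a partition index (illustrative hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def assign(events: list[tuple[str, int]],
           partition_count: int) -> dict[int, list[tuple[str, int]]]:
    """Append (key, sequence) events to their partitions in arrival order."""
    partitions: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for key, seq in events:
        partitions[partition_for(key, partition_count)].append((key, seq))
    return dict(partitions)


events = [("device-1", 1), ("device-2", 1), ("device-1", 2),
          ("device-3", 1), ("device-2", 2), ("device-1", 3)]
parts = assign(events, 4)
```

Events for different keys may interleave differently across partitions, which is why no order is guaranteed between keys.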

Question 13: What is the difference between pull and push consumption?

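Answer: Pull: the consumer requests events and controls its own pace, which gives natural backpressure (Event Hubs and Service Bus receivers). Push: the service delivers events to an endpoint as they arrive (Event Grid to a webhook or Function). Event Hubs is pull-based at the protocol level, but the SDKs' event processor clients and the Azure Functions Event Hubs trigger wrap it in a callback model, so application code sees push-style delivery.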

Question 14: How do you implement dead-letter queues?

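Answer: Apply a retry policy first; messages that still fail after a set number of attempts go to separate storage together with the error details. Alert on dead-letter volume and analyze dead-lettered messages for root cause before resubmitting them. Service Bus has a built-in dead-letter subqueue (messages move there after exceeding the maximum delivery count), and Event Grid can dead-letter undeliverable events to Blob Storage. Event Hubs has no built-in dead-letter queue, so consumers write failures to a separate event hub or storage account.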
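A minimal sketch of the dead-letter pattern, assuming an in-process handler and in-memory lists standing in for the main queue and the dead-letter store; the function and the `deadLetterReason` and `deliveryCount` fields are hypothetical. Each message gets a fixed number of attempts before it is moved aside with its last error, so one bad message cannot block the rest.

```python
from typing import Callable


def process_with_dlq(messages: list[dict], handler: Callable[[dict], None],
                     max_delivery_count: int = 3) -> tuple[list[dict], list[dict]]:
    """Try each message up to max_delivery_count times; messages that still
    fail are moved to a dead-letter list with the last error attached."""
    processed: list[dict] = []
    dead_letter: list[dict] = []
    for msg in messages:
        for attempt in range(1, max_delivery_count + 1):
            try:
                handler(msg)
                processed.append(msg)
                break
            except Exception as exc:  # a real consumer would catch narrower errors
                if attempt == max_delivery_count:
                    dead_letter.append({**msg, "deadLetterReason": repr(exc),
                                        "deliveryCount": attempt})
    return processed, dead_letter


calls: dict[int, int] = {}


def handler(msg: dict) -> None:
    calls[msg["id"]] = calls.get(msg["id"], 0) + 1
    if msg["body"] == "bad":
        raise ValueError("unparseable payload")  # fails every time: poison
    if msg["body"] == "flaky" and calls[msg["id"]] == 1:
        raise TimeoutError("transient")  # succeeds on retry


ok, dlq = process_with_dlq([{"id": 1, "body": "ok"}, {"id": 2, "body": "flaky"},
                            {"id": 3, "body": "bad"}], handler)
```

The transient failure recovers on retry, while the malformed message exhausts its attempts and lands in `dlq` for inspection.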

Question 15: What is the benefit of Cosmos DB for streaming?

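Answer: Single-digit-millisecond reads and writes (the SLA targets under 10 ms at the 99th percentile), turnkey global distribution, autoscale throughput, and a change feed that lets downstream consumers process every insert and update, in order within each partition key value.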

Question 16: How do you monitor streaming pipeline health?

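Answer: Event Hubs metrics (incoming and outgoing messages, throttled requests), consumer lag per partition (last enqueued sequence number minus the checkpointed sequence number), Stream Analytics metrics (SU % utilization, watermark delay, backlogged input events), checkpoint progress, custom application metrics, and Azure Monitor alerts on these signals.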
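Consumer lag per partition can be computed as the last enqueued sequence number minus the consumer's last checkpointed sequence number. A sketch, assuming both values have already been collected (in the Python azure-eventhub SDK, partition properties include `last_enqueued_sequence_number`; checkpoints come from the checkpoint store). The function names and the alert threshold are illustrative.

```python
def consumer_lag(last_enqueued: dict[str, int],
                 checkpointed: dict[str, int]) -> dict[str, int]:
    """Per-partition lag in events. A partition with no checkpoint is treated
    as checkpointed at -1, i.e. nothing consumed yet."""
    return {p: last - checkpointed.get(p, -1) for p, last in last_enqueued.items()}


def lagging_partitions(lag: dict[str, int], threshold: int) -> list[str]:
    """Partitions whose lag exceeds the alert threshold."""
    return sorted(p for p, n in lag.items() if n > threshold)


lag = consumer_lag({"0": 120, "1": 50, "2": 10}, {"0": 100, "1": 50})
```

A lag that grows steadily on every partition points to under-provisioned consumers; lag on a single partition often points to a hot partition key or a stuck consumer instance.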

Question 17: What is the difference between stream and micro-batch?

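Answer: Stream: each event is processed as it arrives (lowest latency). Micro-batch: events are collected into small batches (typically every few seconds) and each batch is processed as a unit. Micro-batching amortizes per-batch overhead (scheduling, checkpointing, sink writes) across many events, trading some latency for higher throughput; Spark Structured Streaming uses micro-batches by default.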
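A minimal sketch of micro-batching that closes a batch on size or elapsed time, assuming (arrival_time, payload) tuples; it is not Spark's trigger implementation, and the function name is hypothetical. For simplicity the time limit is checked only when the next event arrives; a real engine also flushes on a timer.

```python
from typing import Iterable, Iterator


def micro_batches(events: Iterable[tuple[float, str]], max_size: int,
                  max_interval: float) -> Iterator[list[str]]:
    """Group events into batches, closing a batch when it reaches max_size or
    when an event arrives max_interval seconds after the batch opened. The
    per-batch cost (commit, checkpoint, sink write) is then paid once per batch."""
    batch: list[str] = []
    opened = 0.0
    for arrival, payload in events:
        if batch and arrival - opened >= max_interval:
            yield batch  # time limit reached: flush before adding the new event
            batch = []
        if not batch:
            opened = arrival
        batch.append(payload)
        if len(batch) >= max_size:
            yield batch  # size limit reached
            batch = []
    if batch:
        yield batch  # flush the remainder at end of input
```

A larger `max_size` or `max_interval` raises throughput per batch but adds latency for the first event in each batch.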

Question 18: How do you handle idempotent processing?

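Answer: Give each event a unique ID assigned by the producer (not at read time), write results keyed by that ID (upsert rather than append or increment) so a redelivered event overwrites its own earlier result, track processed IDs when keyed writes are not possible, and use transactional writes when output and checkpoint must change together.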
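A sketch contrasting an idempotent keyed write with a non-idempotent increment under at-least-once delivery; the sink class is hypothetical and in-memory. In Cosmos DB, the analogous operation is an upsert keyed on the document id.

```python
class IdempotentSink:
    """Stores each event's effect under its unique message ID, so a
    redelivered event overwrites instead of adding."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def upsert(self, event: dict) -> None:
        # Keyed write: applying the same event twice leaves the same state.
        self.rows[event["message_id"]] = {"account": event["account"],
                                          "amount": event["amount"]}

    def total(self, account: str) -> int:
        return sum(r["amount"] for r in self.rows.values() if r["account"] == account)


# At-least-once delivery: message m2 arrives twice after a consumer restart.
deliveries = [
    {"message_id": "m1", "account": "a", "amount": 10},
    {"message_id": "m2", "account": "a", "amount": 5},
    {"message_id": "m2", "account": "a", "amount": 5},
]
sink = IdempotentSink()
naive_total = 0
for event in deliveries:
    sink.upsert(event)
    naive_total += event["amount"]  # non-idempotent increment double-counts m2
```

The keyed sink reports 15 for account `a`, while the naive running total reaches 20 because the duplicate was applied twice.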

Question 19: What is the benefit of Azure Functions for streaming?

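Answer: Event-driven triggers, automatic scale-out, pay-per-execution pricing on the Consumption plan, and built-in triggers and bindings for Event Hubs, Service Bus, and Event Grid. The Event Hubs trigger manages checkpointing and can deliver events in batches.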

Question 20: How do you handle poison messages?

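Answer: A poison message fails every time it is processed (e.g., a malformed payload), so it must not block its queue or partition. Retry with exponential backoff to ride out transient errors, move the message to a dead-letter queue after a maximum number of attempts, alert on repeated failures, and inspect dead-lettered messages manually before fixing and resubmitting them.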
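A sketch of exponential backoff with full jitter, assuming a 1-second base and a 30-second cap (both illustrative) and hypothetical function names. Backoff helps with transient errors; a genuine poison message will still fail every attempt and should be dead-lettered once its retry budget is spent.

```python
import random
from typing import Optional


def backoff_ceiling(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Upper bound on the wait before retry `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * 2 ** attempt)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: wait a random time in [0, ceiling] so many consumers
    retrying the same failure do not retry in lockstep."""
    if rng is None:
        rng = random.Random()
    return rng.uniform(0, backoff_ceiling(attempt, base, cap))
```

The ceilings for attempts 0 through 6 are 1, 2, 4, 8, 16, 30, and 30 seconds; the cap keeps a long outage from producing unbounded waits.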

Question 21: What is the difference between Event Hubs and Kafka?

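Answer: Event Hubs: a fully managed Azure PaaS service with built-in Capture. Kafka: open source, self-managed or hosted (e.g., Confluent Cloud), with a broader ecosystem of connectors and stream-processing tools. Event Hubs exposes a Kafka-compatible endpoint (Standard tier and above), so most existing Kafka clients can connect with configuration changes only; each event hub maps to a Kafka topic.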
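Because Event Hubs speaks the Kafka protocol, a Kafka client usually needs only configuration changes. A sketch of librdkafka-style settings (as used by, for example, the confluent-kafka Python client), assuming SAS connection-string authentication; the helper function is hypothetical, and the namespace and connection string are placeholders.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict[str, str]:
    """Client settings for the Event Hubs Kafka endpoint: TLS on port 9093 and
    SASL PLAIN with the literal username "$ConnectionString" and the
    namespace connection string as the password."""
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


config = event_hubs_kafka_config("contoso", "<connection-string>")
# e.g. confluent_kafka.Producer(config), then produce to a topic named after the event hub
```

Keep the connection string in a secret store rather than in code.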

Question 22: How do you implement real-time dashboards?

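Answer: Stream Analytics → Power BI output (a streaming dataset whose dashboard tiles update in real time), Cosmos DB → Power BI, or a custom dashboard built with Event Hubs → Functions → Azure SignalR Service pushing updates to browsers.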

Question 23: What is the benefit of real-time ML inference?

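Answer: Scoring events as they arrive enables predictive maintenance, fraud detection, and personalization, where a prediction is most valuable while the event is fresh. A common pattern is Event Hubs → Stream Analytics or Functions → Azure ML online endpoint for scoring → results to a sink or an alert.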

Question 24: How do you handle message replay?

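Answer: Start a consumer (ideally in a separate consumer group, so live processing is unaffected) from an earlier offset, sequence number, or enqueued time within the retention period, or reset its checkpoints. For data older than retention, reprocess the Capture files in ADLS. Idempotent processing makes replayed events safe to apply again.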
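When replaying from Capture, the default file naming convention ({Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}) lets a replay job list only the folders for the time range it needs. A sketch that builds hour-level prefixes, assuming the default naming format and UTC times; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def capture_prefixes(namespace: str, event_hub: str, partition_id: str,
                     start: datetime, end: datetime) -> list[str]:
    """Hour-level blob prefixes covering [start, end) under the default
    Capture naming convention. Listing blobs under these prefixes yields
    the Avro files to replay for one partition."""
    hour = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while hour < end:
        prefixes.append(f"{namespace}/{event_hub}/{partition_id}/{hour:%Y/%m/%d/%H}/")
        hour += timedelta(hours=1)
    return prefixes


prefixes = capture_prefixes("contoso", "telemetry", "0",
                            datetime(2024, 3, 1, 22, 30), datetime(2024, 3, 2, 1, 0))
```

Files in the first hour may contain events from before the start time, so the replay job should still filter on each event's enqueued time.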

Question 25: What is the future of real-time processing?

Answer: Likely directions include unified batch and streaming engines, serverless streaming, processing at the edge close to devices, and ML models applied directly to live streams.