Real-Time Processing Interview Q&A
25 interview questions on real-time streaming and event-driven architectures
Question 1: What is the difference between real-time, near-real-time, and batch?
Answer: Real-time: Sub-second latency (Event Hubs + Functions). Near-real-time: Seconds to minutes (Stream Analytics). Batch: Minutes to hours (ADF). Choose based on latency requirements.
Question 2: How do you handle backpressure in streaming?
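Answer: Event Hubs acts as a buffer so slow consumers can catch up (retention up to 7 days in the Standard tier, longer in Premium and Dedicated). Combine this with Stream Analytics SU scaling, consumer rate limiting, and circuit breaker patterns.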
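A minimal sketch of the buffering idea, assuming a hypothetical in-memory `BoundedBuffer` rather than any Event Hubs API: when the buffer is full, the producer is told to slow down instead of overwhelming the consumer.

```python
import queue


class BoundedBuffer:
    """Hypothetical in-memory buffer that signals backpressure when full."""

    def __init__(self, capacity: int) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=capacity)

    def offer(self, event) -> bool:
        """Try to enqueue; return False so the producer can slow down or retry."""
        try:
            self._q.put_nowait(event)
            return True
        except queue.Full:
            return False

    def poll(self):
        """Return the next event, or None when the buffer is empty."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None


buffer = BoundedBuffer(capacity=2)
accepted = [buffer.offer(i) for i in range(3)]  # the third offer is rejected
first = buffer.poll()                           # the consumer frees one slot
accepted_after_drain = buffer.offer(99)         # the producer can continue
```

A rejected offer is the backpressure signal; a real producer would typically wait, retry with backoff, or shed load at that point.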
Question 3: What is the benefit of Event Hubs Capture?
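Answer: Automatically archives the stream to Azure Blob Storage or ADLS, typically as Avro files, so batch analytics can run alongside real-time processing. This enables a lambda architecture. Capture is billed as an add-on in the Standard tier and included in Premium and Dedicated; storage is charged separately.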
Question 4: How do you implement exactly-once processing?
Answer: Checkpointing, idempotent operations, deduplication with unique keys, and transactional writes to downstream systems.
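The combination above can be sketched as follows, assuming at-least-once delivery and a hypothetical `IdempotentSink` that keeps its deduplication state in memory. In a real system the seen-ID set, the write, and the checkpoint would need to be committed in one transaction.

```python
class IdempotentSink:
    """Hypothetical sink: applies each event ID once and tracks a checkpoint."""

    def __init__(self) -> None:
        self.seen_ids = set()
        self.total = 0
        self.checkpoint = -1  # offset of the last event fully processed

    def apply(self, offset: int, event_id: str, amount: int) -> None:
        # Deduplicate on the unique key so a redelivered event has no effect.
        if event_id not in self.seen_ids:
            self.seen_ids.add(event_id)
            self.total += amount
        # Advance the checkpoint only after the event has been handled.
        self.checkpoint = max(self.checkpoint, offset)


sink = IdempotentSink()
# Event "b" is delivered twice, as can happen after a consumer restart.
deliveries = [(0, "a", 10), (1, "b", 5), (1, "b", 5), (2, "c", 1)]
for offset, event_id, amount in deliveries:
    sink.apply(offset, event_id, amount)
```

The duplicate delivery of "b" changes nothing, so the observable result matches exactly-once processing even though delivery was at-least-once.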
Question 5: What is the difference between event-driven and request-driven?
Answer: Event-driven: Push-based, reactive (Event Hubs, Functions). Request-driven: Pull-based, polling (ADF). Use event-driven for real-time; request-driven for batch.
Question 6: How do you handle late-arriving data?
Answer: Stream Analytics late arrival policy, watermark delay, out-of-order tolerance, and retroactive window updates.
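To illustrate the watermark idea, here is a small sketch that assumes integer event times in seconds and a hypothetical `WatermarkFilter`. It is not how Stream Analytics implements its policies; it only shows the rule "watermark = latest event time seen minus the allowed lateness".

```python
from dataclasses import dataclass, field


@dataclass
class WatermarkFilter:
    """Hypothetical filter: rejects events that fall behind the watermark."""

    allowed_lateness: int          # seconds an event may trail the newest one
    max_event_time: int = 0        # newest event time seen so far
    late: list = field(default_factory=list)

    def accept(self, event_time: int) -> bool:
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.allowed_lateness
        if event_time < watermark:
            # Too late: keep it aside for a correction or dead-letter path.
            self.late.append(event_time)
            return False
        return True


wm = WatermarkFilter(allowed_lateness=5)
results = [wm.accept(t) for t in [10, 12, 8, 20, 14, 16]]
```

The event at time 8 is out of order but still within tolerance; the event at time 14 arrives after the watermark has advanced to 15 and is set aside.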
Question 7: What is the benefit of Databricks Structured Streaming?
Answer: Spark-based streaming with Delta Lake integration, exactly-once processing, and unified batch/streaming code.
Question 8: How do you handle schema changes in streaming?
Answer: Schema registry (Event Hubs), schema evolution in consumers, flexible data models, and schema validation at ingestion.
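Validation at ingestion with a tolerant consumer might look like the sketch below. The field names, the `REQUIRED` and `OPTIONAL_DEFAULTS` tables, and the `validate` function are all hypothetical; a schema registry would normally supply the schema instead of hard-coding it.

```python
# Hypothetical schema: required fields with types, optional fields with defaults.
REQUIRED = {"device_id": str, "temperature": float}
OPTIONAL_DEFAULTS = {"unit": "C"}


def validate(event: dict) -> dict:
    """Reject malformed events; tolerate added fields and missing optional ones."""
    normalized = {}
    for name, expected_type in REQUIRED.items():
        if name not in event:
            raise ValueError(f"missing required field: {name}")
        if not isinstance(event[name], expected_type):
            raise ValueError(f"wrong type for field: {name}")
        normalized[name] = event[name]
    for name, default in OPTIONAL_DEFAULTS.items():
        normalized[name] = event.get(name, default)
    # Unknown fields are dropped, so producers can add fields safely.
    return normalized


v1 = validate({"device_id": "d1", "temperature": 21.5})
v2 = validate({"device_id": "d2", "temperature": 70.1, "unit": "F", "firmware": "2.0"})
```

Old events without `unit` and new events with an extra `firmware` field both pass, which is the backward- and forward-compatible behavior schema evolution relies on.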
Question 9: What is the difference between at-least-once and exactly-once?
Answer: At-least-once: Every event is delivered, but duplicates are possible (Event Hubs default). Exactly-once: Each event's effect is applied exactly once, which in practice means at-least-once delivery plus deduplication or idempotent processing.
Question 10: How do you implement windowed aggregations?
Answer: Stream Analytics windowing functions (Tumbling, Hopping, Session, Sliding), Databricks Structured Streaming windows, or custom windowing logic.
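As an example of custom windowing logic, the sketch below counts events per tumbling and per hopping window. It assumes integer event times in seconds and windows identified by their start time, covering the half-open interval [start, start + size); the function names are illustrative and the boundary convention may differ from a given engine's.

```python
from collections import Counter


def tumbling_counts(event_times, size):
    """Count events per fixed, non-overlapping window."""
    counts = Counter()
    for t in event_times:
        window_start = (t // size) * size
        counts[window_start] += 1
    return dict(sorted(counts.items()))


def hopping_counts(event_times, size, hop):
    """Count events per overlapping window that starts every `hop` seconds."""
    counts = Counter()
    for t in event_times:
        # Earliest window start (a multiple of hop) whose window contains t.
        first = ((t - size) // hop + 1) * hop
        start = max(first, 0)  # ignore windows that would start before time 0
        while start <= t:
            counts[start] += 1
            start += hop
    return dict(sorted(counts.items()))


tumbling = tumbling_counts([1, 4, 7, 12, 15, 21], size=10)
hopping = hopping_counts([1, 4, 7, 12], size=10, hop=5)
```

With a hop smaller than the window size, one event contributes to several windows, which is the defining difference from a tumbling window.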
Question 11: What is the benefit of Event Grid integration?
Answer: Event routing, filtering, and delivery to multiple subscribers. Enables reactive architectures with smart routing.
Question 12: How do you handle message ordering?
Answer: Use partition keys in Event Hubs to preserve ordering within a partition. For global ordering, use a single partition, which limits throughput.
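The sketch below shows why a partition key preserves per-key order. The SHA-256 based `partition_for` function is a stand-in chosen for this example; the hash Event Hubs uses internally is not assumed here.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition with a stable hash (illustrative stand-in)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


partitions = defaultdict(list)
events = [
    ("device-1", "on"),
    ("device-2", "on"),
    ("device-1", "off"),
    ("device-2", "off"),
]
for key, payload in events:
    # Every event with the same key lands in the same partition, in send order.
    partitions[partition_for(key, 4)].append((key, payload))

device_1_order = [
    payload
    for key, payload in partitions[partition_for("device-1", 4)]
    if key == "device-1"
]
```

Order is guaranteed only among events that share a key; events for different keys may be consumed in any relative order.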
Question 13: What is the difference between pull and push consumption?
Answer: Pull: Consumer polls (higher latency). Push: Events pushed to consumer (lower latency). Event Hubs supports both models.
Question 14: How do you implement dead-letter queues?
Answer: Route failed messages to separate storage, implement retry policies, alert on failures, and analyze for root cause.
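A minimal sketch of this routing, assuming a hypothetical `process_with_dlq` helper and an in-memory list standing in for the dead-letter store (in practice a separate queue, topic, or storage container):

```python
def process_with_dlq(messages, handler, max_attempts=3):
    """Retry each message; send it to the dead-letter list when retries run out."""
    processed, dead_letters = [], []
    for msg in messages:
        last_error = None
        for _ in range(max_attempts):
            try:
                processed.append(handler(msg))
                break
            except Exception as exc:  # broad on purpose: any failure counts
                last_error = exc
        else:
            # The loop finished without a successful attempt.
            dead_letters.append(
                {"message": msg, "error": str(last_error), "attempts": max_attempts}
            )
    return processed, dead_letters


def parse_amount(msg: str) -> int:
    """Example handler that fails on non-numeric input."""
    return int(msg)


processed, dead_letters = process_with_dlq(["10", "oops", "7"], parse_amount)
```

Storing the error text and attempt count alongside the message gives the alerting and root-cause steps something to work with.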
Question 15: What is the benefit of Cosmos DB for streaming?
Answer: Single-digit millisecond latency, global distribution, auto-scaling, and change feed for downstream processing.
Question 16: How do you monitor streaming pipeline health?
Answer: Event Hub metrics (throughput, consumer lag), Stream Analytics SU utilization and watermark delay, checkpoint progress, custom metrics, and alerting.
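Consumer lag can be computed as the gap between the newest sequence number in each partition and the last checkpointed one. The sketch below assumes sequence numbers start at 0 and that both values have already been fetched; the function names and the threshold are illustrative.

```python
def partition_lag(last_enqueued: dict, checkpointed: dict) -> dict:
    """Events written but not yet checkpointed, per partition."""
    # A partition with no checkpoint is treated as fully unprocessed (-1).
    return {p: last_enqueued[p] - checkpointed.get(p, -1) for p in last_enqueued}


def lagging_partitions(lag: dict, threshold: int) -> list:
    """Partitions whose lag exceeds the alert threshold."""
    return sorted(p for p, value in lag.items() if value > threshold)


last_enqueued = {"0": 1500, "1": 900, "2": 40}
checkpointed = {"0": 1490, "1": 300}  # partition "2" has no checkpoint yet

lag = partition_lag(last_enqueued, checkpointed)
alerts = lagging_partitions(lag, threshold=500)
```

Lag that keeps growing usually means consumers cannot keep up, which is the cue to scale out or investigate a stuck partition.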
Question 17: What is the difference between stream and micro-batch?
Answer: Stream: Process each event individually. Micro-batch: Process small batches (seconds). Micro-batch provides better throughput with near-real-time latency.
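The difference can be shown with a small generator that groups a stream into batches. This sketch batches by count for simplicity; real micro-batch engines usually trigger on a time interval, and `micro_batches` is a hypothetical helper.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Group an event stream into lists of at most `batch_size` events."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Per-event streaming would handle 7 events in 7 calls;
# micro-batching handles the same events in 3 calls.
batches = list(micro_batches(range(7), batch_size=3))
batch_sums = [sum(batch) for batch in batches]
```

Fewer, larger calls amortize per-call overhead such as network round trips and commits, at the cost of waiting for a batch to fill.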
Question 18: How do you handle idempotent processing?
Answer: Unique message IDs, checkpoint processing, deduplication logic, and transactional writes.
Question 19: What is the benefit of Azure Functions for streaming?
Answer: Event-driven, auto-scaling, pay-per-execution, and integration with Event Hubs, Service Bus, and Event Grid.
Question 20: How do you handle poison messages?
Answer: Dead-letter queue, retry with exponential backoff, alert on repeated failures, and manual inspection.
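The backoff schedule itself is simple to compute. This sketch assumes a base delay of 0.5 seconds doubling per attempt with a 30-second cap, plus optional "full jitter"; the parameter values and function names are illustrative.

```python
import random


def backoff_delays(max_retries, base=0.5, cap=30.0):
    """Delay before each retry: base * 2^attempt, limited to `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def with_full_jitter(delay, rng=random):
    """Pick a random wait in [0, delay] so retrying clients do not synchronize."""
    return rng.uniform(0, delay)


delays = backoff_delays(8)
jittered = [with_full_jitter(d) for d in delays]
```

Once the retries in the schedule are exhausted, the message is treated as poison and moved to the dead-letter queue for inspection.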
Question 21: What is the difference between Event Hubs and Kafka?
Answer: Event Hubs: Fully managed Azure PaaS with Capture. Kafka: Open-source, self-managed or hosted (for example, Confluent Cloud). Event Hubs provides a Kafka-compatible endpoint, so existing Kafka clients can connect to it.
Question 22: How do you implement real-time dashboards?
Answer: Stream Analytics → Power BI DirectQuery, Cosmos DB → Power BI, or custom dashboard with Event Hubs → Functions → SignalR.
Question 23: What is the benefit of real-time ML inference?
Answer: Predictive maintenance, fraud detection, and personalization. Use Azure ML endpoints with Event Hubs for real-time scoring.
Question 24: How do you handle message replay?
Answer: Event Hubs offset reset, Capture files in ADLS, checkpoint restore, and idempotent processing.
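Replay is safe only when reprocessing does not duplicate results. The sketch below assumes a hypothetical in-memory `PartitionLog` standing in for a retained partition, and a consumer whose writes are keyed by offset so a replay overwrites instead of appending.

```python
class PartitionLog:
    """Hypothetical append-only log standing in for a retained partition."""

    def __init__(self) -> None:
        self._events = []

    def append(self, event) -> int:
        """Store an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int):
        """Return (offset, event) pairs starting at the given offset."""
        return list(enumerate(self._events))[offset:]


log = PartitionLog()
for reading in [3, 5, 8]:
    log.append(reading)

state = {}


def consume(start_offset: int) -> None:
    for offset, value in log.read_from(start_offset):
        # Keyed write: replaying an offset overwrites, never duplicates.
        state[offset] = value * 2


consume(0)                # normal processing
first_pass = dict(state)
consume(1)                # offset reset: replay from offset 1
```

After the replay the state is unchanged, which is the property that makes offset resets and checkpoint restores safe.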
Question 25: What is the future of real-time processing?
Answer: Unified batch/streaming, serverless streaming, edge processing, and AI-powered real-time analytics.