Troubleshooting Interview Q&A

25 interview questions on troubleshooting Azure data engineering issues

Question 1: ADF pipeline is failing with timeout errors. How do you troubleshoot?

Answer: 1) Check activity timeout settings, 2) Increase timeout, 3) Optimize query/source, 4) Check IR connectivity, 5) Monitor resource utilization, 6) Implement retry policies.

Question 2: Synapse query is running slowly. How do you diagnose it?

Answer: 1) Check query plan (EXPLAIN), 2) Verify statistics are up to date, 3) Check distribution skew, 4) Monitor DWU utilization, 5) Review partition pruning.
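Step 3 can be made concrete with a small calculation. The sketch below assumes you have already collected row counts per distribution (for example, from DBCC PDW_SHOWSPACEUSED on a dedicated SQL pool). The function names and the 10% threshold are illustrative assumptions, not part of any Synapse API.

```python
from typing import Sequence


def skew_ratio(rows_per_distribution: Sequence[int]) -> float:
    """Return (max - min) / max of the row counts; 0.0 means perfectly even."""
    if not rows_per_distribution:
        raise ValueError("need at least one distribution")
    largest = max(rows_per_distribution)
    if largest == 0:
        return 0.0  # empty table: nothing to be skewed
    return (largest - min(rows_per_distribution)) / largest


def is_skewed(rows_per_distribution: Sequence[int], threshold: float = 0.10) -> bool:
    """Flag the table when the skew ratio exceeds the (assumed) threshold."""
    return skew_ratio(rows_per_distribution) > threshold
```

A table flagged this way is a candidate for a different distribution column or for round-robin distribution.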

Question 3: Databricks job is failing with OOM errors. How do you fix it?

Answer: 1) Increase cluster memory, 2) Optimize partition count, 3) Use broadcast joins for small tables, 4) Enable AQE, 5) Cache only DataFrames that are reused and unpersist them afterwards, since caching itself consumes memory.

Question 4: Event Hubs is throttling. How do you resolve it?

Answer: 1) Check TU/CU utilization, 2) Scale up TU/CU, 3) Add partitions, 4) Optimize partition key distribution, 5) Scale consumers.
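For steps 1 and 2, a rough sizing calculation helps decide how far to scale. The sketch below assumes the standard-tier ingress limits of 1 MB/s or 1,000 events/s per throughput unit; the function name is hypothetical, and egress limits are ignored for brevity.

```python
import math

# Per-throughput-unit ingress limits assumed for the Event Hubs standard tier.
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000


def required_throughput_units(ingress_mb_per_s: float, events_per_s: float) -> int:
    """Return the smallest TU count that covers both ingress limits."""
    by_bytes = math.ceil(ingress_mb_per_s / INGRESS_MB_PER_TU)
    by_events = math.ceil(events_per_s / INGRESS_EVENTS_PER_TU)
    return max(1, by_bytes, by_events)
```

Whichever limit is hit first (bytes or event count) drives the result, which is why many small events can throttle a namespace well below its MB/s ceiling.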

Question 5: ADLS Gen2 is experiencing high latency. How do you troubleshoot?

Answer: 1) Check file sizes (avoid small files), 2) Verify HNS is enabled, 3) Check network connectivity, 4) Monitor IOPS limits, 5) Review access patterns.
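Step 1 usually leads to compacting small files. As a minimal sketch, the function below takes a mapping of file name to size in bytes and groups the small files into batches of roughly a target size. The thresholds (32 MB and 256 MB) and the function name are illustrative assumptions; the actual rewrite would be done by your processing engine.

```python
from typing import Dict, List

MB = 1024 * 1024


def plan_compaction(file_sizes: Dict[str, int],
                    small_below: int = 32 * MB,
                    target_size: int = 256 * MB) -> List[List[str]]:
    """Group files smaller than `small_below` into batches of about `target_size`."""
    small = sorted(
        (name for name, size in file_sizes.items() if size < small_below),
        key=lambda name: file_sizes[name],
        reverse=True,
    )
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name in small:
        size = file_sizes[name]
        # Close the current batch when adding this file would exceed the target.
        if current and current_bytes + size > target_size:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```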

Question 6: Power BI dataset refresh is failing. How do you diagnose it?

Answer: 1) Check gateway connectivity, 2) Verify credentials, 3) Review data source availability, 4) Check query performance, 5) Monitor refresh history.

Question 7: Cosmos DB is throttling (429 errors). How do you fix it?

Answer: 1) Increase RU/s, 2) Optimize partition key, 3) Use point reads over queries, 4) Implement backoff/retry, 5) Check hot partitions.
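Step 4 can be sketched as a generic retry wrapper. Cosmos DB 429 responses carry a retry-after hint (the `x-ms-retry-after-ms` header), and the official SDKs already retry throttled requests by default, so a wrapper like this mainly matters for custom clients or for extending the SDK's limits. `ThrottledError` and `call_with_backoff` are hypothetical names, not SDK types.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 response; not a real SDK exception."""

    def __init__(self, retry_after_s: Optional[float] = None):
        super().__init__("request rate too large (429)")
        self.retry_after_s = retry_after_s


def call_with_backoff(operation: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay_s: float = 0.1,
                      max_delay_s: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `operation` on ThrottledError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # Prefer the server's hint; otherwise back off exponentially with jitter.
            if err.retry_after_s is not None:
                delay = err.retry_after_s
            else:
                delay = random.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt))
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry logic can be unit-tested without real delays.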

Question 8: Stream Analytics job is lagging. How do you resolve it?

Answer: 1) Check SU utilization, 2) Scale up SU, 3) Optimize query, 4) Check Event Hub throughput, 5) Review the watermark delay metric.

Question 9: Data pipeline is processing duplicate records. How do you fix it?

Answer: 1) Implement idempotent processing, 2) Use unique keys for deduplication, 3) Check checkpoint/offset management, 4) Use Delta Lake merge.
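Steps 1 and 2 combine into a keyed upsert: replaying the same batch must leave the target unchanged. The pure-Python sketch below illustrates the idea, which is roughly what a Delta Lake merge with matched-update and not-matched-insert clauses achieves at scale. The column names `id` and `updated_at` are assumptions.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def upsert_latest(target: Dict[Any, Record], batch: Iterable[Record],
                  key: str = "id", version: str = "updated_at") -> Dict[Any, Record]:
    """Merge `batch` into `target`, keeping one row per key (the newest version)."""
    for row in batch:
        existing = target.get(row[key])
        # Insert new keys; overwrite only when the incoming row is strictly newer.
        if existing is None or row[version] > existing[version]:
            target[row[key]] = row
    return target
```

Because the comparison is strict, reprocessing a batch after a failed checkpoint neither adds rows nor regresses existing ones.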

Question 10: Synapse External Table is not returning data. How do you troubleshoot?

Answer: 1) Verify file path, 2) Check file format compatibility, 3) Verify credentials/permissions, 4) Test with OPENROWSET, 5) Check partition structure.

Question 11: ADF Data Flow is running slowly. How do you optimize it?

Answer: 1) Right-size cluster, 2) Enable auto-scaling, 3) Optimize transformations, 4) Check partition strategy, 5) Use debug mode for analysis.

Question 12: Key Vault is returning 403 Forbidden. How do you fix it?

Answer: 1) Check Managed Identity permissions, 2) Verify RBAC roles, 3) Check network rules, 4) Review access policies, 5) Verify vault exists.

Question 13: Purview scan is failing. How do you troubleshoot?

Answer: 1) Check scan credentials, 2) Verify data source connectivity, 3) Review scan ruleset, 4) Check Purview account permissions, 5) Review error logs.

Question 14: Databricks notebook is failing with import errors. How do you fix it?

Answer: 1) Check library installation, 2) Verify cluster libraries, 3) Check notebook path, 4) Restart cluster, 5) Check DBR version compatibility.

Question 15: Data pipeline is experiencing data skew. How do you resolve it?

Answer: 1) Check distribution strategy, 2) Use salting for skewed keys, 3) Enable AQE skew join optimization, 4) Repartition data.
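Salting (step 2) is easier to explain with a small model. In the sketch below the large, skewed side gets a salt appended to its join key, and the small side is replicated once per salt so every salted key still finds its match. It uses plain Python lists and a round-robin salt for determinism; in Spark the salt is typically random and the same idea is applied to DataFrames. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def salt_large_side(rows: List[Tuple[str, int]], n_salts: int):
    """Attach a round-robin salt to each (key, value) row of the skewed side."""
    return [((key, i % n_salts), value) for i, (key, value) in enumerate(rows)]


def explode_small_side(lookup: Dict[str, str], n_salts: int):
    """Replicate every lookup row once per salt so each salted key finds a match."""
    return {(key, salt): label
            for key, label in lookup.items()
            for salt in range(n_salts)}


def salted_join(rows: List[Tuple[str, int]], lookup: Dict[str, str], n_salts: int):
    """Inner-join on (key, salt), then drop the salt from the output."""
    small = explode_small_side(lookup, n_salts)
    return [(key, value, small[(key, salt)])
            for (key, salt), value in salt_large_side(rows, n_salts)
            if (key, salt) in small]


def max_rows_per_join_key(keys: Iterable) -> int:
    """Largest group size, a proxy for the heaviest task in a shuffle join."""
    return max(Counter(keys).values())
```

The join result is unchanged, but the largest group per join key shrinks by about a factor of `n_salts`, at the cost of replicating the small side `n_salts` times.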

Question 16: Synapse result set caching is not working. How do you troubleshoot?

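Answer: 1) Verify result set caching is enabled for the database (ALTER DATABASE ... SET RESULT_SET_CACHING ON) and not turned off for the session, 2) Check query compatibility (for example, queries using non-deterministic functions such as GETDATE() are not cached), 3) Monitor cache hit rates, 4) Review cache size limits.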

Question 17: Event Hubs Capture files are not being created. How do you fix it?

Answer: 1) Verify Capture is enabled, 2) Check storage account permissions, 3) Review file naming format, 4) Check time/size window settings.

Question 18: ADF Linked Service is failing to connect. How do you troubleshoot?

Answer: 1) Verify connection string, 2) Check Managed Identity, 3) Test connectivity, 4) Review firewall rules, 5) Check Key Vault references.

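Question 19: Synapse dedicated pool is pausing unexpectedly. How do you fix it?

Answer: 1) Confirm the pool type: auto-pause with a delay is a Synapse Spark pool setting, while a dedicated SQL pool pauses only when someone or something pauses it, 2) Check the activity log to see who or what issued the pause, 3) Review scheduled automation (pipelines, runbooks, Logic Apps) against the query schedule, 4) For Spark pools, adjust the auto-pause delay.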

Question 20: Data quality checks are failing. How do you resolve it?

Answer: 1) Review validation rules, 2) Check data source quality, 3) Update rules for edge cases, 4) Implement quarantine pattern.
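The quarantine pattern in step 4 keeps bad rows out of the main flow without losing them. A minimal sketch, assuming records are dictionaries and each rule is a named predicate; the `_failed_rules` column name is a made-up convention.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Rule = Tuple[str, Callable[[Record], bool]]  # (rule name, predicate that must hold)


def split_valid_and_quarantined(records: List[Record], rules: List[Rule]):
    """Route rows failing any rule to quarantine, tagged with the failed rule names."""
    valid: List[Record] = []
    quarantined: List[Record] = []
    for record in records:
        failed = [name for name, check in rules if not check(record)]
        if failed:
            # Keep the original row and record why it was rejected.
            quarantined.append({**record, "_failed_rules": failed})
        else:
            valid.append(record)
    return valid, quarantined
```

Storing the failed rule names alongside each row makes it straightforward to review edge cases (step 3) and replay quarantined rows once the rules are updated.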

Question 21: Azure Function is timing out. How do you fix it?

Answer: 1) Increase timeout setting, 2) Optimize code, 3) Use Durable Functions for long-running work, 4) Check Consumption plan limits (executions are capped at 10 minutes).

Question 22: Power BI DirectQuery is slow. How do you optimize it?

Answer: 1) Optimize source query, 2) Use aggregation tables, 3) Implement row-level security efficiently, 4) Check network latency.

Question 23: Data pipeline is experiencing schema drift. How do you handle it?

Answer: 1) Enable schema drift in ADF, 2) Use Delta Lake mergeSchema, 3) Implement schema validation, 4) Use Purview for schema discovery.
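Schema validation (step 3) can be as simple as diffing the incoming schema against the expected one before writing. The sketch below assumes schemas are represented as column-name to type-name dictionaries. Treating purely additive changes as safe is an assumed policy, chosen because that is the case options like mergeSchema are meant for.

```python
from typing import Dict, NamedTuple, Set


class SchemaDiff(NamedTuple):
    added: Set[str]
    missing: Set[str]
    type_changed: Set[str]


def diff_schema(expected: Dict[str, str], actual: Dict[str, str]) -> SchemaDiff:
    """Compare column-name -> type-name mappings and report the drift."""
    added = set(actual) - set(expected)
    missing = set(expected) - set(actual)
    type_changed = {col for col in set(expected) & set(actual)
                    if expected[col] != actual[col]}
    return SchemaDiff(added, missing, type_changed)


def is_safe_to_merge(diff: SchemaDiff) -> bool:
    """Treat purely additive drift as safe; this policy is an assumption."""
    return not diff.missing and not diff.type_changed
```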

Question 24: Synapse Spark pool is failing to start. How do you troubleshoot?

Answer: 1) Check quota limits, 2) Verify VNet configuration, 3) Review Spark version compatibility, 4) Check resource availability.

Question 25: How do you implement effective monitoring for troubleshooting?

Answer: Enable diagnostic settings, create KQL queries, build Workbooks, set up alerts, and implement custom logging in pipelines.
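For the custom logging part, emitting one structured (JSON) line per pipeline step makes the logs easy to query once they are ingested. A minimal sketch using only the standard library; the field names (`run_id`, `step`, `status`, `duration_ms`) and the `log_step` helper are assumptions, not a prescribed schema.

```python
import json
import logging
import time
from contextlib import contextmanager


@contextmanager
def log_step(logger: logging.Logger, run_id: str, step: str):
    """Emit one JSON log line per step with its status and duration."""
    start = time.monotonic()
    status = "Succeeded"
    try:
        yield
    except Exception:
        status = "Failed"
        raise  # the failure is logged in `finally`, then propagated
    finally:
        logger.info(json.dumps({
            "run_id": run_id,
            "step": step,
            "status": status,
            "duration_ms": round((time.monotonic() - start) * 1000),
        }))
```

Wrapping each step as `with log_step(logger, run_id, "copy"):` yields records that a KQL query can parse (for example, with `parse_json`) and aggregate by step and status for alerts and Workbooks.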
