Synapse Analytics Interview Q&A

25 interview questions on Synapse Analytics pools, performance, and data warehousing

Question 1: What is the difference between Dedicated and Serverless SQL Pools?

Answer: Dedicated: Reserved compute, DWU-based, predictable performance. Serverless: Pay-per-TB-scanned, auto-scale, for ad-hoc exploration. Use dedicated for production; serverless for exploration.

Question 2: How do you choose a distribution strategy?

Answer: Hash: Large fact tables (join columns). Replicated: Small dimension tables (<2GB). Round Robin: Staging tables. Choose based on table size, join patterns, and query performance.

Question 3: What is the benefit of CTAS over INSERT INTO?

Answer: CTAS creates a new table with optimal distribution and indexing. INSERT INTO appends to existing tables. Use CTAS for initial loads; INSERT INTO for incremental updates.

Question 4: How do you optimize query performance?

Answer: 1) Correct distribution, 2) Clustered Columnstore Indexes, 3) Statistics, 4) Partitioning, 5) Result set caching, 6) Materialized views, 7) SET statements.

Question 5: What is the maximum DWU for Dedicated SQL Pool?

Answer: DW6000c (12 compute nodes, 12TB storage). Scale based on workload requirements. Use auto-pause for non-24/7 workloads.

Question 6: How do you handle data skew?

Answer: Check with DBCC PDW_SHOWSPACEUSED. Change distribution key. Use Round Robin for staging. Consider replicated tables for small dimensions.

Question 7: What is the benefit of Clustered Columnstore Indexes?

Answer: Optimal compression (10x), columnar storage, batch-mode execution. Default and recommended for fact tables.

Question 8: How do you implement SCD in Synapse?

Answer: SCD Type 1: MERGE UPDATE. SCD Type 2: MERGE INSERT with effective dates. Use Synapse Serverless for staging changes.

Question 9: What is the difference between external tables and views?

Answer: External tables: Point to files in ADLS with schema. Views: Virtual tables defined by SQL queries. Use external tables for raw data; views for curated data.

Question 10: How do you monitor Synapse performance?

Answer: sys.dm_pdw_exec_requests for query history, DBCC PDW_SHOWSPACEUSED for distribution, Azure Monitor for metrics, and custom KQL queries.

Question 11: What is the benefit of result set caching?

Answer: Stores query results in memory for 5 minutes. Subsequent identical queries return cached results instantly. Reduces compute usage.

Question 12: How do you handle schema evolution?

Answer: Use CTAS to recreate tables with new schema, ALTER TABLE for additive changes, or Synapse Serverless for schema-on-read exploration.

Question 13: What is the difference between CTAS and CREATE TABLE?

Answer: CTAS: Creates table from query results with distribution/indexing. CREATE TABLE: Creates empty table with schema. Use CTAS for data loading.

Question 14: How do you implement incremental loading?

Answer: Use watermarks, MERGE statement for upserts, partition switching for large deletes, and Synapse Pipelines for orchestration.

Question 15: What is the benefit of Synapse Link?

Answer: HTAP capabilities, real-time analytics on operational data (Cosmos DB), no ETL pipeline required, consistent view of transactional data.

Question 16: How do you optimize Synapse costs?

Answer: Auto-pause for non-24/7 workloads, reserved capacity for stable workloads, right-size DWU, and use serverless for ad-hoc queries.

Question 17: What is the difference between Synapse and Azure SQL DW?

Answer: Synapse: Unified analytics (SQL, Spark, Pipelines). Azure SQL DW: Dedicated SQL pool only. Synapse provides more capabilities and integration.

Question 18: How do you handle cross-database queries?

Answer: Use Synapse Serverless with external tables, or create views that reference external data. Cross-database queries are limited in Dedicated pools.

Question 19: What is the benefit of dedicated pools for BI?

Answer: Predictable performance, low latency for dashboards, integration with Power BI, and support for complex queries with large datasets.

Question 20: How do you implement data masking?

Answer: Use dynamic data masking in Synapse SQL, column-level security, and row-level security for sensitive data protection.

Question 21: What is the difference between Synapse and Databricks?

Answer: Synapse: SQL-focused data warehousing with Spark. Databricks: Spark-focused analytics with SQL. Use Synapse for SQL workloads; Databricks for Spark workloads.

Question 22: How do you handle large table loads?

Answer: Use CTAS for initial loads, MERGE for incremental updates, PolyBase for bulk loading, and partition switching for fast deletes.

Question 23: What is the benefit of Synapse Pipelines?

Answer: Built-in orchestration, integration with SQL/Spark pools, monitoring, and Git integration. Simplifies data pipeline management within Synapse.

Question 24: How do you implement data governance in Synapse?

Answer: Purview integration, row-level security, column-level security, dynamic data masking, and audit logging.

Question 25: What is the future of Synapse Analytics?

Answer: Deeper integration with Fabric, enhanced serverless capabilities, improved Spark performance, and expanded ecosystem integration.