πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Synapse Performance: Distribution, Indexing & Caching

Azure Data EngineeringSynapse Performance⭐ Premium

Advertisement

Synapse Performance: Distribution, Indexing & Caching

Optimize Synapse Dedicated SQL Pool performance with distribution, indexing, and caching strategies

Performance Optimization Layers

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    SYNAPSE PERFORMANCE OPTIMIZATION                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                     β”‚
β”‚  LAYER 1: DISTRIBUTION                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ HASH: Distribute by join/group-by key (even distribution)    β”‚   β”‚
β”‚  β”‚ ROUND ROBIN: Random distribution (fast load, no skew)        β”‚   β”‚
β”‚  β”‚ REPLICATE: Full copy on each node (small tables <2GB)        β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                     β”‚
β”‚  LAYER 2: INDEXING                                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ CLUSTERED COLUMNSTORE INDEX: Default, best for analytics     β”‚   β”‚
β”‚  β”‚ CLUSTERED INDEX: B-tree, best for point lookups              β”‚   β”‚
β”‚  β”‚ NONCLUSTERED INDEX: Secondary lookup paths                    β”‚   β”‚
β”‚  β”‚ NONCLUSTERED COLUMNSTORE INDEX: Columnar for secondary       β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                     β”‚
β”‚  LAYER 3: STATISTICS & CACHING                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ AUTO CREATE STATISTICS: Enabled by default                   β”‚   β”‚
β”‚  β”‚ MANUALLY CREATED STATISTICS: For complex queries             β”‚   β”‚
β”‚  β”‚ RESULT SET CACHING: Cache query results for reuse            β”‚   β”‚
β”‚  β”‚ MATERIALIZED VIEWS: Pre-computed aggregations                β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                     β”‚
β”‚  LAYER 4: QUERY TUNING                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ USE CTAS for data loading (not INSERT INTO)                  β”‚   β”‚
β”‚  β”‚ USE SET statements for session-level optimization            β”‚   β”‚
β”‚  β”‚ AVOID cross-database queries                                 β”‚   β”‚
β”‚  β”‚ USE partition switching for large deletes                    β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Distribution Analysis

-- Check distribution skew
DBCC PDW_SHOWSPACEUSED('dbo.FactSales');

-- Output:
-- index_name | distribution_id | row_count | size_in_bytes
-- -----------+-----------------+-----------+-------------
-- CCI_1      | 0               | 1000000   | 50000000
-- CCI_1      | 1               | 1000000   | 50000000
-- CCI_1      | 2               | 1000001   | 50000010

-- Check statistics
sys.dm_pdw_exec_stats
WHERE command LIKE '%SELECT%'

-- Monitor query performance
SELECT
    query_id,
    sql_text,
    start_time,
    end_time,
    total_elapsed_time / 1000 AS duration_seconds,
    row_count,
    spilled_thread_count
FROM sys.dm_pdw_exec_requests
WHERE status = 'Completed'
ORDER BY total_elapsed_time DESC;

Index Optimization

-- Create CCI for fact table
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;

-- Create CCI for fact table
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;

-- Create nonclustered index for lookup queries
CREATE NONCLUSTERED INDEX IX_FactSales_CustomerKey
ON dbo.FactSales(customer_key)
INCLUDE (sale_date, total_amount);

-- Create materialized view for common aggregations
CREATE MATERIALIZED VIEW MV_DailySalesSummary
WITH (DISTRIBUTION = HASH(sale_date))
AS
SELECT
    sale_date,
    COUNT(*) AS transaction_count,
    SUM(total_amount) AS daily_revenue
FROM dbo.FactSales
GROUP BY sale_date;

SET Statements for Optimization

-- Enable result set caching
SET RESULT_SET_CACHING ON;

-- Enable statistics profiling
SET STATISTICS PROFILE ON;

-- Adjust DOP (Degree of Parallelism)
SET statement_dop = 4;

-- Enable bulk insert optimization
SET BULK_INSERT_BATCH_SIZE = 100000;

ℹ️

Pro Tip: Use EXPLAIN to analyze query plans. Look for distributed moves (shuffles) and partition scans as performance bottlenecks. Optimize distribution keys to minimize data movement.

Interview Questions

Q1: How do you identify and fix data skew in Synapse? A: Use DBCC PDW_SHOWSPACEUSED to check distribution sizes. If skew exists, change distribution key to a more evenly distributed column, or use Round Robin for staging tables.

Q2: What is the difference between CCI and clustered index? A: CCI (Columnstore) is optimal for analytics (scans, aggregations). Clustered (B-tree) is optimal for point lookups and range scans. Use CCI for fact tables; clustered for dimension tables.

Q3: How does result set caching improve performance? A: Result set caching stores query results in memory for 5 minutes (configurable). Subsequent identical queries return cached results instantly, reducing compute usage and query time.

Advertisement