ELT Patterns: ADLS → Synapse Serverless → Dedicated

Load-first architecture with Synapse Serverless for exploration and Dedicated for production analytics

ELT Architecture

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                    ELT ARCHITECTURE PATTERN                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  EXTRACT               LOAD                    TRANSFORM            │
│  ┌──────────┐        ┌──────────┐            ┌──────────────┐     │
│  │ Source   │───────>│ ADLS     │───────────>│ Synapse      │     │
│  │ Systems │        │ Gen2     │            │ Serverless   │     │
│  │          │        │          │            │              │     │
│  │ • SQL DB │        │ Raw Zone │            │ External     │     │
│  │ • Files  │        │ (Parquet)│            │ Tables       │     │
│  │ • APIs   │        │          │            │              │     │
│  └──────────┘        └──────────┘            └──────┬───────┘     │
│                                                      │              │
│                                                      │ SQL Views    │
│                                                      │ CTAS         │
│                                                      │ Stored Procs  │
│                                                      ▼              │
│                                          ┌──────────────────┐      │
│                                          │ Synapse          │      │
│                                          │ Dedicated Pool   │      │
│                                          │                  │      │
│                                          │ • Fact Tables    │      │
│                                          │ • Dim Tables     │      │
│                                          │ • Aggregations   │      │
│                                          └────────┬─────────┘      │
│                                                    │                │
│  SERVING                MONITORING                │                │
│                                                    ▼                │
│  ┌──────────┐        ┌──────────────┐    ┌──────────────┐        │
│  │ Power BI │<───────│ Azure        │    │ Power BI     │        │
│  │          │        │ Monitor      │    │ Dataset      │        │
│  └──────────┘        └──────────────┘    └──────────────┘        │
└─────────────────────────────────────────────────────────────────────┘

Synapse Serverless External Tables

-- Create external data source
CREATE EXTERNAL DATA SOURCE [ADLSDataSource]
WITH (
    LOCATION = 'https://stdatalake001.dfs.core.windows.net',
    CREDENTIAL = [ManagedIdentityCredential]
);

-- Create external file format
CREATE EXTERNAL FILE FORMAT [ParquetFormat]
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

-- Create external table for raw sales data
CREATE EXTERNAL TABLE [raw].[Sales]
WITH (
    LOCATION = 'raw/sales/',
    DATA_SOURCE = [ADLSDataSource],
    FILE_FORMAT = [ParquetFormat]
)
AS
SELECT * FROM OPENROWSET(
    BULK 'raw/sales/**/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    sale_id BIGINT,
    customer_id INT,
    product_id INT,
    sale_date DATE,
    quantity INT,
    unit_price DECIMAL(18,2),
    total_amount DECIMAL(18,2),
    region VARCHAR(50)
) AS [Sales];

-- Create view for curated data
CREATE VIEW [curated].[vw_FactSales]
AS
SELECT
    s.sale_id,
    s.customer_id,
    s.product_id,
    s.sale_date,
    s.quantity,
    s.unit_price,
    s.total_amount,
    s.region,
    c.customer_name,
    c.customer_segment,
    p.product_name,
    p.category
FROM [raw].[Sales] s
LEFT JOIN [curated].[DimCustomers] c ON s.customer_id = c.customer_id
LEFT JOIN [curated].[DimProducts] p ON s.product_id = p.product_id;

Synapse Dedicated Pool Loading

-- CTAS to load into dedicated pool
CREATE TABLE [dbo].[FactSales]
WITH
(
    DISTRIBUTION = HASH(sale_date),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION = (sale_date RANGE RIGHT FOR VALUES
        ('2024-01-01', '2024-02-01', '2024-03-01'))
)
AS
SELECT * FROM [curated].[vw_FactSales];

-- Create statistics
CREATE STATISTICS STAT_FactSales_sale_date
ON [dbo].[FactSales](sale_date);

-- Incremental load pattern
MERGE INTO [dbo].[FactSales] AS target
USING [curated].[vw_FactSales] AS source
ON target.sale_id = source.sale_id
WHEN MATCHED AND source.sale_date > target.last_updated THEN
    UPDATE SET
        target.quantity = source.quantity,
        target.unit_price = source.unit_price,
        target.total_amount = source.total_amount,
        target.last_updated = GETDATE()
WHEN NOT MATCHED THEN
    INSERT (sale_id, customer_id, product_id, sale_date,
            quantity, unit_price, total_amount, region,
            customer_name, product_name, last_updated)
    VALUES (source.sale_id, source.customer_id, source.product_id,
            source.sale_date, source.quantity, source.unit_price,
            source.total_amount, source.region, source.customer_name,
            source.product_name, GETDATE());

ℹ️

Pro Tip: Use Synapse Serverless for data exploration and prototyping. Once queries are validated, move to Dedicated Pool for production with optimized distribution and indexing.

Interview Questions

Q1: Explain the difference between external tables and views in Synapse Serverless. A: External tables point to files in ADLS with schema definition. Views are virtual tables defined by SQL queries. External tables are for raw data; views for curated/transformed data using joins and calculations.

Q2: How do you implement slowly changing dimensions (SCD) in ELT? A: Use MERGE statement for SCD Type 1 (overwrite). For SCD Type 2, create history tables with effective dates. Use Synapse Serverless for staging and Dedicated for final dimension tables.

Q3: What are the cost implications of Synapse Serverless vs Dedicated? A: Serverless charges per TB scanned (pay-per-use). Dedicated charges per DWU-hour (reserved). Use Serverless for exploration (low cost); Dedicated for production (predictable cost with reserved capacity).