ELT Patterns: ADLS → Synapse Serverless → Dedicated
Load-first architecture with Synapse Serverless for exploration and Dedicated for production analytics
ELT Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ ELT ARCHITECTURE PATTERN │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ EXTRACT LOAD TRANSFORM │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Source │───────>│ ADLS │───────────>│ Synapse │ │
│ │ Systems │ │ Gen2 │ │ Serverless │ │
│ │ │ │ │ │ │ │
│ │ • SQL DB │ │ Raw Zone │ │ External │ │
│ │ • Files │ │ (Parquet)│ │ Tables │ │
│ │ • APIs │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────┬───────┘ │
│ │ │
│ │ SQL Views │
│ │ CTAS │
│ │ Stored Procs │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Synapse │ │
│ │ Dedicated Pool │ │
│ │ │ │
│ │ • Fact Tables │ │
│ │ • Dim Tables │ │
│ │ • Aggregations │ │
│ └────────┬─────────┘ │
│ │ │
│ SERVING MONITORING │ │
│ ▼ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Power BI │<───────│ Azure │ │ Power BI │ │
│ │ │ │ Monitor │ │ Dataset │ │
│ └──────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Synapse Serverless External Tables
-- Create external data source
CREATE EXTERNAL DATA SOURCE [ADLSDataSource]
WITH (
LOCATION = 'https://stdatalake001.dfs.core.windows.net',
CREDENTIAL = [ManagedIdentityCredential]
);
-- Create external file format
CREATE EXTERNAL FILE FORMAT [ParquetFormat]
WITH (
FORMAT_TYPE = PARQUET,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
-- Create external table for raw sales data
CREATE EXTERNAL TABLE [raw].[Sales]
WITH (
LOCATION = 'raw/sales/',
DATA_SOURCE = [ADLSDataSource],
FILE_FORMAT = [ParquetFormat]
)
AS
SELECT * FROM OPENROWSET(
BULK 'raw/sales/**/*.parquet',
FORMAT = 'PARQUET'
) WITH (
sale_id BIGINT,
customer_id INT,
product_id INT,
sale_date DATE,
quantity INT,
unit_price DECIMAL(18,2),
total_amount DECIMAL(18,2),
region VARCHAR(50)
) AS [Sales];
-- Create view for curated data
CREATE VIEW [curated].[vw_FactSales]
AS
SELECT
s.sale_id,
s.customer_id,
s.product_id,
s.sale_date,
s.quantity,
s.unit_price,
s.total_amount,
s.region,
c.customer_name,
c.customer_segment,
p.product_name,
p.category
FROM [raw].[Sales] s
LEFT JOIN [curated].[DimCustomers] c ON s.customer_id = c.customer_id
LEFT JOIN [curated].[DimProducts] p ON s.product_id = p.product_id;
Synapse Dedicated Pool Loading
-- CTAS to load into dedicated pool
CREATE TABLE [dbo].[FactSales]
WITH
(
DISTRIBUTION = HASH(sale_date),
CLUSTERED COLUMNSTORE INDEX,
PARTITION = (sale_date RANGE RIGHT FOR VALUES
('2024-01-01', '2024-02-01', '2024-03-01'))
)
AS
SELECT * FROM [curated].[vw_FactSales];
-- Create statistics
CREATE STATISTICS STAT_FactSales_sale_date
ON [dbo].[FactSales](sale_date);
-- Incremental load pattern
MERGE INTO [dbo].[FactSales] AS target
USING [curated].[vw_FactSales] AS source
ON target.sale_id = source.sale_id
WHEN MATCHED AND source.sale_date > target.last_updated THEN
UPDATE SET
target.quantity = source.quantity,
target.unit_price = source.unit_price,
target.total_amount = source.total_amount,
target.last_updated = GETDATE()
WHEN NOT MATCHED THEN
INSERT (sale_id, customer_id, product_id, sale_date,
quantity, unit_price, total_amount, region,
customer_name, product_name, last_updated)
VALUES (source.sale_id, source.customer_id, source.product_id,
source.sale_date, source.quantity, source.unit_price,
source.total_amount, source.region, source.customer_name,
source.product_name, GETDATE());
ℹ️
Pro Tip: Use Synapse Serverless for data exploration and prototyping. Once queries are validated, move to Dedicated Pool for production with optimized distribution and indexing.
Interview Questions
Q1: Explain the difference between external tables and views in Synapse Serverless. A: External tables point to files in ADLS with schema definition. Views are virtual tables defined by SQL queries. External tables are for raw data; views for curated/transformed data using joins and calculations.
Q2: How do you implement slowly changing dimensions (SCD) in ELT? A: Use MERGE statement for SCD Type 1 (overwrite). For SCD Type 2, create history tables with effective dates. Use Synapse Serverless for staging and Dedicated for final dimension tables.
Q3: What are the cost implications of Synapse Serverless vs Dedicated? A: Serverless charges per TB scanned (pay-per-use). Dedicated charges per DWU-hour (reserved). Use Serverless for exploration (low cost); Dedicated for production (predictable cost with reserved capacity).