ADF Deep Dive: Mapping Data Flows, Triggers & Parameters
Advanced ADF concepts including data flows, triggers, parameters, and orchestration patterns
Data Flow Architecture
ADF MAPPING DATA FLOW

SOURCE --> TRANSFORM --> SINK

+----------+     +----------+     +----------+     +----------+
|  Source  |---->|  Filter  |---->|  Derive  |---->|   Sink   |
|  (ADLS)  |     | (Remove  |     |  (Add    |     |  (ADLS)  |
|          |     |  nulls)  |     | columns) |     |          |
+----------+     +----------+     +----------+     +----------+

TRANSFORMATIONS:
• Source/Sink: Read/write data
• Filter: Row filtering
• Derived Column: Add calculated columns
• Lookup: Join with reference data
• Aggregate: Group by and aggregate
• Sort: Order data
• Union: Combine multiple sources
• Conditional Split: Conditional routing
• Exists: Check whether rows exist in another stream
• Surrogate Key: Generate keys
• Pivot/Unpivot: Reshape data
• Window: Window functions
Data Flow JSON Example
{
  "name": "df_transform_sales",
  "properties": {
    "type": "MappingDataFlow",
    "typeProperties": {
      "sources": [
        {
          "name": "source_raw",
          "type": "JsonSource",
          "dataset": { "referenceName": "ds_raw_sales" }
        }
      ],
      "sinks": [
        {
          "name": "sink_curated",
          "type": "ParquetSink",
          "dataset": { "referenceName": "ds_curated_sales" }
        }
      ],
      "transformations": [
        {
          "name": "filter_valid",
          "type": "Filter",
          "typeProperties": {
            "filterExpression": "!isNull(sale_id) && amount > 0"
          }
        },
        {
          "name": "derive_total",
          "type": "DerivedColumn",
          "typeProperties": {
            "columns": [
              {
                "name": "total_amount",
                "expression": "quantity * unit_price"
              },
              {
                "name": "processed_date",
                "expression": "currentTimestamp()"
              }
            ]
          }
        }
      ]
    }
  }
}
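To make the two transformations concrete, here is a minimal Python sketch of what filter_valid and derive_total compute on a handful of rows. It is not ADF code: the sample rows and the run_data_flow helper are hypothetical, and plain lists of dicts stand in for the Spark-backed streams that a Mapping Data Flow runs on. The JSON above is itself best read as a simplified illustration, since the service stores transformation logic as data flow script.

```python
from datetime import datetime, timezone

# Hypothetical sample rows standing in for the ds_raw_sales dataset.
RAW_SALES = [
    {"sale_id": 1, "amount": 20.0, "quantity": 2, "unit_price": 10.0},
    {"sale_id": None, "amount": 5.0, "quantity": 1, "unit_price": 5.0},  # dropped: null sale_id
    {"sale_id": 3, "amount": 0.0, "quantity": 0, "unit_price": 7.5},     # dropped: amount not > 0
]


def filter_valid(rows):
    """Mirror of the Filter step: keep rows with a sale_id and a positive amount."""
    return [r for r in rows if r["sale_id"] is not None and r["amount"] > 0]


def derive_total(rows, now=None):
    """Mirror of the DerivedColumn step: add total_amount and processed_date."""
    now = now or datetime.now(timezone.utc)
    return [
        {**r, "total_amount": r["quantity"] * r["unit_price"], "processed_date": now}
        for r in rows
    ]


def run_data_flow(rows):
    """Chain the steps in diagram order: source -> filter -> derive -> sink."""
    return derive_total(filter_valid(rows))


curated = run_data_flow(RAW_SALES)
```

Only the first row survives the filter, and it gains the two derived columns before it would be written to the sink.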
Trigger Types Deep Dive
{
  "name": "tr_tumbling_window",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2024-01-01T00:00:00Z",
      "maxConcurrency": 3,
      "retryPolicy": {
        "count": 3,
        "intervalInSeconds": 60
      }
    }
  }
}
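The trigger definition above implies a series of contiguous, non-overlapping one-hour windows beginning at startTime. The Python sketch below enumerates those windows and groups them by maxConcurrency to show what a backfill looks like. The function names are hypothetical, and the strict batching is a simplification: it only approximates a cap on how many windows run at once.

```python
from datetime import datetime, timedelta, timezone

START_TIME = datetime(2024, 1, 1, tzinfo=timezone.utc)  # "startTime" from the trigger
WINDOW = timedelta(hours=1)                              # "frequency": "Hour", "interval": 1
MAX_CONCURRENCY = 3                                      # "maxConcurrency"


def tumbling_windows(start, size, now):
    """Yield (window_start, window_end) for every window fully elapsed by `now`."""
    window_start = start
    while window_start + size <= now:
        yield window_start, window_start + size
        window_start += size


def batches(items, size):
    """Group items into lists of at most `size`, approximating the concurrency cap."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


# Assumption: the trigger is first started five hours after startTime,
# so five elapsed windows are waiting to be backfilled.
now = START_TIME + timedelta(hours=5)
windows = list(tumbling_windows(START_TIME, WINDOW, now))
backfill_batches = batches(windows, MAX_CONCURRENCY)
```

In ADF, the boundaries of each window are available to the triggered pipeline as `@trigger().outputs.windowStartTime` and `@trigger().outputs.windowEndTime`.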
Pro Tip: Use Mapping Data Flows for complex transformations that benefit from visual debugging. For simple data movement, use Copy Activity, which is faster and more cost-effective because it does not need a Spark cluster.
Interview Questions
Q1: When would you use Data Flow over Copy Activity?
A: Use a Data Flow when the work involves real transformation logic (joins, aggregations, derived columns) and benefits from visual debugging. Use Copy Activity for simple data movement or format conversion, or when run time and cost matter most, since it avoids starting a Spark cluster.
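Q2: How do you parameterize ADF pipelines for multi-tenant scenarios?
A: Define pipeline, dataset, and linked service parameters so that one pipeline definition serves every tenant, and pass tenant-specific dates, paths, and connection details at run time. Keep connection strings and other secrets in Azure Key Vault and reference them from linked services. Use ARM template parameters for environment-specific values.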
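Q3: What is the difference between tumbling window and schedule triggers?
A: A tumbling window trigger fires once for each fixed-size, contiguous, non-overlapping time window, tracks the state of every window, and supports backfill from a past start time, retries, and concurrency limits. A schedule trigger fires on a wall-clock recurrence from the current time onward, with no backfill and no built-in retry.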
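As an illustration of the parameterization idea in Q2, the Python sketch below builds one set of parameter values per tenant for a single shared pipeline. Everything here is an assumption made for the example: the Tenant fields, the parameter names (tenantId, sourcePath, sinkPath, connectionSecretName), the tenant names, and the folder layout. The sketch only shows the shape of the approach, including passing the name of a Key Vault secret instead of the secret itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Tenant:
    """Hypothetical per-tenant settings; none of these names come from ADF itself."""
    tenant_id: str
    container: str
    connection_secret_name: str  # name of a Key Vault secret, never the secret value


def pipeline_parameters(tenant: Tenant, run_date: date) -> dict:
    """Build the parameter values one run of a shared, parameterized pipeline would receive."""
    day = run_date.strftime("%Y/%m/%d")
    return {
        "tenantId": tenant.tenant_id,
        "sourcePath": f"{tenant.container}/raw/{day}",
        "sinkPath": f"{tenant.container}/curated/{day}",
        "connectionSecretName": tenant.connection_secret_name,
    }


# Hypothetical tenants used only for this example.
TENANTS = [
    Tenant("contoso", "contoso-data", "contoso-sql-connection"),
    Tenant("fabrikam", "fabrikam-data", "fabrikam-sql-connection"),
]

# One parameter set per tenant, all targeting the same pipeline definition.
runs = {t.tenant_id: pipeline_parameters(t, date(2024, 1, 1)) for t in TENANTS}
```

Inside the pipeline, such values would be read with expressions like `@pipeline().parameters.sourcePath`, so adding a tenant means adding a parameter set, not a new pipeline.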