
ADF Deep Dive: Mapping Data Flows, Triggers & Parameters

Advanced ADF concepts including data flows, triggers, parameters, and orchestration patterns

Data Flow Architecture

Architecture Diagram
┌─────────────────────────────────────────────────────────────────────┐
│                        ADF MAPPING DATA FLOW                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  SOURCE ──> TRANSFORM ──> SINK                                      │
│                                                                     │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐       │
│  │ Source   │───>│ Filter   │───>│ Derive   │───>│ Sink     │       │
│  │ (ADLS)   │    │ (Remove  │    │ (Add     │    │ (ADLS)   │       │
│  │          │    │  nulls)  │    │  columns)│    │          │       │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘       │
│                                                                     │
│  TRANSFORMATIONS:                                                   │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ • Source/Sink: Read/write data                               │   │
│  │ • Filter: Row filtering                                      │   │
│  │ • Derive: Add calculated columns                             │   │
│  │ • Lookup: Join with reference data                           │   │
│  │ • Aggregate: Group by and aggregate                          │   │
│  │ • Sort: Order data                                           │   │
│  │ • Union: Combine multiple sources                            │   │
│  │ • Split: Conditional routing                                 │   │
│  │ • Exists: Check data existence                               │   │
│  │ • Surrogate Key: Generate keys                               │   │
│  │ • Pivot/Unpivot: Reshape data                                │   │
│  │ • Window: Window functions                                   │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Data Flow JSON Example

{
  "name": "df_transform_sales",
  "properties": {
    "type": "MappingDataFlow",
    "typeProperties": {
      "sources": [
        {
          "name": "source_raw",
          "type": "JsonSource",
          "dataset": { "referenceName": "ds_raw_sales" }
        }
      ],
      "sinks": [
        {
          "name": "sink_curated",
          "type": "ParquetSink",
          "dataset": { "referenceName": "ds_curated_sales" }
        }
      ],
      "transformations": [
        {
          "name": "filter_valid",
          "type": "Filter",
          "typeProperties": {
            "filterExpression": "!isNull(sale_id) && amount > 0"
          }
        },
        {
          "name": "derive_total",
          "type": "DerivedColumn",
          "typeProperties": {
            "columns": [
              {
                "name": "total_amount",
                "expression": "quantity * unit_price"
              },
              {
                "name": "processed_date",
                "expression": "currentTimestamp()"
              }
            ]
          }
        }
      ]
    }
  }
}
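To make the row-level effect of the two transformations concrete, here is a minimal Python sketch of the same filter and derived-column logic. It is not how ADF runs a data flow (data flows execute on Spark); the sample rows and the function names `filter_valid` and `derive_total` are assumptions chosen to mirror the transformation names in the JSON above.

```python
from datetime import datetime, timezone

# Hypothetical sample rows; the field names mirror the expressions in the JSON above.
raw_rows = [
    {"sale_id": 1, "amount": 20.0, "quantity": 2, "unit_price": 10.0},
    {"sale_id": None, "amount": 15.0, "quantity": 1, "unit_price": 15.0},  # dropped: null sale_id
    {"sale_id": 3, "amount": 0.0, "quantity": 0, "unit_price": 5.0},       # dropped: amount not > 0
]


def filter_valid(rows):
    """Keep rows with a non-null sale_id and a positive amount."""
    return [r for r in rows if r["sale_id"] is not None and r["amount"] > 0]


def derive_total(rows, now=None):
    """Add total_amount and processed_date to each row without mutating the input."""
    now = now or datetime.now(timezone.utc)
    return [
        {**r, "total_amount": r["quantity"] * r["unit_price"], "processed_date": now}
        for r in rows
    ]


curated_rows = derive_total(filter_valid(raw_rows))
print(curated_rows)
```

Only the first sample row survives the filter, and it gains the two derived columns on the way to the sink.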

Trigger Types Deep Dive

{
  "name": "tr_tumbling_window",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2024-01-01T00:00:00Z",
      "maxConcurrency": 3,
      "retryPolicy": {
        "count": 3,
        "intervalInSeconds": 60
      }
    }
  }
}
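The window arithmetic behind this trigger can be sketched in a few lines of Python. This is an illustration only, assuming a run becomes eligible once its window has fully elapsed; the helper `tumbling_windows` and the `now` value are hypothetical, while the start time and one-hour interval come from the JSON above.

```python
from datetime import datetime, timedelta, timezone


def tumbling_windows(start_time, interval, now):
    """Return (window_start, window_end) pairs for every window fully elapsed by `now`."""
    windows = []
    window_start = start_time
    while window_start + interval <= now:
        windows.append((window_start, window_start + interval))
        window_start += interval
    return windows


start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)  # startTime from the trigger JSON
now = datetime(2024, 1, 1, 3, 30, tzinfo=timezone.utc)   # hypothetical current time
windows = tumbling_windows(start, timedelta(hours=1), now)

for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

The windows are contiguous and non-overlapping, so a trigger whose startTime lies in the past has a backlog of windows to backfill. In ADF, each run can read its own bounds through `@trigger().outputs.windowStartTime` and `@trigger().outputs.windowEndTime`, and `maxConcurrency` caps how many of these runs execute at once.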

ℹ️

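Pro Tip: Use Mapping Data Flows for complex transformations such as joins, aggregations, and derived columns, where the visual designer and debug mode help. For simple data movement, use Copy Activity, which is typically faster and cheaper because it does not need a Spark cluster.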

Interview Questions

Q1: When would you use Data Flow over Copy Activity? A: Use Data Flow for complex transformations (joins, aggregations, derived columns) that benefit from visual design and debugging. Use Copy Activity for simple data movement or format conversion, or when run time and cost matter most.

Q2: How do you parameterize ADF pipelines for multi-tenant scenarios? A: Use pipeline parameters, parameterized linked services, and Key Vault references. Build reusable pipelines that take dates, paths, and connection details as parameters, keep secrets such as connection strings in Key Vault, and use ARM template parameters for environment-specific values.

Q3: What is the difference between tumbling window and schedule triggers? A: A tumbling window trigger fires once for each fixed-size, contiguous, non-overlapping time window. It retains state and supports backfilling past windows, retries, and concurrency limits. A schedule trigger fires on a wall-clock schedule, is fire-and-forget, and cannot backfill past periods.
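The parameterization approach from Q2 can be sketched as follows: one shared pipeline, with a small per-tenant parameter set built for each run. This is a rough illustration under stated assumptions. The tenant registry, the parameter names (`tenantId`, `inputPath`, `connectionSecretName`), and the path layout are all hypothetical and not part of any ADF API. Inside the pipeline, such values would be read with expressions like `@pipeline().parameters.tenantId`.

```python
from datetime import date

# Hypothetical tenant registry: each entry lists only what differs between tenants.
TENANTS = {
    "contoso": {"container": "contoso-raw", "secret_name": "contoso-sql-conn"},
    "fabrikam": {"container": "fabrikam-raw", "secret_name": "fabrikam-sql-conn"},
}


def build_run_parameters(tenant_id, run_date):
    """Build the parameter set for one run of a shared, parameterized pipeline."""
    tenant = TENANTS[tenant_id]  # raises KeyError for an unknown tenant
    return {
        "tenantId": tenant_id,
        # Hypothetical path layout: <container>/sales/<yyyy>/<mm>/<dd>
        "inputPath": f"{tenant['container']}/sales/{run_date:%Y/%m/%d}",
        # Pass the Key Vault secret name, never the connection string itself.
        "connectionSecretName": tenant["secret_name"],
    }


params = build_run_parameters("contoso", date(2024, 1, 1))
print(params)
```

Passing only the secret name keeps connection strings out of pipeline run history; the linked service can then resolve the actual value from Key Vault at run time.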
