ADF Deep Dive: Mapping Data Flows, Triggers & Parameters

Advanced Azure Data Factory (ADF) concepts: mapping data flows, triggers, parameters, and orchestration patterns

Data Flow Architecture

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                    ADF MAPPING DATA FLOW                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  SOURCE ──> TRANSFORM ──> SINK                                      │
│                                                                     │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐       │
│  │ Source   │───>│ Filter   │───>│ Derive   │───>│ Sink     │       │
│  │ (ADLS)   │    │ (Remove  │    │ (Add     │    │ (ADLS)   │       │
│  │          │    │  nulls)  │    │  columns)│    │          │       │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘       │
│                                                                     │
│  TRANSFORMATIONS:                                                   │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │ • Source/Sink: Read/write data                               │   │
│  │ • Filter: Row filtering                                      │   │
│  │ • Derived Column: Add calculated columns                     │   │
│  │ • Lookup: Join with reference data                           │   │
│  │ • Aggregate: Group by and aggregate                          │   │
│  │ • Sort: Order data                                           │   │
│  │ • Union: Combine multiple streams                            │   │
│  │ • Conditional Split: Route rows by condition                 │   │
│  │ • Exists: Check if rows exist in another stream              │   │
│  │ • Surrogate Key: Generate keys                               │   │
│  │ • Pivot/Unpivot: Reshape data                                │   │
│  │ • Window: Window functions                                   │   │
│  └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Data Flow JSON Example

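In the JSON definition of a mapping data flow, the sources, sinks, and transformations arrays only name the streams. The transformation logic lives in the data flow script (scriptLines), and its expressions use the data flow expression language, so a null check is written !isNull(sale_id) rather than SQL's IS NOT NULL. The file formats (JSON in, Parquet out) are defined by the ds_raw_sales and ds_curated_sales datasets, not by the data flow.

{
  "name": "df_transform_sales",
  "properties": {
    "description": "Column types in the source projection are illustrative assumptions.",
    "type": "MappingDataFlow",
    "typeProperties": {
      "sources": [
        {
          "name": "sourceRaw",
          "dataset": { "referenceName": "ds_raw_sales", "type": "DatasetReference" }
        }
      ],
      "sinks": [
        {
          "name": "sinkCurated",
          "dataset": { "referenceName": "ds_curated_sales", "type": "DatasetReference" }
        }
      ],
      "transformations": [
        { "name": "filterValid" },
        { "name": "deriveTotal" }
      ],
      "scriptLines": [
        "source(output(sale_id as string, quantity as integer, unit_price as double, amount as double),",
        "     allowSchemaDrift: true,",
        "     validateSchema: false) ~> sourceRaw",
        "sourceRaw filter(!isNull(sale_id) && amount > 0) ~> filterValid",
        "filterValid derive(total_amount = quantity * unit_price,",
        "     processed_date = currentTimestamp()) ~> deriveTotal",
        "deriveTotal sink(allowSchemaDrift: true,",
        "     validateSchema: false) ~> sinkCurated"
      ]
    }
  }
}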
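
A data flow does not run on its own; a pipeline invokes it through an Execute Data Flow activity. The sketch below shows one way that wrapper might look for df_transform_sales. The pipeline name, activity name, and compute sizing are assumptions chosen for illustration, not values taken from the example above.

```json
{
  "name": "pl_transform_sales",
  "properties": {
    "description": "Illustrative sketch: pipeline name, activity name, and compute sizing are assumptions.",
    "activities": [
      {
        "name": "RunSalesDataFlow",
        "type": "ExecuteDataFlow",
        "typeProperties": {
          "dataFlow": {
            "referenceName": "df_transform_sales",
            "type": "DataFlowReference"
          },
          "compute": {
            "coreCount": 8,
            "computeType": "General"
          }
        }
      }
    ]
  }
}
```

The compute block sizes the Spark cluster that executes the data flow, so it is the main lever for trading run time against cost.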

Trigger Types Deep Dive

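{
  "name": "tr_tumbling_window",
  "properties": {
    "description": "Hourly tumbling window; the pipeline and parameter names below are placeholders.",
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2024-01-01T00:00:00Z",
      "maxConcurrency": 3,
      "retryPolicy": {
        "count": 3,
        "intervalInSeconds": 60
      }
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "pl_process_sales_window",
        "type": "PipelineReference"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime",
        "windowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}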

Pro Tip: Use Mapping Data Flows for complex transformations that benefit from the visual designer and debug mode. For simple data movement, use the Copy activity, which is typically faster and cheaper because it does not have to start a Spark cluster.
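
For contrast, a plain JSON-to-Parquet move with no row-level logic could be expressed as a single Copy activity, roughly as sketched here. The dataset names are reused from the data flow example; the pipeline and activity names are hypothetical.

```json
{
  "name": "pl_copy_sales",
  "properties": {
    "description": "Illustrative sketch: pipeline and activity names are hypothetical.",
    "activities": [
      {
        "name": "CopyRawToCurated",
        "type": "Copy",
        "inputs": [
          { "referenceName": "ds_raw_sales", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "ds_curated_sales", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "JsonSource" },
          "sink": { "type": "ParquetSink" }
        }
      }
    ]
  }
}
```

The Copy activity declares its source and sink formats directly (JsonSource, ParquetSink) and has no place for filters or derived columns, which is why it suits movement and format conversion rather than transformation.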

Interview Questions

Q1: When would you use Data Flow over Copy Activity? A: Use a Data Flow for complex transformations (joins, aggregations, derived columns) that benefit from visual debugging. Use Copy Activity for simple data movement or format conversion, or when startup latency and cost matter most.

Q2: How do you parameterize ADF pipelines for multi-tenant scenarios? A: Combine pipeline, dataset, and linked service parameters with Azure Key Vault references for secrets. Parameterize dates, paths, and connection details so that one pipeline definition serves every tenant. Keep environment-specific values (dev, test, prod) in ARM template parameters.
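
As one possible illustration of the parameterization in Q2, a dataset can take the tenant and run date as parameters and build its folder path from them. Every name in this sketch (dataset, linked service, container, parameters) is hypothetical.

```json
{
  "name": "ds_tenant_sales",
  "properties": {
    "description": "Illustrative sketch: dataset, linked service, container, and parameter names are hypothetical.",
    "type": "Parquet",
    "linkedServiceName": {
      "referenceName": "ls_adls",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "tenantId": { "type": "String" },
      "runDate": { "type": "String" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "curated",
        "folderPath": {
          "value": "@concat(dataset().tenantId, '/sales/', dataset().runDate)",
          "type": "Expression"
        }
      }
    }
  }
}
```

A pipeline would then supply the values when it references the dataset, for example by passing @pipeline().parameters.tenantId, so the same dataset definition resolves to a different folder per tenant.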

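Q3: What is the difference between tumbling window and schedule triggers? A: A tumbling window trigger fires once for each fixed-size, contiguous, non-overlapping time window. It tracks state per window, so it can retry failed windows, backfill windows from a past start time, and cap concurrency, and it is bound to exactly one pipeline. A schedule trigger fires on a wall-clock schedule, can start several pipelines, and does not backfill, retry, or track windows.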
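
To make the comparison in Q3 concrete, here is a rough sketch of a schedule trigger with the same hourly cadence as tr_tumbling_window. The trigger name and the referenced pipeline name are assumptions for illustration.

```json
{
  "name": "tr_schedule_hourly",
  "properties": {
    "description": "Illustrative sketch: trigger and pipeline names are hypothetical.",
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Hour",
        "interval": 1,
        "startTime": "2024-01-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "pl_transform_sales",
          "type": "PipelineReference"
        },
        "parameters": {}
      }
    ]
  }
}
```

With a start time in the past, this trigger fires only for future occurrences; it does not create runs for the hours that have already elapsed, which is the gap a tumbling window trigger fills.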