πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Incremental Models

🟒 Free Lesson

Advertisement

Incremental Models

Incremental Architecture

Merge Strategy Flow

Partition Strategy

Detailed Explanation

Incremental models are one of dbt's most powerful features for optimizing performance and reducing costs. They allow you to process only new or changed data rather than rebuilding entire tables.


How do Incremental Models work?

  1. First Run: The model executes as a full table build
  2. Subsequent Runs: Only new/changed data is processed
  3. Merge Logic: New data is merged into existing table
  4. Partition Pruning: Queries are optimized with partition filters

What are the Incremental Strategies?

StrategyUse CaseSQL Pattern
MergeUpsert operationsmerge into target using source...
AppendAdd new rows onlyinsert into target select...
Delete+InsertReplace partitionsdelete from target... insert into target...

Merge Strategy: The most common strategy for upsert operations:

merge into target using source on target.id = source.id
when matched then update set ...
when not matched then insert ...

Append Strategy: Simple append of new rows without updates:

insert into target select * from source
where source.id not in (select id from target)

Delete+Insert Strategy: Replace specific partitions or segments:

delete from target where date = '2024-01-01'
insert into target select * from source where date = '2024-01-01'

How does Partition Management work?

Partitioning is crucial for incremental performance:

  1. Date-based partitioning: Most common, partition by date
  2. Timestamp-based partitioning: Higher granularity for large datasets
  3. Integer-based partitioning: For numeric keys
  4. Clustered partitioning: Combined partitioning and clustering

What is Watermark Detection?

dbt uses watermark detection to identify new data:

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}

What are the Performance Considerations?

  1. Partition pruning: Always filter on partition columns
  2. Cluster alignment: Cluster by frequently filtered columns
  3. Batch processing: Process data in appropriate batch sizes
  4. Resource allocation: Use appropriate warehouse sizes

Key Takeaway: Incremental models optimize performance by processing only new or changed data, with strategies like merge, append, and delete+insert to handle different use cases.

Code Examples

Basic Incremental Model

-- models/marts/fct_orders.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge'
    )
}}

with orders as (
    select * from {{ ref('stg_orders') }}
),

final as (
    select
        order_id,
        customer_id,
        order_date,
        status,
        amount,
        current_timestamp() as updated_at
    from orders
)

select * from final

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}

Partitioned Incremental Model (BigQuery)

-- models/marts/fct_events.sql
{{
    config(
        materialized='incremental',
        unique_key='event_id',
        partition_by={
            "field": "event_date",
            "data_type": "date",
            "granularity": "day"
        },
        cluster_by=['user_id', 'event_type'],
        incremental_strategy='merge'
    )
}}

with events as (
    select * from {{ ref('stg_events') }}
),

final as (
    select
        event_id,
        user_id,
        event_type,
        event_timestamp,
        cast(event_timestamp as date) as event_date,
        event_properties,
        current_timestamp() as updated_at
    from events
)

select * from final

{% if is_incremental() %}
where event_date >= date_sub(
    (select max(event_date) from {{ this }}),
    interval 3 day
)
{% endif %}

Append-Only Incremental Model

-- models/marts/fct_events_append.sql
{{
    config(
        materialized='incremental',
        incremental_strategy='append',
        unique_key='event_id'
    )
}}

with events as (
    select
        event_id,
        user_id,
        event_type,
        event_timestamp,
        event_properties
    from {{ ref('stg_events') }}
)

select * from events

{% if is_incremental() %}
where event_id not in (select event_id from {{ this }})
{% endif %}

Delete+Insert Strategy

-- models/marts/fct_daily_metrics.sql
{{
    config(
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key=['metric_date', 'metric_name']
    )
}}

with daily_metrics as (
    select
        cast(metric_timestamp as date) as metric_date,
        metric_name,
        metric_value,
        current_timestamp() as updated_at
    from {{ ref('stg_metrics') }}
)

select * from daily_metrics

{% if is_incremental() %}
where metric_date >= date_sub(
    (select max(metric_date) from {{ this }}),
    interval 7 day
)
{% endif %}

Advanced Incremental with Macro

-- models/marts/fct_orders_advanced.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge',
        partition_by={
            "field": "order_date",
            "data_type": "date",
            "granularity": "day"
        },
        cluster_by=['customer_id', 'order_status']
    )
}}

{% macro get_incremental_filter() %}
    {% if is_incremental() %}
        where order_date >= date_sub(
            (select max(order_date) from {{ this }}),
            interval {{ var('incremental_lookback_days', 7) }} day
        )
    {% endif %}
{% endmacro %}

with orders as (
    select * from {{ ref('stg_orders') }}
),

customers as (
    select * from {{ ref('dim_customers') }}
),

final as (
    select
        orders.order_id,
        orders.customer_id,
        customers.customer_name,
        customers.segment,
        orders.order_date,
        orders.status as order_status,
        orders.amount,
        current_timestamp() as updated_at
    from orders
    left join customers on orders.customer_id = customers.customer_id
    {{ get_incremental_filter() }}
)

select * from final

Performance Metrics

MetricFull RefreshIncrementalImprovement
Processing Time100%10-30%70-90%
Cost100%20-40%60-80%
Data Scanned100%5-20%80-95%
Warehouse SizeLargeMedium50% reduction
ParallelismLowHigh2-3x improvement

Best Practices

  1. Always use unique_key for merge strategies
  2. Partition large tables for efficient pruning
  3. Cluster by frequently filtered columns
  4. Use appropriate incremental lookback for late-arriving data
  5. Test incremental logic with both full and incremental runs
  6. Monitor partition counts to avoid excessive partitions
  7. Use delete+insert for simple date-based replacements
  8. Consider resource costs when choosing strategies

See Also

⭐

Premium Content

Incremental Models

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert dbt Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement