Incremental Models

Incremental Architecture

Merge Strategy Flow

Partition Strategy

Detailed Explanation

Incremental models are one of dbt's most powerful features for optimizing performance and reducing costs. They allow you to process only new or changed data rather than rebuilding entire tables.

How do Incremental Models work?

First Run: The model executes as a full table build
Subsequent Runs: Only new/changed data is processed
Merge Logic: New data is merged into existing table
Partition Pruning: Queries are optimized with partition filters

What are the Incremental Strategies?

Strategy	Use Case	SQL Pattern
Merge	Upsert operations	`merge into target using source...`
Append	Add new rows only	`insert into target select...`
Delete+Insert	Replace partitions	`delete from target... insert into target...`

Merge Strategy: The most common strategy for upsert operations:

merge into target using source on target.id = source.id
when matched then update set ...
when not matched then insert ...

Append Strategy: Simple append of new rows without updates:

insert into target select * from source
where source.id not in (select id from target)

Delete+Insert Strategy: Replace specific partitions or segments:

delete from target where date = '2024-01-01'
insert into target select * from source where date = '2024-01-01'

How does Partition Management work?

Partitioning is crucial for incremental performance:

Date-based partitioning: Most common, partition by date
Timestamp-based partitioning: Higher granularity for large datasets
Integer-based partitioning: For numeric keys
Clustered partitioning: Combined partitioning and clustering

What is Watermark Detection?

dbt uses watermark detection to identify new data:

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}

What are the Performance Considerations?

Partition pruning: Always filter on partition columns
Cluster alignment: Cluster by frequently filtered columns
Batch processing: Process data in appropriate batch sizes
Resource allocation: Use appropriate warehouse sizes

Key Takeaway: Incremental models optimize performance by processing only new or changed data, with strategies like merge, append, and delete+insert to handle different use cases.

Code Examples

Basic Incremental Model

-- models/marts/fct_orders.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge'
    )
}}

with orders as (
    select * from {{ ref('stg_orders') }}
),

final as (
    select
        order_id,
        customer_id,
        order_date,
        status,
        amount,
        current_timestamp() as updated_at
    from orders
)

select * from final

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}

Partitioned Incremental Model (BigQuery)

-- models/marts/fct_events.sql
{{
    config(
        materialized='incremental',
        unique_key='event_id',
        partition_by={
            "field": "event_date",
            "data_type": "date",
            "granularity": "day"
        },
        cluster_by=['user_id', 'event_type'],
        incremental_strategy='merge'
    )
}}

with events as (
    select * from {{ ref('stg_events') }}
),

final as (
    select
        event_id,
        user_id,
        event_type,
        event_timestamp,
        cast(event_timestamp as date) as event_date,
        event_properties,
        current_timestamp() as updated_at
    from events
)

select * from final

{% if is_incremental() %}
where event_date >= date_sub(
    (select max(event_date) from {{ this }}),
    interval 3 day
)
{% endif %}

Append-Only Incremental Model

-- models/marts/fct_events_append.sql
{{
    config(
        materialized='incremental',
        incremental_strategy='append',
        unique_key='event_id'
    )
}}

with events as (
    select
        event_id,
        user_id,
        event_type,
        event_timestamp,
        event_properties
    from {{ ref('stg_events') }}
)

select * from events

{% if is_incremental() %}
where event_id not in (select event_id from {{ this }})
{% endif %}

Delete+Insert Strategy

-- models/marts/fct_daily_metrics.sql
{{
    config(
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key=['metric_date', 'metric_name']
    )
}}

with daily_metrics as (
    select
        cast(metric_timestamp as date) as metric_date,
        metric_name,
        metric_value,
        current_timestamp() as updated_at
    from {{ ref('stg_metrics') }}
)

select * from daily_metrics

{% if is_incremental() %}
where metric_date >= date_sub(
    (select max(metric_date) from {{ this }}),
    interval 7 day
)
{% endif %}

Advanced Incremental with Macro

-- models/marts/fct_orders_advanced.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge',
        partition_by={
            "field": "order_date",
            "data_type": "date",
            "granularity": "day"
        },
        cluster_by=['customer_id', 'order_status']
    )
}}

{% macro get_incremental_filter() %}
    {% if is_incremental() %}
        where order_date >= date_sub(
            (select max(order_date) from {{ this }}),
            interval {{ var('incremental_lookback_days', 7) }} day
        )
    {% endif %}
{% endmacro %}

with orders as (
    select * from {{ ref('stg_orders') }}
),

customers as (
    select * from {{ ref('dim_customers') }}
),

final as (
    select
        orders.order_id,
        orders.customer_id,
        customers.customer_name,
        customers.segment,
        orders.order_date,
        orders.status as order_status,
        orders.amount,
        current_timestamp() as updated_at
    from orders
    left join customers on orders.customer_id = customers.customer_id
    {{ get_incremental_filter() }}
)

select * from final

Performance Metrics

Metric	Full Refresh	Incremental	Improvement
Processing Time	100%	10-30%	70-90%
Cost	100%	20-40%	60-80%
Data Scanned	100%	5-20%	80-95%
Warehouse Size	Large	Medium	50% reduction
Parallelism	Low	High	2-3x improvement

Best Practices

Always use unique_key for merge strategies
Partition large tables for efficient pruning
Cluster by frequently filtered columns
Use appropriate incremental lookback for late-arriving data
Test incremental logic with both full and incremental runs
Monitor partition counts to avoid excessive partitions
Use delete+insert for simple date-based replacements
Consider resource costs when choosing strategies

Incremental Models

Incremental Models

Incremental Architecture

Merge Strategy Flow

Partition Strategy

Detailed Explanation

How do Incremental Models work?

What are the Incremental Strategies?

How does Partition Management work?

What is Watermark Detection?

What are the Performance Considerations?

Code Examples

Basic Incremental Model

Partitioned Incremental Model (BigQuery)

Append-Only Incremental Model

Delete+Insert Strategy

Advanced Incremental with Macro

Performance Metrics

Best Practices

See Also

Premium Content

Need Expert dbt Help?