dbt with Redshift

Redshift Architecture

Sort Key Strategy

Distribution Strategy

Detailed Explanation

What is Redshift + dbt?

Amazon Redshift is a fully managed, petabyte-scale data warehouse that provides powerful analytics capabilities.

Core Features

Feature	Description
Columnar Storage	Optimized for analytics queries
MPP	Distribute queries across nodes
Distribution Keys	Optimize data distribution for joins
Sort Keys	Optimize data ordering for queries
Compression	Automatic data compression

Distribution Styles

Style	Behavior	Best For
EVEN	Round-robin distribution	Default / no join optimization
KEY	Hash-based on a column	Large fact tables with joins
ALL	Copy to every node	Small dimension tables

Sort Key Types

Compound — Multi-column sort with prefix matching
Interleaved — Equal-weight multi-column sort
Auto — System-managed sort key selection

Key Takeaway: Choose distribution keys based on your most frequent JOIN patterns. Use ALL distribution for small dimension tables to avoid shuffling.

Code Examples

Redshift Profile Configuration

# profiles.yml
my_profile:
  target: dev
  
  outputs:
    dev:
      type: redshift
      host: my-cluster.xxxxxxxxxxxx.us-west-2.redshift.amazonaws.com
      port: 5439
      user: my_user
      password: "{{ env_var('REDSHIFT_PASSWORD') }}"
      dbname: analytics_dev
      schema: dbt_dev
      threads: 4
      connect_timeout: 10
      ra3_node: true
    
    prod:
      type: redshift
      host: my-prod-cluster.xxxxxxxxxxxx.us-west-2.redshift.amazonaws.com
      port: 5439
      user: service_account
      password: "{{ env_var('REDSHIFT_PASSWORD') }}"
      dbname: analytics_prod
      schema: public
      threads: 8
      connect_timeout: 30
      ra3_node: true

Distribution Key Configuration

# models/marts/fct_orders.yml
version: 2

models:
  - name: fct_orders
    description: "Fact table for orders"
    
    config:
      materialized: incremental
      unique_key: order_id
      incremental_strategy: merge
      
      dist_style: key
      dist_key: customer_id
      
      sort_type: compound
      sort_key:
        - order_date
        - customer_id
      
      post_hook:
        - "vacuum sortkey order by order_date"
        - "analyze"

Sort Key Configuration

-- models/marts/fct_events.sql
{{
    config(
        materialized='incremental',
        unique_key='event_id',
        incremental_strategy='merge',
        dist_style='key',
        dist_key='user_id',
        sort_type='compound',
        sort_key=['event_date', 'user_id', 'event_type']
    )
}}

with events as (
    select * from {{ ref('stg_events') }}
),

final as (
    select
        event_id,
        user_id,
        event_type,
        event_timestamp,
        cast(event_timestamp as date) as event_date,
        event_properties,
        current_timestamp() as updated_at
    from events
)

select * from final

{% if is_incremental() %}
where event_date >= date_sub(
    (select max(event_date) from {{ this }}),
    interval 7 day
)
{% endif %}

ALL Distribution for Dimension Tables

-- models/marts/dim_customers.sql
{{
    config(
        materialized='table',
        dist_style='all',
        sort_type='compound',
        sort_key=['customer_id']
    )
}}

with customers as (
    select * from {{ ref('stg_customers') }}
),

final as (
    select
        customer_id,
        customer_name,
        email,
        segment,
        current_timestamp() as updated_at
    from customers
)

select * from final

Redshift-Specific Macros

-- macros/redshift/vacuum_table.sql
{% macro vacuum_table(table_name, sort_key=None) %}
    {% if sort_key %}
        vacuum sortkey order by {{ sort_key }} on {{ table_name }};
    {% else %}
        vacuum on {{ table_name }};
    {% endif %}
{% endmacro %}

-- macros/redshift/analyze_table.sql
{% macro analyze_table(table_name) %}
    analyze {{ table_name }};
{% endmacro %}

-- macros/redshift/grant_permissions.sql
{% macro grant_permissions(table_name, role='analytics_reader') %}
    grant select on {{ table_name }} to role {{ role }};
{% endmacro %}

Workload Management Configuration

-- macros/redshift/create_wlm_queue.sql
{% macro create_wlm_queue(queue_name, memory_percent=40, concurrency=5) %}
    -- Note: WLM configuration is typically done via AWS Console
    -- This macro provides a template for documentation
    
    /*
    WLM Queue Configuration:
    Queue: {{ queue_name }}
    Memory: {{ memory_percent }}%
    Concurrency: {{ concurrency }}
    Query Timeout: 600 seconds
    
    Use Cases:
    - dbt runs: Use dedicated queue
    - Ad-hoc queries: Use separate queue
    - ETL jobs: Use high-memory queue
    */
{% endmacro %}

Optimized Redshift Model

-- models/marts/fct_orders_optimized.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge',
        dist_style='key',
        dist_key='customer_id',
        sort_type='compound',
        sort_key=['order_date', 'customer_id'],
        post_hook=[
            "{{ vacuum_table(this, 'order_date') }}",
            "{{ analyze_table(this) }}"
        ]
    )
}}

with orders as (
    select * from {{ ref('stg_orders') }}
),

final as (
    select
        order_id,
        customer_id,
        order_date,
        order_status,
        amount,
        current_timestamp() as updated_at
    from orders
)

select * from final

{% if is_incremental() %}
where order_date >= date_sub(
    (select max(order_date) from {{ this }}),
    interval 7 day
)
{% endif %}

Performance Metrics

Metric	Description	Target
Query Time	Average query execution	<10s
Vacuum Time	Time to vacuum tables	<5min
Analyze Time	Time to analyze tables	<1min
Storage Usage	Percentage of disk used	<70%
Query Queue Time	Time waiting in queue	<5s

Best Practices

Choose distribution keys wisely - Match join patterns
Use compound sort keys - For prefix-matching queries
Distribute dimension tables as ALL - For small tables
Use vacuum - Reclaim space after deletes
Analyze statistics - Keep query planner informed
Monitor query performance - Use STL tables
Use result caching - Cache frequent queries
Implement workload management - Prioritize queries

dbt with Redshift

dbt with Redshift

Redshift Architecture

Sort Key Strategy

Distribution Strategy

Detailed Explanation

What is Redshift + dbt?

Core Features

Distribution Styles

Sort Key Types

Code Examples

Redshift Profile Configuration

Distribution Key Configuration

Sort Key Configuration

ALL Distribution for Dimension Tables

Redshift-Specific Macros

Workload Management Configuration

Optimized Redshift Model

Performance Metrics

Best Practices

See Also

Premium Content

Need Expert dbt Help?