πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

dbt with Redshift

🟒 Free Lesson

Advertisement

dbt with Redshift

Redshift Architecture

Sort Key Strategy

Distribution Strategy

Detailed Explanation


What is Redshift + dbt?

Amazon Redshift is a fully managed, petabyte-scale data warehouse that provides powerful analytics capabilities.


Core Features

FeatureDescription
Columnar StorageOptimized for analytics queries
MPPDistribute queries across nodes
Distribution KeysOptimize data distribution for joins
Sort KeysOptimize data ordering for queries
CompressionAutomatic data compression

Distribution Styles

StyleBehaviorBest For
EVENRound-robin distributionDefault / no join optimization
KEYHash-based on a columnLarge fact tables with joins
ALLCopy to every nodeSmall dimension tables

Sort Key Types

  • Compound β€” Multi-column sort with prefix matching
  • Interleaved β€” Equal-weight multi-column sort
  • Auto β€” System-managed sort key selection

Key Takeaway: Choose distribution keys based on your most frequent JOIN patterns. Use ALL distribution for small dimension tables to avoid shuffling.

Code Examples

Redshift Profile Configuration

# profiles.yml
my_profile:
  target: dev
  
  outputs:
    dev:
      type: redshift
      host: my-cluster.xxxxxxxxxxxx.us-west-2.redshift.amazonaws.com
      port: 5439
      user: my_user
      password: "{{ env_var('REDSHIFT_PASSWORD') }}"
      dbname: analytics_dev
      schema: dbt_dev
      threads: 4
      connect_timeout: 10
      ra3_node: true
    
    prod:
      type: redshift
      host: my-prod-cluster.xxxxxxxxxxxx.us-west-2.redshift.amazonaws.com
      port: 5439
      user: service_account
      password: "{{ env_var('REDSHIFT_PASSWORD') }}"
      dbname: analytics_prod
      schema: public
      threads: 8
      connect_timeout: 30
      ra3_node: true

Distribution Key Configuration

# models/marts/fct_orders.yml
version: 2

models:
  - name: fct_orders
    description: "Fact table for orders"
    
    config:
      materialized: incremental
      unique_key: order_id
      incremental_strategy: merge
      
      dist_style: key
      dist_key: customer_id
      
      sort_type: compound
      sort_key:
        - order_date
        - customer_id
      
      post_hook:
        - "vacuum sortkey order by order_date"
        - "analyze"

Sort Key Configuration

-- models/marts/fct_events.sql
{{
    config(
        materialized='incremental',
        unique_key='event_id',
        incremental_strategy='merge',
        dist_style='key',
        dist_key='user_id',
        sort_type='compound',
        sort_key=['event_date', 'user_id', 'event_type']
    )
}}

with events as (
    select * from {{ ref('stg_events') }}
),

final as (
    select
        event_id,
        user_id,
        event_type,
        event_timestamp,
        cast(event_timestamp as date) as event_date,
        event_properties,
        current_timestamp() as updated_at
    from events
)

select * from final

{% if is_incremental() %}
where event_date >= date_sub(
    (select max(event_date) from {{ this }}),
    interval 7 day
)
{% endif %}

ALL Distribution for Dimension Tables

-- models/marts/dim_customers.sql
{{
    config(
        materialized='table',
        dist_style='all',
        sort_type='compound',
        sort_key=['customer_id']
    )
}}

with customers as (
    select * from {{ ref('stg_customers') }}
),

final as (
    select
        customer_id,
        customer_name,
        email,
        segment,
        current_timestamp() as updated_at
    from customers
)

select * from final

Redshift-Specific Macros

-- macros/redshift/vacuum_table.sql
{% macro vacuum_table(table_name, sort_key=None) %}
    {% if sort_key %}
        vacuum sortkey order by {{ sort_key }} on {{ table_name }};
    {% else %}
        vacuum on {{ table_name }};
    {% endif %}
{% endmacro %}

-- macros/redshift/analyze_table.sql
{% macro analyze_table(table_name) %}
    analyze {{ table_name }};
{% endmacro %}

-- macros/redshift/grant_permissions.sql
{% macro grant_permissions(table_name, role='analytics_reader') %}
    grant select on {{ table_name }} to role {{ role }};
{% endmacro %}

Workload Management Configuration

-- macros/redshift/create_wlm_queue.sql
{% macro create_wlm_queue(queue_name, memory_percent=40, concurrency=5) %}
    -- Note: WLM configuration is typically done via AWS Console
    -- This macro provides a template for documentation
    
    /*
    WLM Queue Configuration:
    Queue: {{ queue_name }}
    Memory: {{ memory_percent }}%
    Concurrency: {{ concurrency }}
    Query Timeout: 600 seconds
    
    Use Cases:
    - dbt runs: Use dedicated queue
    - Ad-hoc queries: Use separate queue
    - ETL jobs: Use high-memory queue
    */
{% endmacro %}

Optimized Redshift Model

-- models/marts/fct_orders_optimized.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge',
        dist_style='key',
        dist_key='customer_id',
        sort_type='compound',
        sort_key=['order_date', 'customer_id'],
        post_hook=[
            "{{ vacuum_table(this, 'order_date') }}",
            "{{ analyze_table(this) }}"
        ]
    )
}}

with orders as (
    select * from {{ ref('stg_orders') }}
),

final as (
    select
        order_id,
        customer_id,
        order_date,
        order_status,
        amount,
        current_timestamp() as updated_at
    from orders
)

select * from final

{% if is_incremental() %}
where order_date >= date_sub(
    (select max(order_date) from {{ this }}),
    interval 7 day
)
{% endif %}

Performance Metrics

MetricDescriptionTarget
Query TimeAverage query execution<10s
Vacuum TimeTime to vacuum tables<5min
Analyze TimeTime to analyze tables<1min
Storage UsagePercentage of disk used<70%
Query Queue TimeTime waiting in queue<5s

Best Practices

  1. Choose distribution keys wisely - Match join patterns
  2. Use compound sort keys - For prefix-matching queries
  3. Distribute dimension tables as ALL - For small tables
  4. Use vacuum - Reclaim space after deletes
  5. Analyze statistics - Keep query planner informed
  6. Monitor query performance - Use STL tables
  7. Use result caching - Cache frequent queries
  8. Implement workload management - Prioritize queries

See Also

⭐

Premium Content

dbt with Redshift

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert dbt Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement