dbt with Snowflake
Snowflake Architecture
Time Travel Architecture
Architecture Diagram
+-----------------------------------------------------------------------------+
| SNOWFLAKE TIME TRAVEL |
+-----------------------------------------------------------------------------+
| |
| +---------------------------------------------------------------------+ |
| | TIME TRAVEL CAPABILITIES | |
| | | |
| | +--------------+ +--------------+ +--------------------------+ | |
| | | UNDROP | | QUERY | | CLONE | | |
| | | | | HISTORY | | | | |
| | | * Restore | | * Historical | | * Zero-copy | | |
| | | tables | | queries | | * Instant | | |
| | | * Restore | | * Point-in- | | * Metadata | | |
| | | schemas | | time | | only | | |
| | +--------------+ +--------------+ +--------------------------+ | |
| +---------------------------------------------------------------------+ |
| | |
| v |
| +---------------------------------------------------------------------+ |
| | TIME TRAVEL USAGE | |
| | | |
| | -- Query historical data | |
| | SELECT * FROM table AT (TIMESTAMP => '2024-01-01 12:00:00') | |
| | | |
| | -- Query before specific change | |
| | SELECT * FROM table BEFORE (STATEMENT => 'abc123') | |
| | | |
| | -- Restore dropped table | |
| | UNDROP TABLE table_name | |
| | | |
| | -- Clone table with zero copy | |
| | CREATE TABLE table_clone CLONE table_name | |
| +---------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------------+
Warehouse Optimization
Detailed Explanation
What is Snowflake + dbt?
Snowflake is a cloud data platform that provides separation of storage and compute, enabling flexible scaling and cost management.
Core Features
| Feature | Description |
|---|---|
| Warehouse Management | Dynamic compute allocation |
| Time Travel | Historical data access |
| Zero-Copy Cloning | Instant table copies |
| Result Caching | Automatic query result caching |
| Auto-Scaling | Dynamic cluster scaling |
Warehouse Configuration
Snowflake warehouses control compute resources:
- Size β X-Small to 6X-Large
- Scaling β Single to multi-cluster
- Auto-suspend β Automatic suspension after inactivity
- Auto-resume β Automatic resume on query submission
Time Travel Retention
| Edition | Retention Period |
|---|---|
| Standard | 1 day |
| Enterprise | 90 days |
| Business Critical | 90 days |
Key Takeaway: Snowflake's separation of storage and compute means you can scale each independently β use auto-suspend to avoid paying for idle warehouses.
Code Examples
Snowflake Profile Configuration
# profiles.yml
my_profile:
target: dev
outputs:
dev:
type: snowflake
account: my_account
user: my_user
password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
role: TRANSFORMER
warehouse: COMPUTE_WH
database: ANALYTICS_DEV
schema: "dbt_{{ env_var('DBT_USER') }}"
client_session_keep_alive: false
query_tag: "dbt_dev"
connect_retries: 3
connect_timeout: 10
retry_on_database_errors: true
retry_all: false
prod:
type: snowflake
account: my_account
user: service_account
password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
role: TRANSFORMER
warehouse: ANALYTICS_WH
database: ANALYTICS_PROD
schema: public
client_session_keep_alive: true
query_tag: "dbt_production"
Warehouse Configuration
-- macros/snowflake/create_warehouse.sql
{% macro create_warehouse(warehouse_name, size='X-Small', min_clusters=1, max_clusters=1) %}
create warehouse if not exists {{ warehouse_name }}
warehouse_size = '{{ size }}'
min_cluster_count = {{ min_clusters }}
max_cluster_count = {{ max_clusters }}
auto_suspend = 60
auto_resume = true
initially_suspended = true
comment = 'Warehouse created by dbt';
{% endmacro %}
Time Travel Query
-- models/marts/fct_orders_historical.sql
{{
config(
materialized='view'
)
}}
-- Query data from 7 days ago
with current_data as (
select * from {{ ref('fct_orders') }}
),
historical_data as (
select * from {{ ref('fct_orders') }}
at (offset => -60*60*24*7) -- 7 days ago
),
comparison as (
select
current_data.order_id,
current_data.amount as current_amount,
historical_data.amount as historical_amount,
current_data.amount - historical_data.amount as amount_change
from current_data
left join historical_data
on current_data.order_id = historical_data.order_id
)
select * from comparison
Zero-Copy Clone
-- macros/snowflake/clone_table.sql
{% macro clone_table(source_table, target_table) %}
create table {{ target_table }} clone {{ source_table }};
{% endmacro %}
-- Usage in model
{{
config(
materialized='table',
post_hook=[
"{{ clone_table(this, this ~ '_backup') }}"
]
)
}}
Resource Monitor
-- macros/snowflake/create_resource_monitor.sql
{% macro create_resource_monitor(monitor_name, credit_quota=100, notify_emails=['data-eng@company.com']) %}
create resource monitor if not exists {{ monitor_name }}
with credit_quota = {{ credit_quota }}
notify_users = ({{ notify_emails | join(', ') }})
notify_on = (80, 95, 100)
suspend = immediately
suspend_immediately = true;
{% endmacro %}
Optimized Snowflake Model
-- models/marts/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
cluster_by=['customer_id', 'order_status'],
post_hook=[
"alter table {{ this }} clustering key recluster"
]
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
),
final as (
select
order_id,
customer_id,
order_date,
order_status,
amount,
current_timestamp() as updated_at
from orders
)
select * from final
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
Performance Metrics
| Metric | Description | Target |
|---|---|---|
| Query Time | Average query execution | <10s |
| Credit Usage | Credits consumed per run | <50 |
| Warehouse Uptime | Percentage of time active | <50% |
| Cache Hit Rate | Query result cache hits | >80% |
| Auto-Suspend | Time to suspend idle | 60s |
Best Practices
- Right-size warehouses - Match size to workload
- Use auto-scaling - Handle queue spikes
- Set auto-suspend - Reduce idle costs
- Leverage time travel - For debugging and auditing
- Use zero-copy cloning - For testing and development
- Monitor credit usage - Track costs
- Use query history - Analyze query patterns
- Implement resource monitors - Control costs
See Also
- Performance Tuning β General optimization strategies
- dbt BigQuery β BigQuery partitioning and clustering
- dbt Redshift β Redshift distribution and sort keys
- dbt Best Practices β Project structure and patterns