Materializations in dbt
Materialization Architecture
Table vs View Performance
+-----------------------------------------------------------------------------+
| TABLE vs VIEW COMPARISON |
+-----------------------------------------------------------------------------+
| |
| TABLE MATERIALIZATION |
| +---------------------------------------------------------------------+ |
| | | |
| | Query: SELECT * FROM fct_orders WHERE order_date = '2024-01-01' | |
| | | |
| | Execution Plan: | |
| | +-------------------------------------------------------------+ | |
| | | 1. Full table scan: 100M rows | | |
| | | 2. Filter: 50K rows matched | | |
| | | 3. Return results | | |
| | | | | |
| | | Time: 2.3 seconds | | |
| | | Cost: $0.05 | | |
| | +-------------------------------------------------------------+ | |
| +---------------------------------------------------------------------+ |
| |
| VIEW MATERIALIZATION |
| +---------------------------------------------------------------------+ |
| | | |
| | Query: SELECT * FROM v_fct_orders WHERE order_date = '2024-01-01' | |
| | | |
| | Execution Plan: | |
| | +-------------------------------------------------------------+ | |
| | | 1. View expansion | | |
| | | 2. Full table scan: 100M rows | | |
| | | 3. Filter: 50K rows matched | | |
| | | 4. Return results | | |
| | | | | |
| | | Time: 3.1 seconds | | |
| | | Cost: $0.07 | | |
| | +-------------------------------------------------------------+ | |
| +---------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------------+
Ephemeral Materialization
Detailed Explanation
Materializations determine how dbt persists your models in the data warehouse. Each materialization strategy has different trade-offs for performance, cost, and maintenance.
What is Table Materialization?
Tables are physical copies of your data stored in the warehouse:
| Characteristic | Description |
|---|---|
| Storage | Full data copy stored physically |
| Query Performance | Fast (pre-computed) |
| Refresh Cost | Expensive (full rebuild) |
| Data Freshness | Independent of source data changes |
Use Cases:
- Large analytical queries
- Complex aggregations
- Performance-critical dashboards
- Data that changes infrequently
What is View Materialization?
Views are logical definitions that query underlying data:
| Characteristic | Description |
|---|---|
| Storage | No physical storage (logical only) |
| Query Performance | Slower (on-the-fly computation) |
| Refresh Cost | Zero |
| Data Freshness | Always reflects current source data |
Use Cases:
- Simple transformations
- Real-time data access
- Small datasets
- Development/testing environments
What is Incremental Materialization?
Incremental models process only new/changed data:
| Characteristic | Description |
|---|---|
| Storage | Partial data refresh |
| Query Performance | Fast incremental updates |
| Refresh Cost | Complex logic required |
| Data Freshness | Cost-effective for large datasets |
Use Cases:
- Large fact tables
- High-volume event data
- Cost-sensitive workloads
- Real-time analytics
What is Ephemeral Materialization?
Ephemeral models are compiled as CTEs into downstream models:
| Characteristic | Description |
|---|---|
| Storage | No physical storage |
| Query Performance | Compiled into parent models |
| Refresh Cost | Zero |
| Data Freshness | Reduced query complexity |
Use Cases:
- Reusable transformation logic
- Intermediate calculations
- Development/testing
- Small reference datasets
What are Materialized Views?
Materialized views combine benefits of tables and views:
| Characteristic | Description |
|---|---|
| Storage | Physical storage with automatic refresh |
| Query Performance | Like tables |
| Refresh Cost | Like views |
| Data Freshness | Database-managed updates |
Use Cases:
- Frequently accessed aggregations
- Real-time analytics
- Cost-sensitive performance
- Auto-refreshing datasets
Key Takeaway: Choose materializations based on your needsβtables for performance, views for real-time, incremental for cost efficiency, ephemeral for reuse, and materialized views for balanced performance.
Code Examples
Table Materialization
-- models/marts/dim_customers.sql
{{
config(
materialized='table',
schema='analytics',
tags=['dimension', 'core']
)
}}
with customers as (
select * from {{ ref('stg_customers') }}
),
orders as (
select * from {{ ref('stg_orders') }}
),
customer_metrics as (
select
customer_id,
count(*) as total_orders,
sum(amount) as total_revenue,
min(order_date) as first_order_date,
max(order_date) as last_order_date
from orders
group by 1
)
select
customers.customer_id,
customers.customer_name,
customers.email,
customers.segment,
coalesce(customer_metrics.total_orders, 0) as total_orders,
coalesce(customer_metrics.total_revenue, 0) as total_revenue,
customer_metrics.first_order_date,
customer_metrics.last_order_date,
current_timestamp() as updated_at
from customers
left join customer_metrics on customers.customer_id = customer_metrics.customer_id
View Materialization
-- models/staging/stg_orders.sql
{{
config(
materialized='view',
schema='staging'
)
}}
with source as (
select * from {{ source('raw', 'orders') }}
),
renamed as (
select
id as order_id,
customer_id,
status,
created_at as order_date,
updated_at as modified_date,
amount
from source
)
select * from renamed
Incremental Materialization
-- models/marts/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
partition_by={
"field": "order_date",
"data_type": "date"
},
cluster_by=['customer_id']
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
),
final as (
select
order_id,
customer_id,
order_date,
status,
amount,
current_timestamp() as updated_at
from orders
)
select * from final
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
Ephemeral Materialization
-- models/intermediate/int_order_metrics.sql
{{
config(
materialized='ephemeral'
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
),
metrics as (
select
customer_id,
count(*) as order_count,
sum(amount) as total_amount,
avg(amount) as avg_amount
from orders
group by 1
)
select * from metrics
Materialized View (Snowflake)
-- models/marts/fct_daily_metrics.sql
{{
config(
materialized='materialized_view',
schema='analytics',
auto_refresh=True,
refresh_interval='1 hour'
)
}}
with daily_orders as (
select
cast(order_date as date) as metric_date,
count(*) as order_count,
sum(amount) as total_revenue,
avg(amount) as avg_order_value
from {{ ref('stg_orders') }}
group by 1
)
select * from daily_orders
Performance Metrics
| Materialization | Query Speed | Refresh Cost | Storage | Use Case |
|---|---|---|---|---|
| Table | Excellent | High | High | Analytics |
| View | Good | None | None | Real-time |
| Incremental | Excellent | Low | Medium | Large data |
| Ephemeral | Varies | None | None | Reusable |
| Materialized View | Excellent | Low | Medium | Aggregations |
Best Practices
- Use views for staging - Always use views for 1:1 source mappings
- Use tables for marts - Final analytical datasets should be tables
- Use incremental for large facts - Cost optimization for large tables
- Use ephemeral for reuse - Shared transformation logic
- Consider materialized views - For frequently accessed aggregations
- Test materializations - Verify logic works with each type
- Monitor performance - Track query times and refresh costs
- Document decisions - Explain why each materialization was chosen
See Also
- Incremental Models β Merge strategies and performance optimization
- dbt Core Architecture β Manifest, DAG, and compilation pipeline
- Models, Seeds, and Snapshots β Core dbt components for data transformation
- Snowflake Table Types β Transient, temporary, and time-travel tables
- PySpark DataFrames β DataFrame operations and transformations