dbt Best Practices
Best Practices Architecture
Naming Conventions
Project Structure
Architecture Diagram
+-----------------------------------------------------------------------------+
| PROJECT STRUCTURE BEST PRACTICES |
+-----------------------------------------------------------------------------+
| |
| +---------------------------------------------------------------------+ |
| | RECOMMENDED STRUCTURE | |
| | | |
| | my_project/ | |
| | +-- dbt_project.yml | |
| | +-- packages.yml | |
| | +-- profiles.yml | |
| | | | |
| | +-- models/ | |
| | | +-- staging/ | |
| | | | +-- _sources.yml β-- Source definitions | |
| | | | +-- stg_customers.sql | |
| | | | +-- stg_orders.sql | |
| | | | +-- stg_products.sql | |
| | | | | |
| | | +-- intermediate/ | |
| | | | +-- int_orders_joined.sql | |
| | | | +-- int_orders_aggregated.sql | |
| | | | +-- int_orders_cleaned.sql | |
| | | | | |
| | | +-- marts/ | |
| | | +-- finance/ | |
| | | | +-- fct_orders.sql | |
| | | | +-- dim_customers.sql | |
| | | | +-- fct_revenue.sql | |
| | | | | |
| | | +-- marketing/ | |
| | | | +-- fct_campaigns.sql | |
| | | | +-- dim_campaigns.sql | |
| | | | | |
| | | +-- product/ | |
| | | +-- fct_events.sql | |
| | | +-- dim_users.sql | |
| | | | |
| | +-- seeds/ β-- CSV seed data | |
| | +-- snapshots/ β-- SCD snapshots | |
| | +-- tests/ β-- Custom data tests | |
| | +-- macros/ β-- Reusable macros | |
| | +-- analysis/ β-- Ad-hoc analysis | |
| | +-- docs/ β-- Documentation | |
| +---------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------------+
Testing Strategy
Detailed Explanation
What are dbt Best Practices?
Best practices in dbt ensure maintainable, scalable, and reliable data transformations.
Code Style
- Consistent formatting β Use consistent indentation and spacing
- Descriptive naming β Clear, meaningful names for all objects
- Modular code β Small, focused models and macros
- Documentation β Comprehensive descriptions and examples
Project Structure
| Layer | Purpose | Example |
|---|---|---|
| Staging | 1:1 with source tables | stg_orders |
| Intermediate | Business logic joins | int_orders_joined |
| Marts | Final business datasets | fct_orders |
- Group by business domain (finance, marketing, product)
- Apply patterns consistently across all layers
- Maintain clear boundaries between layers
Testing Strategy
- Test everything β Models, sources, macros
- Automate tests β Run tests in CI/CD
- Monitor results β Track test pass/fail rates
- Alert on failures β Notify on test failures
Key Takeaway: Follow the layered architecture (staging β intermediate β marts) and test at every layer. This creates a reliable, maintainable data pipeline.
Code Examples
Model Documentation Template
# models/marts/fct_orders.yml
version: 2
models:
- name: fct_orders
description: >
Fact table containing all order transactions. This is the central
fact table for the order analytics domain.
**Grain**: One row per order
**Key Business Questions**:
- What is our total revenue?
- How many orders do we process?
- What is the average order value?
**Data Source**: Shopify via Fivetran
**Refresh Frequency**: Hourly
**Owner**: Data Engineering Team
config:
tags: ['finance', 'core', 'production']
meta:
owner: data-engineering
team: analytics
cost_center: finance
columns:
- name: order_id
description: "Unique identifier for each order"
data_tests:
- unique
- not_null
meta:
system: shopify
pii: false
- name: customer_id
description: "Foreign key to dim_customers"
data_tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
meta:
pii: false
- name: order_date
description: "Date when the order was placed"
data_tests:
- not_null
meta:
format: YYYY-MM-DD
Standardized Staging Model
-- models/staging/stg_orders.sql
{{
config(
materialized='view',
schema='staging'
)
}}
with source as (
select * from {{ source('shopify', 'orders') }}
),
renamed as (
select
-- Primary keys
id as order_id,
customer_id,
-- Dimensions
status as order_status,
financial_status,
fulfillment_status,
-- Measures
{{ dbt_utils.money_snapshot(amount) }} as order_amount,
{{ dbt_utils.money_snapshot(total_discounts) }} as discount_amount,
{{ dbt_utils.money_snapshot(total_tax) }} as tax_amount,
-- Timestamps
created_at as order_created_at,
updated_at as order_updated_at,
-- Metadata
_fivetran_synced as synced_at
from source
)
select * from renamed
Standardized Fact Model
-- models/marts/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
partition_by={
"field": "order_date",
"data_type": "date"
},
cluster_by=['customer_id', 'order_status'],
tags=['finance', 'core', 'production']
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
),
customers as (
select * from {{ ref('dim_customers') }}
),
order_items as (
select * from {{ ref('stg_order_items') }}
),
final as (
select
-- Primary key
orders.order_id,
-- Foreign keys
orders.customer_id,
-- Dimensions
customers.customer_name,
customers.segment as customer_segment,
orders.order_status,
orders.order_date,
-- Measures
orders.order_amount,
orders.discount_amount,
orders.tax_amount,
count(order_items.item_id) as item_count,
-- Timestamps
orders.order_created_at,
orders.order_updated_at,
current_timestamp() as dbt_updated_at
from orders
left join customers on orders.customer_id = customers.customer_id
left join order_items on orders.order_id = order_items.order_id
group by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
)
select * from final
{% if is_incremental() %}
where dbt_updated_at > (select max(dbt_updated_at) from {{ this }})
{% endif %}
Testing Template
# models/marts/fct_orders_tests.yml
version: 2
models:
- name: fct_orders
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- order_id
- dbt_utils.accepted_range:
min_value: 0
max_value: 1000000
column_name: order_amount
columns:
- name: order_id
data_tests:
- unique
- not_null
- name: customer_id
data_tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
- name: order_amount
data_tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
max_value: 1000000
- name: order_date
data_tests:
- not_null
- dbt_utils.expression_is_true:
expression: "order_date <= current_date()"
Macro Template
-- macros/generate_generic_test.sql
{% macro generate_generic_test(test_name, model, column_name, config) %}
{% set test_sql %}
select
'{{ test_name }}' as test_name,
'{{ model }}' as model_name,
'{{ column_name }}' as column_name,
count(*) as failures
from {{ model }}
where {{ column_name }} is null
{% endset %}
{% set result = run_query(test_sql) %}
{{ return(result) }}
{% endmacro %}
Performance Metrics
| Metric | Description | Target |
|---|---|---|
| Code Quality | Linting and style checks | 100% pass |
| Test Coverage | Percentage of models tested | >90% |
| Documentation Coverage | Percentage documented | >95% |
| Build Success Rate | Percentage of successful builds | >99% |
| Average Build Time | Time to build all models | <30min |
Best Practices
- Follow naming conventions - Consistent, descriptive names
- Use layered architecture - staging -> intermediate -> marts
- Test everything - Models, sources, macros
- Document comprehensively - Descriptions, lineage, examples
- Use version control - All code in Git
- Implement CI/CD - Automated testing and deployment
- Monitor performance - Track metrics continuously
- Review code regularly - Peer reviews and audits
See Also
- Advanced Jinja β Custom macro patterns and dispatch
- Data Quality Tests β Testing strategy and conventions
- dbt Cloud β CI/CD and job scheduling
- Performance Tuning β Optimization strategies