Data Quality Testing
Testing Architecture
Test Categories
Architecture Diagram
+-----------------------------------------------------------------------------+
| TEST CATEGORIES AND COVERAGE |
+-----------------------------------------------------------------------------+
| |
| +---------------------------------------------------------------------+ |
| | STRUCTURAL TESTS | |
| | | |
| | * Schema validation * Column existence * Data type checks | |
| | * Table existence * View definitions * Index verification | |
| +---------------------------------------------------------------------+ |
| | |
| v |
| +---------------------------------------------------------------------+ |
| | CONTENT TESTS | |
| | | |
| | * Uniqueness checks * Not-null validation * Accepted values | |
| | * Range validation * Pattern matching * Statistical tests | |
| +---------------------------------------------------------------------+ |
| | |
| v |
| +---------------------------------------------------------------------+ |
| | REFERENTIAL TESTS | |
| | | |
| | * Foreign key validity * Orphan detection * Relationship tests | |
| | * Cross-table sync * ID mapping * Link validation | |
| +---------------------------------------------------------------------+ |
| | |
| v |
| +---------------------------------------------------------------------+ |
| | FRESHNESS TESTS | |
| | | |
| | * Source freshness * Update frequency * SLA monitoring | |
| | * Latency checks * Delay detection * Alert thresholds | |
| +---------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------------+
Test Results Dashboard
Architecture Diagram
+-----------------------------------------------------------------------------+
| TEST RESULTS SUMMARY |
+-----------------------------------------------------------------------------+
| |
| MODEL: fct_orders |
| ===========================================================================|
| |
| +---------------------------------------------------------------------+ |
| | STRUCTURAL TESTS | |
| | β Table exists | |
| | β All columns present | |
| | β Data types correct | |
| +---------------------------------------------------------------------+ |
| |
| +---------------------------------------------------------------------+ |
| | CONTENT TESTS | |
| | β unique_order_id (0 duplicates) | |
| | β not_null_order_id (0 nulls) | |
| | β not_null_customer_id (0 nulls) | |
| | β valid_status (all values valid) | |
| | β positive_amount (142 negative values) FAILED | |
| +---------------------------------------------------------------------+ |
| |
| +---------------------------------------------------------------------+ |
| | REFERENTIAL TESTS | |
| | β fk_customer_id (all valid) | |
| | β fk_product_id (all valid) | |
| +---------------------------------------------------------------------+ |
| |
| +---------------------------------------------------------------------+ |
| | FRESHNESS TESTS | |
| | β source_freshness (last update: 2 hours ago) | |
| +---------------------------------------------------------------------+ |
| |
| SUMMARY: 12 passed, 1 failed, 0 errors |
| COVERAGE: 92% |
| |
+-----------------------------------------------------------------------------+
Detailed Explanation
Data quality testing in dbt ensures that your transformed data meets business requirements and maintains integrity across the pipeline.
What are the Quality Dimensions?
| Dimension | Description |
|---|---|
| Accuracy | Data correctly represents real-world entities |
| Completeness | All required data is present |
| Consistency | Data is uniform across systems |
| Timeliness | Data is available when needed |
| Uniqueness | No duplicate records exist |
| Validity | Data conforms to defined formats and rules |
What are the Test Types?
Structural Tests
Validate the structure of your data:
- Table existence
- Column presence
- Data type correctness
Content Tests
Validate the content of your data:
- Uniqueness constraints
- Not-null requirements
- Accepted value ranges
- Pattern matching
Referential Tests
Validate relationships between tables:
- Foreign key validity
- Orphan detection
- Cross-table consistency
Freshness Tests
Validate data timeliness:
- Source freshness checks
- Update frequency monitoring
- SLA compliance
How do Test Execution work?
dbt tests work by:
- Compiling test SQL queries
- Executing against the target warehouse
- Checking for failing rows
- Reporting pass/fail results
What are Custom Test Patterns?
Custom tests enable domain-specific validations:
- Business rule assertions
- Statistical validations
- Cross-system comparisons
- Aggregation checks
Key Takeaway: Data quality testing covers six key dimensionsβaccuracy, completeness, consistency, timeliness, uniqueness, and validityβensuring data meets business requirements.
Code Examples
Schema Test Definitions
# models/staging/_sources.yml
version: 2
sources:
- name: raw
database: raw_data
schema: public
freshness:
warn_after: {count: 6, period: hour}
error_after: {count: 24, period: hour}
tables:
- name: orders
columns:
- name: id
data_tests:
- unique
- not_null
- name: customer_id
data_tests:
- not_null
- relationships:
to: ref('stg_customers')
field: customer_id
- name: status
data_tests:
- accepted_values:
values: ['pending', 'shipped', 'delivered', 'cancelled']
- name: amount
data_tests:
- not_null
Custom Data Tests
-- tests/test_order_amount_positive.sql
-- Business rule: Order amounts must be positive
select
order_id,
amount,
customer_id,
order_date
from {{ ref('fct_orders') }}
where amount < 0
Advanced Custom Test
-- tests/test_monthly_revenue_anomaly.sql
-- Statistical test for revenue anomalies
with monthly_revenue as (
select
date_trunc('month', order_date) as month,
sum(amount) as revenue
from {{ ref('fct_orders') }}
group by 1
),
revenue_stats as (
select
avg(revenue) as avg_revenue,
stddev(revenue) as stddev_revenue
from monthly_revenue
),
anomalies as (
select
mr.month,
mr.revenue,
rs.avg_revenue,
rs.stddev_revenue,
(mr.revenue - rs.avg_revenue) / nullif(rs.stddev_revenue, 0) as z_score
from monthly_revenue mr
cross join revenue_stats rs
)
select
month,
revenue,
avg_revenue,
z_score
from anomalies
where abs(z_score) > 3 -- Flag anomalies beyond 3 standard deviations
Generic Test Macro
-- macros/schema_tests/test_accepted_range.sql
{% test accepted_range(model, column_name, min_value, max_value) %}
with validation as (
select
{{ column_name }} as field_name,
case
when {{ column_name }} < {{ min_value }} then 'below_min'
when {{ column_name }} > {{ max_value }} then 'above_max'
else 'valid'
end as validation_status
from {{ model }}
where {{ column_name }} is not null
)
select *
from validation
where validation_status != 'valid'
{% endtest %}
Test Configuration
# dbt_project.yml
name: 'my_project'
version: '1.0.0'
tests:
my_project:
+severity: error
+store_failures: true
+schema: test_results
staging:
+severity: warn
marts:
+severity: error
+fail_calc: "count(*)"
+warn_if: ">10"
+error_if: ">100"
on-run-end:
- "{{ log('Test execution completed', info=True) }}"
Freshness Configuration
# models/staging/_sources.yml
version: 2
sources:
- name: shopify
database: raw
schema: shopify
loader: fivetran
loaded_at_field: _fivetran_synced
freshness:
warn_after: {count: 6, period: hour}
error_after: {count: 24, period: hour}
tables:
- name: orders
description: "Shopify orders"
loaded_at_field: _fivetran_synced
freshness:
warn_after: {count: 1, period: hour}
error_after: {count: 4, period: hour}
Performance Metrics
| Test Type | Execution Time | Resource Usage | Coverage |
|---|---|---|---|
| unique | Low | Low | High |
| not_null | Low | Low | High |
| accepted_values | Medium | Low | Medium |
| relationships | Medium | Medium | High |
| freshness | Low | Low | High |
| Custom SQL | Variable | Variable | Variable |
Best Practices
- Test all critical columns - uniqueness, not_null, relationships
- Use relationships tests - validate foreign key integrity
- Test business rules - custom assertions for domain logic
- Store test failures - for debugging and analysis
- Use severity levels - warn vs error based on impact
- Test freshness - ensure data is current
- Use packages - dbt_expectations for advanced tests
- Document tests - explain what each test validates
- Run tests in CI/CD - catch issues before production
- Monitor test results - track quality metrics over time
See Also
- Advanced Jinja β Custom test macros and dispatch patterns
- dbt Cloud β Test scheduling and CI/CD integration
- Mesh & Data Collaboration β Data contracts and governance
- dbt Best Practices β Testing strategy and conventions