Data Quality Testing

Testing Architecture

Test Categories

Architecture Diagram

+-----------------------------------------------------------------------------+
|                     TEST CATEGORIES AND COVERAGE                            |
+-----------------------------------------------------------------------------+
|                                                                             |
|  +---------------------------------------------------------------------+    |
|  |                    STRUCTURAL TESTS                                  |   |
|  |                                                                     |    |
|  |  * Schema validation    * Column existence    * Data type checks   |     |
|  |  * Table existence      * View definitions    * Index verification |     |
|  +---------------------------------------------------------------------+    |
|                              |                                              |
|                              v                                              |
|  +---------------------------------------------------------------------+    |
|  |                    CONTENT TESTS                                    |    |
|  |                                                                     |    |
|  |  * Uniqueness checks    * Not-null validation * Accepted values   |      |
|  |  * Range validation     * Pattern matching    * Statistical tests |      |
|  +---------------------------------------------------------------------+    |
|                              |                                              |
|                              v                                              |
|  +---------------------------------------------------------------------+    |
|  |                    REFERENTIAL TESTS                                 |   |
|  |                                                                     |    |
|  |  * Foreign key validity * Orphan detection    * Relationship tests |     |
|  |  * Cross-table sync     * ID mapping          * Link validation    |     |
|  +---------------------------------------------------------------------+    |
|                              |                                              |
|                              v                                              |
|  +---------------------------------------------------------------------+    |
|  |                    FRESHNESS TESTS                                   |   |
|  |                                                                     |    |
|  |  * Source freshness     * Update frequency    * SLA monitoring     |     |
|  |  * Latency checks       * Delay detection     * Alert thresholds   |     |
|  +---------------------------------------------------------------------+    |
|                                                                             |
+-----------------------------------------------------------------------------+

Test Results Dashboard

Architecture Diagram

+-----------------------------------------------------------------------------+
|                     TEST RESULTS SUMMARY                                    |
+-----------------------------------------------------------------------------+
|                                                                             |
|  MODEL: fct_orders                                                          |
|  ===========================================================================|
|                                                                             |
|  +---------------------------------------------------------------------+    |
|  |  STRUCTURAL TESTS                                                  |     |
|  |  ✓ Table exists                                                     |    |
|  |  ✓ All columns present                                              |    |
|  |  ✓ Data types correct                                               |    |
|  +---------------------------------------------------------------------+    |
|                                                                             |
|  +---------------------------------------------------------------------+    |
|  |  CONTENT TESTS                                                     |     |
|  |  ✓ unique_order_id          (0 duplicates)                         |     |
|  |  ✓ not_null_order_id        (0 nulls)                              |     |
|  |  ✓ not_null_customer_id     (0 nulls)                              |     |
|  |  ✓ valid_status             (all values valid)                     |     |
|  |  ✓ positive_amount          (142 negative values)  FAILED         |      |
|  +---------------------------------------------------------------------+    |
|                                                                             |
|  +---------------------------------------------------------------------+    |
|  |  REFERENTIAL TESTS                                                 |     |
|  |  ✓ fk_customer_id           (all valid)                            |     |
|  |  ✓ fk_product_id            (all valid)                            |     |
|  +---------------------------------------------------------------------+    |
|                                                                             |
|  +---------------------------------------------------------------------+    |
|  |  FRESHNESS TESTS                                                   |     |
|  |  ✓ source_freshness         (last update: 2 hours ago)             |     |
|  +---------------------------------------------------------------------+    |
|                                                                             |
|  SUMMARY: 12 passed, 1 failed, 0 errors                                     |
|  COVERAGE: 92%                                                              |
|                                                                             |
+-----------------------------------------------------------------------------+

Detailed Explanation

Data quality testing in dbt ensures that your transformed data meets business requirements and maintains integrity across the pipeline.

What are the Quality Dimensions?

Dimension	Description
Accuracy	Data correctly represents real-world entities
Completeness	All required data is present
Consistency	Data is uniform across systems
Timeliness	Data is available when needed
Uniqueness	No duplicate records exist
Validity	Data conforms to defined formats and rules

What are the Test Types?

Structural Tests

Validate the structure of your data:

Table existence
Column presence
Data type correctness

Content Tests

Validate the content of your data:

Uniqueness constraints
Not-null requirements
Accepted value ranges
Pattern matching

Referential Tests

Validate relationships between tables:

Foreign key validity
Orphan detection
Cross-table consistency

Freshness Tests

Validate data timeliness:

Source freshness checks
Update frequency monitoring
SLA compliance

How do Test Execution work?

dbt tests work by:

Compiling test SQL queries
Executing against the target warehouse
Checking for failing rows
Reporting pass/fail results

What are Custom Test Patterns?

Custom tests enable domain-specific validations:

Business rule assertions
Statistical validations
Cross-system comparisons
Aggregation checks

Key Takeaway: Data quality testing covers six key dimensions—accuracy, completeness, consistency, timeliness, uniqueness, and validity—ensuring data meets business requirements.

Code Examples

Schema Test Definitions

# models/staging/_sources.yml
version: 2

sources:
  - name: raw
    database: raw_data
    schema: public
    
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    
    tables:
      - name: orders
        columns:
          - name: id
            data_tests:
              - unique
              - not_null
          
          - name: customer_id
            data_tests:
              - not_null
              - relationships:
                  to: ref('stg_customers')
                  field: customer_id
          
          - name: status
            data_tests:
              - accepted_values:
                  values: ['pending', 'shipped', 'delivered', 'cancelled']
          
          - name: amount
            data_tests:
              - not_null

Custom Data Tests

-- tests/test_order_amount_positive.sql
-- Business rule: Order amounts must be positive

select
    order_id,
    amount,
    customer_id,
    order_date
from {{ ref('fct_orders') }}
where amount < 0

Advanced Custom Test

-- tests/test_monthly_revenue_anomaly.sql
-- Statistical test for revenue anomalies

with monthly_revenue as (
    select
        date_trunc('month', order_date) as month,
        sum(amount) as revenue
    from {{ ref('fct_orders') }}
    group by 1
),

revenue_stats as (
    select
        avg(revenue) as avg_revenue,
        stddev(revenue) as stddev_revenue
    from monthly_revenue
),

anomalies as (
    select
        mr.month,
        mr.revenue,
        rs.avg_revenue,
        rs.stddev_revenue,
        (mr.revenue - rs.avg_revenue) / nullif(rs.stddev_revenue, 0) as z_score
    from monthly_revenue mr
    cross join revenue_stats rs
)

select
    month,
    revenue,
    avg_revenue,
    z_score
from anomalies
where abs(z_score) > 3  -- Flag anomalies beyond 3 standard deviations

Generic Test Macro

-- macros/schema_tests/test_accepted_range.sql
{% test accepted_range(model, column_name, min_value, max_value) %}

with validation as (
    select
        {{ column_name }} as field_name,
        case
            when {{ column_name }} < {{ min_value }} then 'below_min'
            when {{ column_name }} > {{ max_value }} then 'above_max'
            else 'valid'
        end as validation_status
    from {{ model }}
    where {{ column_name }} is not null
)

select *
from validation
where validation_status != 'valid'

{% endtest %}

Test Configuration

# dbt_project.yml
name: 'my_project'
version: '1.0.0'

tests:
  my_project:
    +severity: error
    +store_failures: true
    +schema: test_results
    
    staging:
      +severity: warn
    
    marts:
      +severity: error
      +fail_calc: "count(*)"
      +warn_if: ">10"
      +error_if: ">100"

on-run-end:
  - "{{ log('Test execution completed', info=True) }}"

Freshness Configuration

# models/staging/_sources.yml
version: 2

sources:
  - name: shopify
    database: raw
    schema: shopify
    loader: fivetran
    loaded_at_field: _fivetran_synced
    
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    
    tables:
      - name: orders
        description: "Shopify orders"
        
        loaded_at_field: _fivetran_synced
        
        freshness:
          warn_after: {count: 1, period: hour}
          error_after: {count: 4, period: hour}

Performance Metrics

Test Type	Execution Time	Resource Usage	Coverage
unique	Low	Low	High
not_null	Low	Low	High
accepted_values	Medium	Low	Medium
relationships	Medium	Medium	High
freshness	Low	Low	High
Custom SQL	Variable	Variable	Variable

Best Practices

Test all critical columns - uniqueness, not_null, relationships
Use relationships tests - validate foreign key integrity
Test business rules - custom assertions for domain logic
Store test failures - for debugging and analysis
Use severity levels - warn vs error based on impact
Test freshness - ensure data is current
Use packages - dbt_expectations for advanced tests
Document tests - explain what each test validates
Run tests in CI/CD - catch issues before production
Monitor test results - track quality metrics over time

Data Quality Testing

Data Quality Testing

Testing Architecture

Test Categories

Test Results Dashboard

Detailed Explanation

What are the Quality Dimensions?

What are the Test Types?

Structural Tests

Content Tests

Referential Tests

Freshness Tests

How do Test Execution work?

What are Custom Test Patterns?

Code Examples

Schema Test Definitions

Custom Data Tests

Advanced Custom Test

Generic Test Macro

Test Configuration

Freshness Configuration

Performance Metrics

Best Practices

See Also

Premium Content

Need Expert dbt Help?