πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

dbt with BigQuery

🟒 Free Lesson

Advertisement

dbt with BigQuery

BigQuery Architecture

Partitioning Strategy

Clustering Strategy

Detailed Explanation


What is BigQuery + dbt?

BigQuery is a serverless, highly scalable data warehouse that integrates seamlessly with dbt for data transformation.


Core Features

FeatureDescription
Native IntegrationDirect SQL execution
PartitioningTime and integer-based partitioning
ClusteringColumn-based data organization
Materialized ViewsPre-computed aggregations
Slot ManagementCompute resource allocation

Partitioning Strategies

BigQuery supports several partitioning strategies:

  • Day-partitioned β€” Partition by date column
  • Hour-partitioned β€” Partition by timestamp (hour granularity)
  • Integer-range β€” Partition by integer column
  • Ingestion-time β€” Partition by data insertion time

Clustering

Clustering organizes data within partitions for better performance:

  • Up to 4 clustering columns
  • Automatic re-clustering
  • Improved query performance
  • Reduced costs

Cost Optimization

Pricing FactorRate
Storage$0.02/GB/month
Query$5/TB scanned
Streaming$0.01/200MB
SlotsVariable pricing

Key Takeaway: Partitioning and clustering together can reduce query costs by up to 99% and improve performance by 95%.

Code Examples

BigQuery Profile Configuration

# profiles.yml
my_profile:
  target: dev
  
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: dbt_dev
      threads: 8
      timeout_seconds: 300
      location: US
      priority: interactive
      retries: 1
      maximum_bytes_billed: 1000000000  # 1GB limit
    
    prod:
      type: bigquery
      method: service-account
      project: my-gcp-project
      dataset: analytics_prod
      threads: 16
      timeout_seconds: 600
      location: US
      priority: batch
      retries: 3
      maximum_bytes_billed: 100000000000  # 100GB limit

Partitioned Model

-- models/marts/fct_orders.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        partition_by={
            "field": "order_date",
            "data_type": "date",
            "granularity": "day"
        },
        cluster_by=['customer_id', 'order_status', 'product_category'],
        incremental_strategy='merge'
    )
}}

with orders as (
    select * from {{ ref('stg_orders') }}
),

final as (
    select
        order_id,
        customer_id,
        order_date,
        order_status,
        product_category,
        amount,
        current_timestamp() as updated_at
    from orders
)

select * from final

{% if is_incremental() %}
where order_date >= date_sub(
    (select max(order_date) from {{ this }}),
    interval 7 day
)
{% endif %}

Materialized View

-- models/marts/mv_daily_revenue.sql
{{
    config(
        materialized='materialized_view',
        partition_by={
            "field": "order_date",
            "data_type": "date"
        },
        enable_refresh=true,
        refresh_interval_minutes=60
    )
}}

with daily_revenue as (
    select
        order_date,
        product_category,
        sum(amount) as total_revenue,
        count(*) as order_count
    from {{ ref('fct_orders') }}
    group by 1, 2
)

select * from daily_revenue

External Table

-- models/staging/stg_external_data.sql
{{
    config(
        materialized='view'
    )
}}

with external_data as (
    select * from {{ source('external', 'gcs_data') }}
)

select * from external_data

BigQuery-Specific Macros

-- macros/bigquery/safe_divide.sql
{% macro safe_divide(numerator, denominator) %}
    safe_divide({{ numerator }}, {{ denominator }})
{% endmacro %}

-- macros/bigquery/array_agg.sql
{% macro array_agg(column_name) %}
    array_agg({{ column_name }} ignore nulls)
{% endmacro %}

-- macros/bigquery/struct.sql
{% macro make_struct(columns) %}
    struct({{ columns | join(', ') }})
{% endmacro %}

Incremental with BigQuery Optimizations

-- models/marts/fct_events.sql
{{
    config(
        materialized='incremental',
        unique_key='event_id',
        partition_by={
            "field": "event_timestamp",
            "data_type": "timestamp",
            "granularity": "hour"
        },
        cluster_by=['user_id', 'event_type'],
        incremental_strategy='merge',
        post_hook=[
            "analyze table {{ this }} compute statistics for all columns"
        ]
    )
}}

with events as (
    select * from {{ ref('stg_events') }}
),

final as (
    select
        event_id,
        user_id,
        event_type,
        event_timestamp,
        event_properties,
        current_timestamp() as updated_at
    from events
)

select * from final

{% if is_incremental() %}
where event_timestamp >= timestamp_sub(
    (select max(event_timestamp) from {{ this }}),
    interval 3 day
)
{% endif %}

Performance Metrics

MetricWithout OptimizationWith OptimizationImprovement
Query Time10s0.5s95% faster
Cost5.00∣5.00 |0.0199.8% cheaper
Storage100GB100GBSame
Scan100M rows274K rows99.7% less

Best Practices

  1. Always partition large tables by date
  2. Cluster by query patterns - filter/join columns
  3. Use incremental models - minimize data scanned
  4. Partition expiration - manage storage costs
  5. Use materialized views - for frequent aggregations
  6. Monitor slot usage - optimize compute resources
  7. Use dry runs - estimate query costs
  8. Leverage caching - reduce repeated queries

See Also

⭐

Premium Content

dbt with BigQuery

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert dbt Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement