Mesh and Data Collaboration
[Diagram: Mesh Architecture]
[Diagram: Data Contract Architecture]
[Diagram: Governance Flow]
Detailed Explanation
Data mesh is an architectural pattern that decentralizes data ownership to domain-specific teams. dbt supports this pattern with multi-project architectures, data contracts, and governance features.
What is Domain-Based Organization?
In a data mesh architecture:
- Domain Teams: Own their data products
- Self-Serve Platform: dbt Cloud as the platform
- Data as a Product: Published interfaces
- Federated Governance: Shared standards
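One way to make domain ownership explicit in dbt is to declare a group per domain and assign models to it. The sketch below assumes a hypothetical `finance` domain; the file path, owner details, and the `int_revenue_adjustments` model are placeholders, not taken from a real project.

```yaml
# models/finance/_groups.yml (hypothetical path)
version: 2

groups:
  - name: finance                       # one group per domain team
    owner:
      name: Finance Data Team           # placeholder owner
      email: finance-data@example.com   # placeholder address

models:
  - name: fct_revenue
    group: finance
    access: public     # published interface: other projects may ref it
  - name: int_revenue_adjustments       # hypothetical internal model
    group: finance
    access: private    # only models in the finance group may ref it
```

Models without an explicit `access` value default to `protected`, meaning they can be referenced anywhere inside their own project but not from other projects.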
What are Cross-Project References?
dbt enables cross-project references for data mesh:
| Reference Type | Syntax or Mechanism | Purpose |
|---|---|---|
| Project references | {{ ref('project', 'model') }} | Reference models in other projects |
| Package references | Shared semantic models | Reuse semantic definitions |
| Metric stores | Centralized metric definitions | Consistent metrics across projects |
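For the two-argument `ref` to resolve, the consuming project has to declare its upstream projects, and the producing project has to mark the referenced models as `access: public`. A minimal sketch, assuming upstream projects named `finance` and `marketing` and a dbt Cloud setup where cross-project references are available:

```yaml
# dependencies.yml, at the root of the consuming project
projects:
  - name: finance     # must match the `name` in the upstream dbt_project.yml
  - name: marketing
```

With this in place, `{{ ref('finance', 'fct_revenue') }}` resolves against the upstream project's published models instead of requiring its source code to be installed as a package.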
What are Data Contracts?
Data contracts define the interface between producers and consumers:
- Schema contracts: Column names, types, constraints
- Freshness contracts: SLA for data availability
- Quality contracts: Acceptable data quality levels
- Cost contracts: Resource usage limits
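Of these, the schema contract is the part dbt can enforce at build time: with `contract.enforced` set, the build fails if the model's columns or data types differ from the YAML. A minimal sketch for a hypothetical `fct_orders` model; the file path and column types are assumptions, and which constraints the warehouse actually enforces varies by platform.

```yaml
# models/marts/fct_orders.yml (hypothetical path)
version: 2

models:
  - name: fct_orders
    config:
      contract:
        enforced: true        # build fails on a column name or type mismatch
    columns:
      - name: order_id
        data_type: integer
        constraints:
          - type: not_null
      - name: customer_id
        data_type: integer
        constraints:
          - type: not_null
      - name: amount
        data_type: numeric(18,2)
      - name: order_date
        data_type: date
```

Checks that go beyond column shape, such as uniqueness or referential integrity, are usually expressed as data tests on the same model rather than as part of the contract.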
What are the Governance Features?
- Access Control: RBAC and schema-level permissions
- Data Classification: PII, sensitive, public
- Lineage Tracking: End-to-end data lineage
- Audit Logging: Track all data access and changes
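Within dbt itself, part of this can be expressed as configuration: the `grants` config applies warehouse grants when a model is built, and the free-form `meta` field can carry a classification label for downstream tooling. A sketch assuming a warehouse role named `analytics_reader`, an `email` column, and a `classification` key; all three are placeholders, and dbt does not interpret `meta` values itself.

```yaml
# models/marts/dim_customers.yml (hypothetical path)
version: 2

models:
  - name: dim_customers
    config:
      grants:
        select: ['analytics_reader']   # hypothetical warehouse role
    columns:
      - name: customer_id
        meta:
          classification: internal     # hypothetical label
      - name: email
        meta:
          classification: pii          # read by external tooling, not by dbt
```

Masking and audit logging are typically handled by the warehouse or a catalog tool; the labels above only give those systems something consistent to key on.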
Key Takeaway: Data mesh in dbt enables decentralized data ownership through domain-based organization, cross-project references, data contracts, and governance features.
Code Examples
Cross-Project Reference
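-- models/marts/fct_company_metrics.sql
{{
    config(
        materialized='incremental',
        unique_key='company_id'
    )
}}

-- Aggregate each fact table to one row per company before joining.
-- Joining the raw fact tables row-to-row would fan out and inflate both sums.
with revenue as (
    select
        company_id,
        company_name,
        sum(amount) as total_revenue
    from {{ ref('finance', 'fct_revenue') }}
    group by 1, 2
),

marketing as (
    select
        company_id,
        sum(spend) as total_marketing_spend
    from {{ ref('marketing', 'fct_campaigns') }}
    group by 1
),

final as (
    select
        revenue.company_id,
        revenue.company_name,
        revenue.total_revenue,
        -- Companies with no campaigns get 0 spend instead of a null profit
        coalesce(marketing.total_marketing_spend, 0) as total_marketing_spend,
        revenue.total_revenue - coalesce(marketing.total_marketing_spend, 0) as profit,
        current_timestamp() as updated_at
    from revenue
    left join marketing
        on revenue.company_id = marketing.company_id
)

select * from final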
Data Contract Definition
# contracts/fct_orders_contract.yml
# Illustrative contract specification: `contracts` is not a native dbt resource
# type, so dbt does not parse or enforce this file on its own.
version: 2

contracts:
  - name: fct_orders_contract
    description: "Data contract for orders fact table"
    model: ref('fct_orders')
    schema:
      columns:
        - name: order_id
          type: integer
          description: "Unique order identifier"
          constraints:
            - not_null
            - unique
        - name: customer_id
          type: integer
          description: "Foreign key to customers"
          constraints:
            - not_null
            - relationships:
                to: ref('dim_customers')
                field: customer_id
        - name: amount
          type: decimal(18,2)
          description: "Order amount in USD"
          constraints:
            - not_null
            - positive
        - name: order_date
          type: date
          description: "Date order was placed"
          constraints:
            - not_null
            - recent:
                period: 30 days
    freshness:
      max_delay: 4 hours
      check_frequency: hourly
      alert_on_delay: true
    quality:
      - type: uniqueness
        column: order_id
      - type: completeness
        columns: [order_id, customer_id, amount, order_date]
      - type: consistency
        check: "amount > 0"
        threshold: 99.9
    access:
      - team: analytics
        permissions: [read]
      - team: finance
        permissions: [read, write]
      - team: data-science
        permissions: [read]
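dbt's built-in freshness check applies to sources rather than models, so a freshness SLA like the 4-hour delay above is usually approximated on the raw tables that feed the model. A sketch assuming a hypothetical `shop` source with a `_loaded_at` timestamp column; the warning threshold is arbitrary.

```yaml
# models/staging/_sources.yml (hypothetical path)
version: 2

sources:
  - name: shop                      # hypothetical source name
    loaded_at_field: _loaded_at     # hypothetical load-timestamp column
    freshness:
      warn_after: {count: 2, period: hour}
      error_after: {count: 4, period: hour}   # mirrors the 4-hour max delay
    tables:
      - name: orders
```

Running `dbt source freshness` evaluates these thresholds; scheduling that command hourly would roughly correspond to the `check_frequency` in the contract.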
Governance Configuration
# governance/access_control.yml
# Illustrative policy layout, not a native dbt resource: dbt does not parse
# `access_control`. Enforcement would have to come from warehouse grants or
# external tooling that reads this file.
version: 2

access_control:
  roles:
    - name: data_reader
      description: "Read-only access to data"
      permissions:
        - model:read
        - source:read
        - metric:read
      grant_to:
        - team: analytics
        - team: business
    - name: data_writer
      description: "Read and write access to data"
      permissions:
        - model:read
        - model:write
        - source:read
        - source:write
      grant_to:
        - team: data-engineering
    - name: data_admin
      description: "Full access to data platform"
      permissions:
        - "*"
      grant_to:
        - team: platform-team
  schema_access:
    - schema: raw
      permissions:
        - role: data_writer
          access: full
        - role: data_reader
          access: none
    - schema: staging
      permissions:
        - role: data_writer
          access: full
        - role: data_reader
          access: read
    - schema: analytics
      permissions:
        - role: data_writer
          access: full
        - role: data_reader
          access: read
  data_classification:
    - level: public
      description: "Non-sensitive data"
      mask: false
    - level: internal
      description: "Internal business data"
      mask: false
      access: [data_reader, data_writer]
    - level: confidential
      description: "Sensitive business data"
      mask: true
      access: [data_writer]
      masking_policy: "hash"
    - level: restricted
      description: "PII and regulated data"
      mask: true
      access: [data_admin]
      masking_policy: "full_mask"
Semantic Layer Configuration
# semantic/company_metrics.yml
version: 2

semantic_models:
  - name: company_metrics
    description: "Unified company metrics"
    model: ref('fct_company_metrics')
    defaults:
      agg_time_dimension: metric_date   # needed because measures are defined
    entities:
      - name: company_id
        type: primary
        expr: company_id
    dimensions:
      - name: metric_date
        type: time
        expr: updated_at   # fct_company_metrics exposes updated_at, not metric_date
        type_params:
          time_granularity: day
      - name: company_name
        type: categorical
        expr: company_name
    measures:
      - name: total_revenue
        agg: sum
        expr: total_revenue
      - name: total_marketing_spend
        agg: sum
        expr: total_marketing_spend
      - name: profit
        agg: sum
        expr: profit

metrics:
  - name: revenue
    label: Revenue
    type: simple
    type_params:
      measure: total_revenue
  - name: marketing_spend
    label: Marketing Spend
    type: simple
    type_params:
      measure: total_marketing_spend
  # Derived metrics reference other metrics, not measures, so profit needs
  # its own simple metric before profit_margin can use it.
  - name: profit
    label: Profit
    type: simple
    type_params:
      measure: profit
  - name: profit_margin
    label: Profit Margin
    type: derived
    type_params:
      expr: profit / revenue
      metrics:
        - name: profit
        - name: revenue

exposures:
  - name: company_dashboard
    type: dashboard
    description: "Executive company dashboard"
    depends_on:
      - ref('fct_company_metrics')
    owner:
      name: Executive Team
      email: exec@company.com
Performance Metrics
| Metric | Description | Target |
|---|---|---|
| Cross-project ref time | Time to resolve cross-project refs | <5s |
| Contract validation | Time to validate contracts | <30s |
| Governance audit | Time to run governance checks | <1min |
| Discovery search | Time to search data catalog | <2s |
| Lineage generation | Time to generate lineage graph | <10s |
Best Practices
- Define clear domain boundaries - Each team owns their data
- Use cross-project refs - Enable data mesh architecture
- Implement data contracts - Define clear interfaces
- Govern access - Use RBAC and schema permissions
- Classify data - Mark PII and sensitive data
- Track lineage - End-to-end data lineage
- Document everything - Clear descriptions and examples
- Monitor usage - Track data access and consumption
See Also
- dbt Cloud - Cloud platform features and job scheduling
- Advanced Jinja - Cross-project reference patterns
- Data Quality Tests - Data contract quality checks
- dbt Best Practices - Governance and documentation patterns