Exposures and Semantic Layer
Semantic Layer Architecture
Metrics Flow
Architecture Diagram
+-----------------------------------------------------------------------------+
| METRICS DEFINITION AND USAGE |
+-----------------------------------------------------------------------------+
| |
| SEMANTIC MODEL |
| +---------------------------------------------------------------------+ |
| | name: orders_semantic | |
| | defaults: | |
| | agg_time_dimension: order_date | |
| | entities: | |
| | - name: order_id | |
| | type: primary | |
| | - name: customer_id | |
| | type: foreign | |
| | dimensions: | |
| | - name: order_date | |
| | type: time | |
| | type_params: | |
| | time_granularity: day | |
| | measures: | |
| | - name: order_total | |
| | agg: sum | |
| | expr: amount | |
| +---------------------------------------------------------------------+ |
| | |
| v |
| METRICS DEFINITION |
| +---------------------------------------------------------------------+ |
| | - name: total_revenue | |
| | type: simple | |
| | type_params: | |
| | measure: order_total | |
| | description: "Total revenue from all orders" | |
| | | |
| | - name: avg_order_value | |
| | type: derived | |
| | type_params: | |
| | expr: total_revenue / order_count | |
| | description: "Average order value" | |
| +---------------------------------------------------------------------+ |
| | |
| v |
| QUERY INTERFACE |
| +---------------------------------------------------------------------+ |
| | metrics: [total_revenue, order_count] | |
| | group_by: [order_date__day, customer_segment] | |
| | where: "order_date >= '2024-01-01'" | |
| | | |
| | Result: | |
| | +------------+---------+----------+------------+ | |
| | | Date | Segment | Revenue | Orders | | |
| | +------------+---------+----------+------------+ | |
| | | 2024-01-01 | Premium | $150,000 | 1,500 | | |
| | | 2024-01-01 | Standard| $85,000 | 2,100 | | |
| | +------------+---------+----------+------------+ | |
| +---------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------------+
Exposure Dependencies
Detailed Explanation
The dbt semantic layer provides a standardized way to define metrics, dimensions, and business logic that can be consumed by downstream applications.
What are Semantic Models?
Semantic models are the foundation of the semantic layer. They define:
- Entities: Keys that identify records (primary, foreign, unique)
- Dimensions: Qualitative attributes for grouping and filtering
- Measures: Quantitative values that can be aggregated
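To make the three building blocks concrete, here is a minimal Python sketch of grouping by a dimension and aggregating a measure. It is not how dbt or MetricFlow is implemented; the `orders` rows and the `aggregate` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for a fact table such as fct_orders.
orders = [
    {"order_id": 1, "customer_id": 10, "order_date": "2024-01-01", "amount": 120.0},
    {"order_id": 2, "customer_id": 11, "order_date": "2024-01-01", "amount": 80.0},
    {"order_id": 3, "customer_id": 10, "order_date": "2024-01-02", "amount": 50.0},
]


def aggregate(rows, dimension, measure_expr, agg):
    """Group rows by one dimension and aggregate one measure column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure_expr])
    if agg == "sum":
        return {key: sum(values) for key, values in groups.items()}
    if agg == "count":
        return {key: len(values) for key, values in groups.items()}
    raise ValueError(f"unsupported aggregation: {agg}")


# Measure "order_total" (sum of amount) grouped by the time dimension.
order_total_by_day = aggregate(orders, "order_date", "amount", "sum")

# Measure "order_count" grouped by the foreign entity customer_id.
order_count_by_customer = aggregate(orders, "customer_id", "order_id", "count")
```

In this sketch the entity and dimension columns decide how rows are grouped, while the measure decides which column is aggregated and how.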
What are Metrics?
Metrics are defined calculations based on semantic models:
| Type | Description | Example |
|---|---|---|
| Simple | Direct aggregations of measures | sum(amount) |
| Derived | Calculations based on other metrics | total_revenue / order_count |
| Cumulative | Running totals over time | sum(amount) over (order by date) |
| Conversion | Funnel analysis and conversion rates | purchases / views |
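The four metric types in the table can be illustrated with plain Python arithmetic. This is a rough sketch on made-up daily aggregates, not the SQL that the semantic layer generates; the `daily` data and all variable names are assumptions.

```python
from itertools import accumulate

# Hypothetical daily aggregates: (date, revenue, orders, product views).
daily = [
    ("2024-01-01", 200.0, 4, 40),
    ("2024-01-02", 150.0, 3, 50),
    ("2024-01-03", 250.0, 5, 60),
]

# Simple: a direct aggregation of one measure.
total_revenue = sum(revenue for _, revenue, _, _ in daily)
order_count = sum(orders for _, _, orders, _ in daily)

# Derived: an expression over other metrics.
avg_order_value = total_revenue / order_count

# Cumulative: a running total ordered by date.
cumulative_revenue = list(accumulate(revenue for _, revenue, _, _ in daily))

# Conversion: converting events divided by base events.
# A real conversion metric also matches events per entity within a time
# window; that matching is omitted in this sketch.
total_views = sum(views for _, _, _, views in daily)
conversion_rate = order_count / total_views
```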
What are Exposures?
Exposures define how data is consumed downstream:
- Dashboards: BI tool dashboards (Looker, Tableau, Power BI)
- Applications: Data applications and APIs
- Exports: Data exports to external systems
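Because an exposure records which models a downstream consumer depends on, it can answer the question "what breaks if this model changes?". The sketch below shows that lookup on a hand-written dictionary. The `exposures` registry and the `exposures_using` helper are hypothetical, and only direct dependencies are checked.

```python
# Hypothetical exposure registry: exposure name -> models it depends on.
exposures = {
    "order_analytics_dashboard": {"fct_orders", "dim_customers", "dim_products"},
    "ml_feature_store": {"customer_features", "order_features"},
    "data_export_daily": {"fct_daily_metrics"},
}


def exposures_using(model_name, registry):
    """Return the sorted names of exposures that depend directly on model_name."""
    return sorted(name for name, deps in registry.items() if model_name in deps)


# Which downstream consumers are affected by a change to fct_orders?
affected = exposures_using("fct_orders", exposures)
```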
What are the Benefits of the Semantic Layer?
- Single source of truth: Consistent metric definitions
- Governance: Centralized business logic management
- Reusability: Metrics defined once, used everywhere
- Lineage: Track metric dependencies and usage
- Performance: Cached query results can avoid recomputing common aggregations
Key Takeaway: The semantic layer standardizes metrics and dimensions, enabling consistent business logic across all downstream applications.
Code Examples
Semantic Model Definition
# semantic/order_semantic.yml
version: 2
semantic_models:
  - name: orders_semantic
    description: "Semantic model for order analytics"
    defaults:
      agg_time_dimension: order_date
    model: ref('fct_orders')
    entities:
      - name: order_id
        type: primary
        expr: order_id
      - name: customer_id
        type: foreign
        expr: customer_id
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
      - name: order_status
        type: categorical
        expr: status
      - name: customer_segment
        type: categorical
        expr: segment
    measures:
      - name: order_total
        description: "Total revenue from orders"
        agg: sum
        expr: amount
      - name: order_count
        description: "Count of orders"
        agg: count
        expr: order_id
      - name: avg_order_value
        description: "Average order value"
        agg: average
        expr: amount
Metrics Definition
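# semantic/metrics.yml
version: 2
metrics:
  - name: total_revenue
    description: "Total revenue from all orders"
    type: simple
    type_params:
      measure: order_total
    # Filters sit at the metric level and reference dimensions as entity__dimension
    filter: |
      {{ Dimension('order_id__order_status') }} != 'cancelled'
  - name: order_count
    description: "Total number of orders"
    type: simple
    type_params:
      measure: order_count
  - name: avg_order_value
    description: "Average order value"
    type: derived
    type_params:
      expr: total_revenue / order_count
      # A derived metric lists every metric used in its expr
      metrics:
        - name: total_revenue
        - name: order_count
  - name: revenue_growth
    description: "Month-over-month revenue growth"
    type: derived
    type_params:
      expr: (total_revenue - total_revenue_prev_month) / total_revenue_prev_month
      metrics:
        - name: total_revenue
        # The same metric shifted back one month, exposed under an alias
        - name: total_revenue
          offset_window: 1 month
          alias: total_revenue_prev_month
  - name: conversion_rate
    description: "Conversion rate from views to purchases"
    type: conversion
    type_params:
      conversion_type_params:
        # view_count is assumed to be a measure defined in another semantic model
        base_measure:
          name: view_count
        conversion_measure:
          name: order_count
        entity: customer_id
        window: 7 days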
Exposure Definition
# exposures/order_dashboard.yml
version: 2
exposures:
  - name: order_analytics_dashboard
    type: dashboard
    description: "Main dashboard for order analytics"
    url: https://looker.example.com/dashboards/order_analytics
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - ref('dim_products')
    owner:
      name: Analytics Team
      email: analytics@company.com
    meta:
      refresh_frequency: hourly
      data_source: Looker
  - name: ml_feature_store
    type: ml
    description: "ML feature store for customer segmentation"
    url: https://ml.example.com/features/customer_segmentation
    depends_on:
      - ref('customer_features')
      - ref('order_features')
    owner:
      name: Data Science Team
      email: ds@company.com
  - name: data_export_daily
    type: application
    description: "Daily data export to external system"
    url: https://api.example.com/exports/daily
    depends_on:
      - ref('fct_daily_metrics')
    owner:
      name: Data Engineering
      email: data-eng@company.com
Semantic Model with Joins
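# semantic/order_customer_semantic.yml
version: 2
# Semantic models have no explicit joins block. MetricFlow joins them
# through entities that share a name (customer_id here).
semantic_models:
  - name: order_with_customer
    description: "Orders, joinable to customer dimensions"
    defaults:
      agg_time_dimension: order_date
    model: ref('fct_orders')
    entities:
      - name: order_id
        type: primary
        expr: order_id
      - name: customer_id
        type: foreign
        expr: customer_id
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: total_revenue
        agg: sum
        expr: amount
  # Assumes dim_customers holds one row per customer
  - name: customers
    description: "Customer dimensions"
    model: ref('dim_customers')
    entities:
      - name: customer_id
        type: primary
        expr: customer_id
    dimensions:
      - name: customer_name
        type: categorical
        expr: customer_name
      - name: customer_segment
        type: categorical
        expr: segment
# Joined dimensions are then referenced as customer_id__customer_name
# and customer_id__customer_segment.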
Querying the Semantic Layer
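# Query definition for Looker (illustrative; the exact syntax depends on the client)
# metric_time resolves to each metric's agg_time_dimension (order_date here)
query:
  metrics:
    - total_revenue
    - order_count
    - avg_order_value
  group_by:
    - metric_time__day
    - order_id__customer_segment
  where: |
    {{ TimeDimension('metric_time', 'day') }} >= '2024-01-01'
    AND {{ Dimension('order_id__order_status') }} = 'completed'
  order_by:
    - metric_time__day
  limit: 1000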
Performance Metrics
| Component | Description | Performance Impact |
|---|---|---|
| Semantic Model | Query compilation | Low |
| Metrics | Aggregation | Medium |
| Dimensions | Grouping | Low |
| Exposures | Dependency tracking | Minimal |
| Cache | Query result caching | High (positive) |
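The positive impact of the cache row comes from skipping repeated work for identical queries. A minimal sketch of that idea with Python's standard `functools.lru_cache` follows. It is not the semantic layer's caching mechanism; `run_query` and `call_count` are hypothetical stand-ins.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive path actually runs


@lru_cache(maxsize=128)
def run_query(metrics, group_by):
    """Pretend to run a query; arguments must be hashable, so tuples are used."""
    global call_count
    call_count += 1
    return f"result for {metrics} grouped by {group_by}"


# The second identical call is served from the cache, not recomputed.
first = run_query(("total_revenue",), ("order_date__day",))
second = run_query(("total_revenue",), ("order_date__day",))
```

A real cache also has to be invalidated when the underlying data refreshes, which this sketch does not model.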
Best Practices
- Define semantic models for all analytical entities
- Use consistent naming for metrics and dimensions
- Document all exposures with clear descriptions
- Define relationships between semantic models
- Use appropriate measures for different aggregation types
- Filter at the metric level to ensure consistency
- Monitor exposure usage to understand data consumption
- Version control all semantic definitions
See Also
- dbt Documentation and Lineage: Documentation generation and metadata management
- The ref() Function: Model reference resolution and dependency management
- dbt Core Architecture: Manifest, DAG, and compilation pipeline
- Data Engineering Fundamentals: Modern data stack overview
- PySpark Data Pipelines: Building data pipelines with Spark