πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Snowflake Data Lineage Tracking

🟒 Free Lesson

Advertisement

Snowflake Data Lineage Tracking

Data Lineage in Snowflake tracks the flow of data from source to destination, providing visibility into transformations, dependencies, and impact for governance and compliance.

Data Lineage ArchitectureSourcesRaw DataTransformETL/ELTStagingCuratedServingMarts/APIsTargetUsers/AppsColumn-Level LineageColumn A β†’ Column BImpact AnalysisDownstream effectsQuery TextTransformation logic

What is Data Lineage?

  • Tracks data movement and transformation end-to-end
  • Provides column-level visibility
  • Supports impact analysis and compliance reporting

Architecture Overview

The data lineage architecture tracks data flow through five stages:

  1. Sources β€” Raw Tables, External Data, APIs, Files, Streams
  2. Transform β€” ETL Jobs, Stored Procs, Tasks, Views, UDFs
  3. Staging β€” Curated, Conformed, Enriched, Aggregated, Validated
  4. Serving β€” Data Marts, Analytics, ML Features, Reports, APIs
  5. Target β€” Users, Apps, ML, Exports

Lineage Metadata

  • Object dependencies
  • Column lineage
  • Transformation types
  • Query text and timestamps

Use Cases

  • Impact analysis
  • Root cause analysis
  • Compliance reporting
  • Data discovery

Key Concepts

DfData Lineage

DfImpact Analysis

Accessing Lineage

Object Lineage

-- Get lineage for a specific table
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
  DATABASE_NAME => 'my_database',
  SCHEMA_NAME => 'analytics',
  TABLE_NAME => 'fact_sales'
));

-- Upstream dependencies
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
  DATABASE_NAME => 'my_database',
  SCHEMA_NAME => 'analytics',
  TABLE_NAME => 'fact_sales',
  DIRECTION => 'UPSTREAM'
));

-- Downstream dependencies
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
  DATABASE_NAME => 'my_database',
  SCHEMA_NAME => 'analytics',
  TABLE_NAME => 'fact_sales',
  DIRECTION => 'DOWNSTREAM'
));

Column-Level Lineage

-- Column lineage
SELECT
  source_table,
  source_column,
  target_table,
  target_column,
  transformation_type,
  transformation_expression
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
  DATABASE_NAME => 'my_database',
  SCHEMA_NAME => 'analytics',
  TABLE_NAME => 'fact_sales',
  COLUMN_NAME => 'revenue'
));

Impact Analysis Queries

Change Impact Assessment

-- Find all objects affected by a table change
WITH RECURSIVE downstream AS (
  -- Base table
  SELECT
    table_name,
    0 as depth,
    table_name as root_table
  FROM information_schema.tables
  WHERE table_name = 'source_table'
  
  UNION ALL
  
  -- Direct dependents
  SELECT
    d.target_table,
    d.depth + 1,
    d.root_table
  FROM downstream d
  JOIN TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
    DIRECTION => 'DOWNSTREAM'
  )) l ON d.table_name = l.source_table
)
SELECT * FROM downstream
ORDER BY depth;

Data Quality Impact

-- Find objects with data quality issues
SELECT
  source_table,
  COUNT(DISTINCT target_table) as affected_objects,
  COUNT(DISTINCT target_column) as affected_columns
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
  DIRECTION => 'DOWNSTREAM'
))
WHERE source_table IN (
  SELECT table_name
  FROM information_schema.data_quality_metrics
  WHERE quality_score < 0.9
)
GROUP BY 1
ORDER BY affected_objects DESC;

Lineage Visualization

-- Generate lineage graph data
SELECT
  'node' as type,
  table_name as id,
  table_name as label,
  CASE
    WHEN table_type = 'BASE TABLE' THEN 'table'
    WHEN table_type = 'VIEW' THEN 'view'
    ELSE 'other'
  END as shape
FROM information_schema.tables
WHERE database_name = 'my_database'

UNION ALL

SELECT
  'edge' as type,
  source_table as id,
  target_table as label,
  transformation_type as shape
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
  DATABASE_NAME => 'my_database'
));

Data lineage is automatically captured by Snowflake for all DML operations. For complex pipelines, consider creating dedicated lineage views that aggregate metadata for visualization tools and impact analysis dashboards.

Lineage for Compliance

-- GDPR data flow report
SELECT
  source_table,
  target_table,
  columns_used,
  transformation_type,
  query_text
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
  DIRECTION => 'DOWNSTREAM'
))
WHERE source_table LIKE '%customer%'
  OR source_table LIKE '%pii%'
  OR source_table LIKE '%personal%'
ORDER BY source_table, target_table;

-- Data retention impact
SELECT
  target_table,
  COUNT(DISTINCT source_tables) as upstream_sources,
  MIN(retention_days) as min_retention,
  MAX(retention_days) as max_retention
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
  DIRECTION => 'UPSTREAM'
))
GROUP BY 1;
  • Data lineage tracks transformations from source to target
  • Column-level lineage provides granular visibility
  • Impact analysis assesses downstream effects of changes
  • Automatic capture for all DML operations
  • Supports compliance requirements (GDPR, HIPAA, SOX)
⭐

Premium Content

Snowflake Data Lineage Tracking

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert Snowflake Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement