Snowflake Data Lakehouse Architecture

The Data Lakehouse architecture represents a paradigm shift in data platform design, unifying the best aspects of data lakes and data warehouses into a single, cohesive platform.

Architecture Diagram 2: Data Lakehouse Unified Architecture

What is a Data Lakehouse?

A Data Lakehouse combines the low-cost, flexible storage of data lakes with the performance, reliability, and ACID transactions of data warehouses.

Key Characteristics

Feature	Data Lake	Data Warehouse	Data Lakehouse
Storage Cost	Low	High	Low
Schema Enforcement	None	Strict	Flexible
ACID Transactions	No	Yes	Yes
Data Formats	Proprietary	Proprietary	Open (Parquet, ORC)
ML Support	Excellent	Limited	Excellent
BI Support	Limited	Excellent	Excellent

What is a Data Lakehouse?

Combines data lake flexibility with data warehouse reliability
Open formats (Parquet, ORC) enable multi-engine access
ACID transactions ensure data integrity
Cost-effective storage with warehouse-grade performance

Snowflake's Lakehouse Implementation

External Tables

Query data stored externally without copying
Supports Parquet, JSON, ORC formats
Auto-refresh for up-to-date queries

Iceberg Tables

Open table format with ACID transactions
Schema evolution without table recreation
Time travel on Iceberg snapshots

Hybrid Tables

OLTP + OLAP on the same table
Primary key enforcement
ACID transactions for single-row operations

Medallion Architecture

Layer	Purpose	Data State
Bronze	Raw data ingestion	Raw, unvalidated
Silver	Cleaned and validated	Standardized, deduplicated
Gold	Business-ready aggregates	Aggregated, enriched

Progressive data refinement — each layer adds quality and logic
ACID transactions ensure integrity across layers

Cost Optimization

Strategy	Implementation	Benefit
Auto-scaling	Multi-cluster warehouses	Pay only for active compute
Storage tiering	Time-based partitioning	Move old data to cheaper storage
Query optimization	Partition pruning	Reduce data scanned
Materialized views	Pre-aggregate common queries	Faster response times

Key Takeaway: The Lakehouse pattern is ideal for organizations needing both lake flexibility and warehouse reliability. Snowflake's compute-storage separation makes this seamless.

Snowflake's Lakehouse Implementation

Snowflake implements the Lakehouse pattern through several key features:

External Tables

External tables allow Snowflake to query data stored externally without copying it:

-- Create external table from S3
CREATE OR REPLACE EXTERNAL TABLE my_external_table
  WITH LOCATION = @my_s3_stage
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = true;

-- Query external data directly
SELECT * FROM my_external_table
WHERE date >= '2024-01-01';

Iceberg Tables (Snowflake Native Iceberg)

Snowflake supports Apache Iceberg for open table formats:

-- Create Iceberg table
CREATE OR REPLACE EXTERNAL TABLE my_iceberg_table
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_volume'
  TABLE_FORMAT = ICEBERG
  DATA_FILES_FORMAT = (TYPE = PARQUET);

-- Time travel on Iceberg tables
SELECT * FROM my_iceberg_table
  AT (OFFSET => -3600);  -- 1 hour ago

Hybrid Tables

Hybrid tables combine transactional and analytical workloads:

-- Create hybrid table for OLTP-like operations
CREATE OR REPLACE HYBRID TABLE orders (
  order_id INTEGER PRIMARY KEY,
  customer_id INTEGER,
  order_date TIMESTAMP_NTZ,
  total_amount DECIMAL(10,2),
  status VARCHAR(20)
);

-- ACID transactions on hybrid tables
BEGIN TRANSACTION;
  INSERT INTO orders VALUES (1, 100, CURRENT_TIMESTAMP(), 99.99, 'PENDING');
  UPDATE orders SET status = 'COMPLETED' WHERE order_id = 1;
COMMIT;

Data Lakehouse Patterns in Snowflake

Medallion Architecture

The Medallion (Multi-Hop) architecture organizes data into bronze, silver, and gold layers:

-- Bronze Layer: Raw data ingestion
CREATE OR REPLACE TABLE bronze.raw_orders (
  raw_data VARIANT
);

-- Silver Layer: Cleaned and validated data
CREATE OR REPLACE TABLE silver.orders AS
  SELECT
    raw_data:order_id::INTEGER AS order_id,
    raw_data:customer_id::INTEGER AS customer_id,
    raw_data:order_date::DATE AS order_date,
    raw_data:amount::DECIMAL(10,2) AS amount,
    CURRENT_TIMESTAMP() AS ingested_at
  FROM bronze.raw_orders;

-- Gold Layer: Business-ready aggregates
CREATE OR REPLACE TABLE gold.daily_sales_summary AS
  SELECT
    order_date,
    COUNT(*) AS total_orders,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
  FROM silver.orders
  GROUP BY order_date;

Data Sharing Across Domains

-- Share data across business domains
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE analytics_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA gold TO SHARE sales_share;
GRANT SELECT ON gold.daily_sales_summary TO SHARE sales_share;

-- Consumers can query shared data
CREATE DATABASE sales_from_share FROM SHARE sales_share;
SELECT * FROM sales_from_share.gold.daily_sales_summary;

Performance Optimization for Lakehouse

Partitioning Strategies

-- Partition external data by date
CREATE OR REPLACE EXTERNAL TABLE daily_events
  PARTITION BY (event_date)
  WITH LOCATION = @events_stage
  FILE_FORMAT = (TYPE = PARQUET);

-- Query optimization through partition pruning
SELECT * FROM daily_events
  WHERE event_date = '2024-01-15';

Clustering for Large Tables

-- Cluster large fact tables
CREATE OR REPLACE TABLE large_fact_table CLUSTER BY (event_date, region)
  AS SELECT * FROM staging_events;

-- Manual re-clustering
ALTER TABLE large_fact_table RECLUSTER;

The Lakehouse pattern is ideal for organizations that need both the flexibility of a data lake and the reliability of a data warehouse. Snowflake's architecture makes this particularly seamless because the compute and storage layers are independent.

Cost Optimization in Lakehouse

Strategy	Implementation	Benefit
Auto-scaling	Multi-cluster warehouses	Pay only for active compute
Storage tiering	Time-based partitioning	Move old data to cheaper storage
Query optimization	Partition pruning	Reduce data scanned
Materialized views	Pre-aggregate common queries	Faster response times

Key Takeaways:

Data Lakehouse combines lake flexibility with warehouse reliability
Snowflake supports Iceberg, Hybrid Tables, and External Tables
Medallion Architecture (Bronze/Silver/Gold) organizes data progressively
ACID transactions ensure data integrity in the lakehouse
Cost optimization through partitioning, clustering, and auto-scaling

Snowflake Data Lakehouse Architecture

Snowflake Data Lakehouse Architecture

Architecture Diagram 2: Data Lakehouse Unified Architecture

What is a Data Lakehouse?

Key Characteristics

What is a Data Lakehouse?

Snowflake's Lakehouse Implementation

Medallion Architecture

Cost Optimization

Snowflake's Lakehouse Implementation

External Tables

Iceberg Tables (Snowflake Native Iceberg)

Hybrid Tables

Data Lakehouse Patterns in Snowflake

Medallion Architecture

Data Sharing Across Domains

Performance Optimization for Lakehouse

Partitioning Strategies

Clustering for Large Tables

Cost Optimization in Lakehouse

Premium Content

Need Expert Snowflake Help?