πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Snowflake Data Cloud Platform Fundamentals

Module 3: Data Warehouses & StorageCloud Data Platforms🟒 Free Lesson

Advertisement

Snowflake: The Elastic Data Cloud

Snowflake is a cloud-native data platform that separates compute from storage, enabling independent scaling of each resource layer.

Why Snowflake Matters


Traditional Warehouses:

  • Require vertical scaling (bigger machines)
  • Capacity planning is complex
  • Limited elasticity

Snowflake's Innovation:

  • Horizontal elastic scaling through virtual warehouses
  • Pause, resume, and resize in seconds
  • Near-infinite elasticity
  • Eliminates capacity planning

Key Insight: Snowflake fundamentally changes how data teams manage analytical workloads by introducing horizontal elastic scaling through virtual warehouses that can be paused, resumed, and resized in seconds.


Architecture Overview

Snowflake 3-Layer Architecture

Snowflake 3-Layer SeparationCompute LayerVirtual warehousesElastic, independentAuto-suspend/resumeMulti-cluster scalingStorage LayerMicro-partitions (50-500MB)Columnar compressedAutomatic clusteringTime travel enabledServices LayerQuery parsing/planningMetadata managementSecurity + encryptionOptimization + cachingMulti-Cluster Auto-ScalingCluster 1 (min)Cluster 2 (scale)Cluster 3 (scale)Cluster 4 (max)Scale DownAuto-suspend: 60s idle | Auto-resume on query | Max clusters: 10

Compute-Storage Separation

Compute-Storage Separation

Snowflake decouples query processing (virtual warehouses) from persistent storage (micro-partitions). Each layer scales independently: warehouses are ephemeral compute clusters, while storage persists encrypted data in cloud object storage.

Snowflake Resource Model

  • Storage Cost = Data_Volume_GB Γ— 23/TB/month(standard)or23/TB/month (standard) or40/TB/month (Snowpipe)
  • Compute Cost = Ξ£ (warehouse_size_credits Γ— hours_used Γ— credit_price)
  • Credit Price = $2-4/credit depending on edition (Standard, Enterprise, Business Critical)
  • Effective $/TB queried = Compute_Cost / (Data_Volume_TB Γ— Queries_per_Hour)
  • Storage-to-Compute Ratio = Monthly_Storage_Cost / Monthly_Compute_Cost (typical: 0.1-0.3)
-- Create a virtual warehouse with auto-suspend
CREATE WAREHOUSE analytics_wh
    WAREHOUSE_SIZE = 'medium'
    AUTO_SUSPEND = 300          -- Suspend after 5 minutes of inactivity
    AUTO_RESUME = TRUE          -- Resume automatically on query
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 4       -- Multi-cluster scaling
    SCALING_POLICY = 'ECONOMY'  -- Cost-optimized scaling
    INITIALLY_SUSPENDED = FALSE;

-- Create warehouse for ad-hoc queries (smaller, faster)
CREATE WAREHOUSE adhoc_wh
    WAREHOUSE_SIZE = 'x-small'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 1;

-- Monitor warehouse utilization
SELECT
    warehouse_name,
    warehouse_size,
    SUM(credits_used)           AS total_credits,
    SUM(query_count)            AS total_queries,
    ROUND(SUM(credits_used) / NULLIF(SUM(query_count), 0), 4) AS credits_per_query,
    AVG(avg_queue_time_ms)      AS avg_queue_ms,
    AVG(avg_execution_time_ms)  AS avg_exec_ms
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY warehouse_name, warehouse_size
ORDER BY total_credits DESC;

Micro-Partitions and Pruning

Snowflake stores data in micro-partitions: compressed, columnar chunks of 50-500 MB uncompressed. Each micro-partition stores metadata including min/max values per column, enabling partition pruning β€” skipping irrelevant micro-partitions entirely during query execution.

Partition Pruning Efficiency

  • Let M = total micro-partitions, m = micro-partitions accessed after pruning
  • Pruning Ratio = (M - m) / M Γ— 100%
  • I/O Reduction = Pruning_Ratio Γ— Data_Per_MP_GB
  • Query Speedup β‰ˆ 1 / (1 - Pruning_Ratio) assuming I/O-bound
  • Example: 1000 MPs, prune to 50 -> 95% pruning -> 20x speedup
-- Query pruning metadata
SELECT
    partition_id,
    row_count,
    row_start_byte,
    row_end_byte,
    column_name,
    min_value,
    max_value
FROM TABLE(GET_METADATA_FOR_POSITIONAL_BLOCK('fact_sales'))
LIMIT 50;

-- Use clustering key for better pruning
CREATE TABLE fact_sales (
    sale_key        BIGINT,
    customer_key    INT,
    product_key     INT,
    sale_date       DATE,
    net_amount      DECIMAL(14,2)
)
CLUSTER BY (sale_date);  -- Micro-partitions sorted by sale_date

-- Check clustering depth and overlap
SELECT
    system$clustering_depth('fact_sales', '(sale_date)'),
    system$clustering_ratio('fact_sales', '(sale_date)'),
    system$clustering_information('fact_sales', '(sale_date)');

Time Travel and Zero-Copy Cloning

Time Travel enables querying historical data at any point within a retention period (1-90 days on Enterprise). Snowflake maintains a metadata layer that tracks changes, allowing point-in-time queries and data recovery without restoring backups.

-- Query data as it was 2 hours ago
SELECT * FROM fact_sales AT (OFFSET => -7200);

-- Query data at a specific timestamp
SELECT * FROM fact_sales AT (TIMESTAMP => '2025-01-15 10:30:00'::TIMESTAMP_TZ);

-- Query data before a specific statement
SELECT * FROM fact_sales BEFORE (STATEMENT => 'abc123');

-- Zero-Copy Clone (instant, metadata-only)
CREATE TABLE fact_sales_staging CLONE fact_sales;

-- Clone with data freshness control
CREATE TABLE fact_sales_jan CLONE fact_sales
    AT (TIMESTAMP => '2025-01-31 23:59:59'::TIMESTAMP_TZ);

-- Drop and restore using Time Travel
DROP TABLE fact_sales;
UNDROP TABLE fact_sales;

-- Cross-region clone
CREATE TABLE fact_sales_dr CLONE snowflake_db.fact_sales
    AT (TIMESTAMP => '2025-01-15 00:00:00'::TIMESTAMP_TZ);

Data Sharing and Snowflake Marketplace

Snowflake Secure Data Sharing enables zero-copy, real-time data sharing between Snowflake accounts without data movement. Data remains in the provider's account while consumers query it directly via their virtual warehouse.

-- Create a share
CREATE SHARE company_analytics_share;
GRANT USAGE ON DATABASE analytics_db TO SHARE company_analytics_share;
GRANT USAGE ON SCHEMA analytics_db.public TO SHARE company_analytics_share;
GRANT SELECT ON TABLE analytics_db.public.fact_daily_metrics TO SHARE company_analytics_share;

-- Add consumer account
ALTER SHARE company_analytics_share ADD ACCOUNTS = consumer_account_id;

-- Create reader account for external sharing
CREATE ACCOUNT reader_acct
    ADMIN_NAME = 'admin'
    ADMIN_PASSWORD = 'secure_password'
    EMAIL = 'reader@company.com'
    TYPE = READER;

-- List shares available to this account
SHOW SHARES;
SELECT * FROM INFORMATION_SCHEMA.SHARES;

Multi-Cluster Auto-Scaling

Multi-Cluster Scaling

  • Cluster Count: C ∈ [C_min, C_max]
  • Concurrency Level: T = C Γ— T_per_cluster (typically 8-16 concurrent queries per cluster)
  • Scaling Trigger: Queue depth > T_per_cluster for > 60 seconds -> add cluster
  • Scale-Down Trigger: Utilization < 50% for > 300 seconds -> remove cluster
  • Elasticity Score = Actual_Concurrency / Max_Possible_Concurrency
-- Monitor multi-cluster scaling events
SELECT
    warehouse_name,
    start_time,
    end_time,
    cluster_number,
    credits_used,
    queries_executed
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_LOAD_HISTORY
WHERE warehouse_name = 'analytics_wh'
  AND start_date >= DATEADD(day, -7, CURRENT_DATE())
ORDER BY start_time DESC;

-- Query to check if scaling is needed
SELECT
    warehouse_name,
    AVG(available) AS avg_available,
    AVG(queued)    AS avg_queued,
    CASE
        WHEN AVG(queued) > 8 THEN 'SCALE UP NEEDED'
        WHEN AVG(available) > 12 THEN 'SCALE DOWN OPPORTUNITY'
        ELSE 'ADEQUATE'
    END AS scaling_recommendation
FROM TABLE(INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY(
    TIME_RANGE_START => DATEADD(day, -7, CURRENT_DATE()),
    TIME_RANGE_END => CURRENT_DATE()
))
GROUP BY warehouse_name;

Snowflake Editions and Pricing

Snowflake offers multiple editions with different feature sets and pricing tiers. Standard provides basic functionality, Enterprise adds Time Travel and materialized views, Business Critical adds HIPAA compliance and AWS PrivateLink, and Virtual Private Snowflake provides dedicated infrastructure.

Snowflake Pricing Model

  • Standard Edition: $2/credit, 1-day Time Travel, basic security
  • Enterprise Edition: $3/credit, 90-day Time Travel, materialized views, search optimization
  • Business Critical: $4/credit, HIPAA, PCI DSS, AWS PrivateLink, Tri-Secret Secure
  • Virtual Private Snowflake: Custom pricing, dedicated infrastructure
  • Storage: 23/TB/month(standard),23/TB/month (standard),40/TB/month (Snowpipe Continuous Ingestion)
  • Cost Optimization: Annual commitment -> 10-30% discount vs pay-as-you-go
-- Monitor credit consumption by warehouse
SELECT
    warehouse_name,
    SUM(credits_used) AS total_credits,
    SUM(credits_used) * 3 AS estimated_cost_usd,
    SUM(query_count) AS total_queries,
    ROUND(SUM(credits_used) * 3.0 / NULLIF(SUM(query_count), 0), 4) AS cost_per_query
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY warehouse_name
ORDER BY total_credits DESC;

-- Monitor storage consumption
SELECT
    table_schema,
    SUM(active_bytes) / 1024 / 1024 / 1024 AS active_storage_gb,
    SUM(time_travel_bytes) / 1024 / 1024 / 1024 AS time_travel_gb,
    SUM(fail_safe_bytes) / 1024 / 1024 / 1024 AS fail_safe_gb
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
WHERE deleted = 0
GROUP BY table_schema
ORDER BY active_storage_gb DESC;

-- Cost optimization: Identify expensive queries
SELECT
    query_id,
    query_text,
    user_name,
    warehouse_name,
    credits_used,
    ROUND(credits_used * 3, 4) AS cost_usd,
    bytes_scanned / 1024 / 1024 / 1024 AS gb_scanned,
    execution_time_ms
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_DATE())
  AND credits_used > 1
ORDER BY credits_used DESC
LIMIT 20;

Key Concepts Summary

ConceptDescriptionBenefitDefault Setting
Virtual WarehouseEphemeral compute clusterElastic scalingX-Small, auto-suspend 300s
Micro-Partition50-500MB compressed columnar chunkAutomatic pruning50-500 MB uncompressed
Time TravelQuery historical dataRecovery, audit1 day (Standard), 90 days (Enterprise)
Zero-Copy CloneInstant metadata-only copyDev/test, branchingNo additional storage
Multi-ClusterAuto-scaling compute clustersHigh concurrency1-10 clusters
Data SharingZero-copy cross-account sharingCollaborationNo data movement
Result CachingAutomatic query result cacheInstant repeat queriesEnabled by default
Automatic ClusteringSelf-managing cluster maintenanceOptimal pruningManual or auto
SnowpipeContinuous file ingestionNear-real-time loadingAuto-ingest enabled
StreamsChange data captureIncremental processingAppend-only by default
TasksScheduled SQL executionETL orchestrationServerless option
Materialized ViewsPre-computed query resultsDashboard performanceManual refresh

Performance Metrics

Warehouse SizeCredits/HourQueries/Hour (est.)Max Data ScannedCost/Query (est.)
X-Small1100-500100 GB$0.004-0.02
Small2200-1,000500 GB$0.004-0.01
Medium4400-2,0001 TB$0.004-0.01
Large8600-3,0002 TB$0.005-0.013
X-Large16800-4,0004 TB$0.008-0.02
2X-Large321,000-5,0008 TB$0.013-0.032
3X-Large641,200-6,00016 TB$0.021-0.053
4X-Large1281,500-8,00032 TB$0.043-0.085

10 Best Practices

  1. Right-size virtual warehouses β€” start small, monitor queue times, scale only when needed
  2. Use AUTO_SUSPEND aggressively β€” set 60-300 seconds to avoid paying for idle compute
  3. Implement clustering keys on frequently filtered columns to maximize micro-partition pruning
  4. Leverage result caching β€” identical queries within 24 hours return cached results instantly
  5. Use Zero-Copy Clones for dev/test instead of full data copies
  6. Enable Time Travel retention at the Enterprise level for compliance and debugging
  7. Monitor warehouse utilization via ACCOUNT_USAGE views and adjust sizes weekly
  8. Use Snowpipe for continuous ingestion rather than bulk COPY for near-real-time data
  9. Separate warehouses by workload β€” ETL, ad-hoc, and BI should use dedicated warehouses
  10. Set up alerts for queue depth > 10 and credit consumption anomalies

  • Compute-storage separation enables independent scaling and cost optimization
  • Micro-partitions with automatic pruning reduce I/O by 80-95% for filtered queries
  • Time Travel provides point-in-time recovery without backup restoration overhead
  • Multi-cluster warehouses handle concurrency spikes automatically
  • Proper warehouse sizing and auto-suspend are the primary cost levers

See Also

⭐

Premium Content

Snowflake Data Cloud Platform Fundamentals

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert Data Engineering Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement