Snowflake Caching and Performance
Snowflake uses a multi-layered caching architecture to speed up queries: final query results are cached in the cloud services layer, and table data read from storage is cached on the local disks of each virtual warehouse.
Architecture Diagram 2: Multi-Layer Cache Lookup Flow
Result Cache
The result cache stores the final results of executed queries for reuse:
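-- Result caching is controlled by the USE_CACHED_RESULT parameter (default: TRUE)
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
-- Snowflake has no per-query cache hint; a /*+ ... */ comment is simply ignored.
-- To compare cached and uncached runs, turn the parameter off for the session:
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
SELECT
    order_date,
    COUNT(*) AS order_count
FROM orders
GROUP BY order_date;
-- Turn result caching back on afterwards
ALTER SESSION SET USE_CACHED_RESULT = TRUE;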
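-- Monitor cache usage. QUERY_HISTORY has no result_cache_hit column;
-- a result-cache hit typically shows no bytes scanned and near-zero execution time.
SELECT
    query_id,
    query_text,
    bytes_scanned,
    execution_time,      -- milliseconds
    total_elapsed_time   -- milliseconds
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
    END_TIME_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP())
))
WHERE query_text ILIKE '%GROUP BY%';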
Cache Invalidation Rules
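| Condition | Cache Behavior |
|---|---|
| Identical query text, underlying data unchanged | Cache hit |
| Different session or user (role has the required privileges) | Cache hit |
| Different warehouse | Cache hit (the result cache lives in the cloud services layer, not in the warehouse) |
| Underlying table data modified | Cache invalidated |
| Time Travel query (AT / BEFORE clause) | Different query text, so cached separately |
| Query uses functions evaluated at run time, e.g. CURRENT_TIMESTAMP() | Cache miss |
| Result not reused within 24 hours | Cache entry purged |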
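The rules above can be observed with a small experiment. The following is a minimal sketch, not a benchmark: `cache_demo` is a hypothetical scratch table introduced only for illustration, and the final query relies on the heuristic (an assumption, since Snowflake exposes no explicit "cache hit" flag in query history) that a reused result shows no bytes scanned and near-zero execution time.

```sql
-- Hypothetical scratch table so the demo does not touch real data
CREATE OR REPLACE TABLE cache_demo (order_date DATE, amount NUMBER(10,2));
INSERT INTO cache_demo VALUES
    ('2024-01-15', 10.00),
    ('2024-01-15', 5.50),
    ('2024-01-16', 7.25);

-- Run 1: executed on the warehouse
SELECT order_date, COUNT(*) AS order_count FROM cache_demo GROUP BY order_date;

-- Run 2: identical text, unchanged data, so eligible for result reuse
SELECT order_date, COUNT(*) AS order_count FROM cache_demo GROUP BY order_date;

-- Modify the underlying data, which invalidates the cached result
INSERT INTO cache_demo VALUES ('2024-01-17', 3.00);

-- Run 3: same text again, but it must be recomputed
SELECT order_date, COUNT(*) AS order_count FROM cache_demo GROUP BY order_date;

-- Compare the three runs in this session's history
SELECT query_text, bytes_scanned, execution_time, total_elapsed_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION(RESULT_LIMIT => 10))
ORDER BY start_time DESC;

-- Clean up the scratch table
DROP TABLE cache_demo;
```

The Query Profile for the second run is another place to confirm reuse: a reused result normally appears there as a single result-reuse step rather than a scan and aggregate.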
Micro-Partition Cache
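Micro-partitions read by a query are cached automatically on the local disk (SSD) of the virtual warehouse that ran it; this cache is dropped when the warehouse is suspended: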
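-- Check how much of each query's scan was served from the warehouse cache
-- (ACCOUNT_USAGE views can lag behind real time by up to about 45 minutes)
SELECT
    query_id,
    partitions_total,
    partitions_scanned,
    percentage_scanned_from_cache   -- fraction between 0.0 and 1.0
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
  AND query_text ILIKE '%orders%'
LIMIT 5;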
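-- Monitor cache hit ratio. Snowflake exposes no per-table cache view,
-- so approximate the ratio per warehouse from bytes scanned
SELECT
    warehouse_name,
    SUM(bytes_scanned) AS total_bytes_scanned,
    ROUND(SUM(bytes_scanned * percentage_scanned_from_cache)
          / NULLIF(SUM(bytes_scanned), 0) * 100, 2) AS cache_hit_pct
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND query_text ILIKE '%orders%'
GROUP BY warehouse_name;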
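Because this cache lives on the warehouse, it only survives while the warehouse is running, so the auto-suspend interval is a trade-off between credit consumption and keeping the cache warm. The sketch below assumes a warehouse named `compute_wh` (the name used elsewhere in this article); the 600-second value is illustrative, not a recommendation.

```sql
-- Inspect the current settings; the auto_suspend column is in seconds
SHOW WAREHOUSES LIKE 'compute_wh';

-- Keep the warehouse (and its local cache) alive for 10 idle minutes.
-- A longer interval keeps cached data available but consumes more credits.
ALTER WAREHOUSE compute_wh SET AUTO_SUSPEND = 600;
```

A workload with frequent, similar queries tends to benefit from a longer interval, while sporadic batch jobs usually do not.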
Micro-Partition Pruning
-- Pruning is always on; Snowflake has no session parameter to enable or disable it.
-- Check pruning effectiveness: compare partitionsAssigned with partitionsTotal in the plan
EXPLAIN SELECT * FROM orders WHERE order_date = '2024-01-15';
-- Monitor pruning statistics (pruning_ratio = fraction of partitions skipped)
SELECT
    query_id,
    partitions_total,
    partitions_scanned,
    ROUND(1 - partitions_scanned / NULLIF(partitions_total, 0), 4) AS pruning_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
  AND query_text ILIKE '%WHERE order_date%';
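How a filter is written can affect how many micro-partitions are pruned. As a rough sketch, assuming `order_date` is a DATE column, the two plans below can be compared: wrapping the column in a function may make pruning less effective than an equivalent range predicate on the raw column, though the optimizer can see through some expressions, so the plans are worth checking rather than assuming.

```sql
-- Filter applies a function to the column: pruning may be less effective
EXPLAIN
SELECT * FROM orders
WHERE TO_CHAR(order_date, 'YYYY-MM') = '2024-01';

-- Equivalent range filter on the raw column: partition min/max metadata applies directly
EXPLAIN
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
  AND order_date <  '2024-02-01';
```

In each plan, the partitionsAssigned value relative to partitionsTotal indicates how much of the table would be scanned.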
Remote Disk Cache
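Remote disk is Snowflake's persistent storage layer (cloud object storage) rather than a cache in the strict sense: any data not served from the result cache or the warehouse cache is read from it.
-- Estimate how much data each query read from remote storage
SELECT
    query_id,
    ROUND(bytes_scanned * (1 - percentage_scanned_from_cache) / 1024 / 1024, 2) AS remote_scanned_mb,
    percentage_scanned_from_cache
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
  AND query_text ILIKE '%orders%';
-- There is no command that refreshes a cache, and reclustering does not do so.
-- Suspending a warehouse drops its local cache, so later queries read from remote storage again:
ALTER WAREHOUSE compute_wh SUSPEND;
ALTER WAREHOUSE compute_wh RESUME;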
Performance Optimization Strategies
Warehouse Sizing
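-- Check warehouse performance over the last 7 days
-- (WAREHOUSE_METERING_HISTORY reports credits only, so use QUERY_HISTORY)
SELECT
    warehouse_name,
    AVG(total_elapsed_time) AS avg_query_time_ms,
    COUNT(*) AS total_queries,
    AVG(percentage_scanned_from_cache) AS avg_fraction_from_cache
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name = 'COMPUTE_WH'
GROUP BY warehouse_name;
-- Scale up the warehouse (a larger warehouse also has a larger local cache)
ALTER WAREHOUSE compute_wh SET WAREHOUSE_SIZE = 'LARGE';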
Query Optimization
-- Use materialized views for complex aggregations
CREATE MATERIALIZED VIEW mv_daily_summary AS
SELECT
order_date,
COUNT(*) as order_count,
SUM(amount) as total_revenue
FROM orders
GROUP BY order_date;
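-- Result caching is on by default and cannot be requested per query with a hint.
-- Re-running the identical statement while the data is unchanged reuses the cached result:
SELECT
    order_date,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue
FROM orders
GROUP BY order_date;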
Partition Optimization
-- Cluster tables for better pruning
ALTER TABLE orders CLUSTER BY (order_date, region);
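-- Check clustering depth and detailed clustering information
-- (TABLE_STORAGE_METRICS does not expose these; use the system functions)
SELECT SYSTEM$CLUSTERING_DEPTH('orders');
SELECT SYSTEM$CLUSTERING_INFORMATION('orders');
-- Manual reclustering (ALTER TABLE ... RECLUSTER) is deprecated.
-- Automatic Clustering maintains the table once a clustering key is defined;
-- resume it if it was suspended:
ALTER TABLE orders RESUME RECLUSTER;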
Cache Monitoring
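-- Comprehensive cache metrics for the last day
-- Result-cache hits are approximated (assumption): SELECTs that scanned no bytes.
-- Metadata-only queries also scan no bytes, so treat this figure as an upper bound.
SELECT
    'Result Cache (approx.)' AS cache_type,
    COUNT_IF(bytes_scanned = 0) AS hits,
    COUNT_IF(bytes_scanned > 0) AS misses,
    ROUND(COUNT_IF(bytes_scanned = 0) / NULLIF(COUNT(*), 0) * 100, 2) AS hit_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND query_type = 'SELECT'
UNION ALL
SELECT
    'Warehouse Cache (bytes)' AS cache_type,
    SUM(bytes_scanned * percentage_scanned_from_cache) AS hits,
    SUM(bytes_scanned * (1 - percentage_scanned_from_cache)) AS misses,
    ROUND(SUM(bytes_scanned * percentage_scanned_from_cache)
          / NULLIF(SUM(bytes_scanned), 0) * 100, 2) AS hit_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP());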
For optimal caching performance, keep query text consistent (avoid dynamic SQL), ensure sufficient warehouse size for parallel processing, and use clustering to improve partition pruning.
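To illustrate why consistent query text matters, the statements below all ask the same question of the `orders` table, yet only an exact textual repeat of the first one is eligible for result reuse. This is a sketch of the matching behavior, not an exhaustive list of what breaks reuse.

```sql
-- Baseline statement; repeating it verbatim can reuse the cached result
SELECT order_date, COUNT(*) AS order_count FROM orders GROUP BY order_date;

-- Same logic, but lowercase keywords change the text: no reuse
select order_date, count(*) as order_count from orders group by order_date;

-- Same logic, but a table alias changes the text: no reuse
SELECT o.order_date, COUNT(*) AS order_count FROM orders o GROUP BY o.order_date;

-- A function evaluated at run time prevents reuse even when the text repeats
SELECT order_date, COUNT(*) AS order_count, CURRENT_TIMESTAMP() AS run_at
FROM orders
GROUP BY order_date;
```

Generating SQL from a single shared template, instead of assembling it ad hoc in each application, is one way to keep the text identical across callers.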
Performance Best Practices
| Strategy | Implementation | Expected Improvement |
|---|---|---|
| Result Caching | Keep USE_CACHED_RESULT enabled (default) and reuse identical query text | 10-100x for repeated queries |
| Partition Pruning | Filter on clustered columns | 5-50x for filtered queries |
| Materialized Views | Pre-aggregate common queries | 2-20x for aggregations |
| Warehouse Sizing | Match workload to size | 10-50% improvement |
| Query Optimization | Simplify SQL, reduce JOINs | 2-10x for complex queries |
-- Monitor overall performance: slowest queries in the last hour
SELECT
    query_id,
    query_text,
    total_elapsed_time AS elapsed_ms,
    bytes_scanned,
    ROUND(bytes_scanned / 1024 / 1024, 2) AS scanned_mb
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
    END_TIME_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP())
))
ORDER BY total_elapsed_time DESC
LIMIT 10;
Summary
Key Takeaways
- The result cache returns results for identical queries without re-executing them on a warehouse.
- The micro-partition (warehouse) cache avoids re-reading recently used data from remote storage, while pruning skips micro-partitions that cannot match a filter.
- Remote disk is the persistent storage layer; data not found in either cache is read from it.
- Cache hit ratio is a key performance metric; aim for high ratios.
Performance Optimization Checklist
- Keep query text consistent to maximize result cache hits
- Ensure sufficient warehouse size for parallel processing
- Use clustering to improve partition pruning effectiveness
- Monitor cache hit ratios across all cache layers
- Right-size warehouses based on workload patterns
- Use materialized views for complex, repeated aggregations