Virtual Warehouse Management: Multi-Cluster, Auto-Suspend & Scaling
Architecture Diagram 1: Warehouse Hierarchy & Sizing
Architecture Diagram 2: Multi-Cluster Scaling Behavior
Architecture Diagram 3: Warehouse Hierarchy
Architecture Diagram 4: Multi-Cluster Auto-Scaling Timeline
Architecture Diagram 5: Scaling Policies Comparison
A multi-cluster warehouse automatically manages multiple independent compute clusters under a logical warehouse. When the primary cluster becomes overloaded (queue depth exceeds threshold), Snowflake starts additional clusters. When load decreases, clusters are suspended to reduce costs.
Scaling policies control how aggressively warehouses add clusters. STANDARD scales quickly (queue depth β₯ 1 triggers new cluster), prioritizing performance. ECONOMY scales conservatively (queue depth β₯ 6 triggers), prioritizing cost savings with some queueing tolerance.
Warehouse Memory Model
- memory_per_node = 2 GB Γ 2^size_index
- memory_per_vwh = memory_per_node Γ nodes_per_cluster
- Example: XL warehouse = 32 GB/node Γ 4 nodes = 128 GB total
- Local SSD cache = memory_per_node Γ 1 (1:1 ratio)
- Spillage occurs when working set exceeds memory β spills to local SSD β remote storage
Theorem: Auto-suspend after T seconds of inactivity guarantees zero credit consumption for idle periods, with resume latency bounded by cloud provider cold-start time (typically 30β60 seconds). Proof: Warehouse suspends all compute nodes when idle time β₯ T. No queries execute during suspension. Auto-resume re-initializes nodes on next query submission. The 60-second minimum billing granularity ensures no partial-second charges.
Set AUTO_SUSPEND=60 for development warehouses (frequent idle periods), AUTO_SUSPEND=300 for production (balance between responsiveness and cost). Use SCALING_POLICY=ECONOMY for batch ETL where some queueing is acceptable.
- Monitor queue depth: Target < 5 queries waiting; > 10 indicates under-provisioning
- Check CPU utilization: 60β80% is optimal; > 95% indicates need to scale up
- Review auto-scaling events: Frequent ADD_CLUSTER suggests under-provisioned MAX_CLUSTER_COUNT
- Analyze credit consumption: Compare credits_used vs. queries_completed for efficiency
- Right-size: Scale down if utilization consistently < 40%; scale up if consistently > 90%
- Test: Run representative workload at new size and verify queue depth remains < 5
- Multi-cluster warehouses auto-scale 1β10 clusters based on queue depth
- Credit billing: Per-second (60s minimum) based on warehouse size Γ time
- STANDARD policy: Fast scaling for interactive workloads
- ECONOMY policy: Conservative scaling for batch ETL (cost-optimized)
- Auto-suspend: 60β86400 seconds; balance responsiveness vs. cost
Detailed Explanation
What are Virtual Warehouses?
Virtual warehouses are independent compute clusters that execute SQL queries and DML operations. Unlike traditional fixed-size appliances, they are elastic and scale dynamically based on workload demands.
Virtual Warehouse Architecture
- Each warehouse: independent nodes with CPU, RAM, and local SSD cache
- Complete isolation β multiple warehouses access same data without contention
- Cloud services layer manages query routing and metadata independently
Multi-Cluster Warehouse Operations
- Automatically manages multiple independent clusters under one logical warehouse
- Scaling triggers: queue depth exceeding threshold
- Scaling behavior:
STANDARDpolicy β scales aggressively (queue β₯ 1)ECONOMYpolicy β scales conservatively (queue β₯ 6)
- Queries routed to cluster with lowest queue depth
- Suspended clusters complete running queries before shutdown
Auto-Suspend and Auto-Resume
| Feature | Configuration | Behavior |
|---|---|---|
| Auto-Suspend | 60β86,400 seconds | Suspends after inactivity |
| Auto-Resume | TRUE/FALSE | Resumes on next query (30β60s typical) |
- Common defaults: 60s for dev, 300s for production
- Zero credit consumption during suspension
Resource Monitoring and Cost Control
- Warehouse-level monitors β individual workload cost attribution
- Account-level monitors β aggregate consumption visibility
- Triggers: NOTIFY, SUSPEND, SUSPEND_IMMEDIATELY at threshold percentages
- Email notifications and external monitoring integration
Performance Optimization Strategies
- Right-size β scale up for complex analytics, down for simple queries
- Scaling policies β STANDARD for interactive, ECONOMY for batch ETL
- Statement timeouts β
STATEMENT_TIMEOUT(max execution) +STATEMENT_QUEUED_TIMEOUT(max queue wait) - Leverage result cache β avoid non-deterministic functions for cache hits
Key Takeaway: Use SCALING_POLICY=ECONOMY for batch ETL where queueing is acceptable, STANDARD for interactive workloads requiring fast response.
Key Concepts Table
| Parameter | Range | Default | Description |
|---|---|---|---|
| WAREHOUSE_SIZE | XS-4XL | M | Compute capacity per node |
| MIN_CLUSTER_COUNT | 1-10 | 1 | Minimum active clusters |
| MAX_CLUSTER_COUNT | 1-10 | 1 | Maximum auto-scaled clusters |
| SCALING_POLICY | STANDARD/ECONOMY | STANDARD | Scaling aggressiveness |
| AUTO_SUSPEND | 60-86400 | 600 | Inactivity timeout (seconds) |
| AUTO_RESUME | TRUE/FALSE | TRUE | Auto-resume on query |
| STATEMENT_TIMEOUT | 0-86400 | 0 | Max execution time (seconds) |
| Warehouse Size | vCPU | Memory | Max Concurrent | Cost/Hour |
|---|---|---|---|---|
| X-Small | 1 | 2 GB | 1 | ~$0.00036/s |
| Small | 2 | 4 GB | 2 | ~$0.00072/s |
| Medium | 4 | 8 GB | 4 | ~$0.00144/s |
| Large | 8 | 16 GB | 8 | ~$0.00288/s |
| X-Large | 16 | 32 GB | 16 | ~$0.00576/s |
| 2X-Large | 32 | 64 GB | 32 | ~$0.01152/s |
| 3X-Large | 64 | 128 GB | 64 | ~$0.02304/s |
| 4X-Large | 128 | 256 GB | 128 | ~$0.04608/s |
| Metric | Target | Warning | Critical |
|---|---|---|---|
| Queue Depth | < 5 | 5-10 | > 10 |
| CPU Utilization | 60-80% | 80-95% | > 95% |
| Auto-Suspend Time | 60-300s | 300-600s | > 600s |
| Credit Consumption | < 80% quota | 80-95% | > 95% |
Code Examples
-- Example 1: Create warehouse with comprehensive settings
CREATE WAREHOUSE analytics_wh
WAREHOUSE_SIZE = 'xlarge'
MIN_CLUSTER_COUNT = 2
MAX_CLUSTER_COUNT = 8
SCALING_POLICY = 'STANDARD'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = FALSE
STATEMENT_TIMEOUT_IN_SECONDS = 3600
STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 600
RESOURCE_MONITOR = 'analytics_monitor'
COMMENT = 'Multi-cluster warehouse for analytics workloads';
-- Example 2: Create resource monitor
CREATE RESOURCE MONITOR account_monitor
WITH
CREDIT_QUOTA = 10000
FREQUENCY = MONTHLY
START_TIMESTAMP = IMMEDIATELY
TRIGGERS
ON 80% DO NOTIFY
ON 90% DO SUSPEND
ON 95% DO SUSPEND_IMMEDIATELY;
-- Example 3: Create warehouse-specific monitor
CREATE RESOURCE MONITOR etl_monitor
WITH
CREDIT_QUOTA = 5000
FREQUENCY = MONTHLY
TRIGGERS
ON 75% DO NOTIFY
ON 90% DO SUSPEND;
-- Assign monitor to warehouse
ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = etl_monitor;
-- Example 4: Monitor warehouse performance
SELECT
warehouse_name,
warehouse_size,
cluster_number,
AVG(queries_completed) as avg_queries,
AVG(queued_overload_queries) as avg_queued,
AVG(execution_time_ms) / 1000 as avg_exec_seconds,
SUM(credits_used) as total_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY total_credits DESC;
-- Example 5: Analyze auto-scaling events
SELECT
warehouse_name,
event_name,
event_timestamp,
event_reason,
cluster_number
FROM TABLE(INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY(
START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
))
WHERE event_name IN ('RESUME', 'SUSPEND', 'ADD_CLUSTER', 'REMOVE_CLUSTER')
ORDER BY event_timestamp DESC;
-- Example 6: Monitor current warehouse status
SELECT
warehouse_name,
state,
num_queries,
num_queued_queries,
time_active,
time_suspended
FROM TABLE(INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY(
START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())
))
ORDER BY time_active DESC;
-- Example 7: Dynamic warehouse scaling
-- Scale up warehouse for batch processing
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = '2xlarge';
-- Scale out warehouse for peak load
ALTER WAREHOUSE analytics_wh SET MAX_CLUSTER_COUNT = 10;
-- Scale in warehouse during off-hours
ALTER WAREHOUSE analytics_wh SET MAX_CLUSTER_COUNT = 2;
-- Example 8: Warehouse usage analysis
SELECT
warehouse_name,
DATE_TRUNC('day', start_time) as usage_date,
SUM(credits_used) as daily_credits,
SUM(credits_used) * 3 as daily_cost_usd, -- ~$3/credit
AVG(queries_completed) as avg_queries_per_hour
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -90, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY usage_date DESC, daily_credits DESC;
-- Example 9: Optimize warehouse for specific workload
-- Create warehouse optimized for large scans
CREATE WAREHOUSE scan_wh
WAREHOUSE_SIZE = '3xlarge'
AUTO_SUSPEND = 600
STATEMENT_TIMEOUT_IN_SECONDS = 7200
COMMENT = 'Optimized for large table scans';
-- Create warehouse for interactive queries
CREATE WAREHOUSE interactive_wh
WAREHOUSE_SIZE = 'medium'
AUTO_SUSPEND = 60
STATEMENT_TIMEOUT_IN_SECONDS = 300
STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 60
COMMENT = 'Optimized for interactive queries';
-- Example 10: Monitor query performance by warehouse
SELECT
warehouse_name,
AVG(execution_time_ms) / 1000 as avg_exec_seconds,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY execution_time_ms) / 1000 as median_exec_seconds,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) / 1000 as p95_exec_seconds,
COUNT(*) as total_queries,
SUM(CASE WHEN execution_time_ms > 60000 THEN 1 ELSE 0 END) as slow_queries
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
AND execution_status = 'SUCCESS'
GROUP BY 1
ORDER BY avg_exec_seconds DESC;
Performance Metrics
| Metric | Target | Warning | Critical | Description |
|---|---|---|---|---|
| Query Queue Depth | < 5 | 5-10 | > 10 | Queries waiting to execute |
| Cluster Scaling Time | < 30s | 30-60s | > 60s | Time to start new cluster |
| Auto-Suspend Delay | 60-300s | 300-600s | > 600s | Time before warehouse suspends |
| Auto-Resume Time | < 60s | 60-120s | > 120s | Time to resume suspended warehouse |
| Credit Utilization | 60-80% | 80-95% | > 95% | Active time vs total time |
Best Practices
-
Right-size warehouses: Start with Medium for ad-hoc queries, scale up for batch ETL. Monitor queue depth to identify under-provisioned warehouses.
-
Use multi-cluster warehouses: Set MIN_CLUSTER_COUNT=1 and MAX_CLUSTER_COUNT=3-5 for concurrent workloads. Use SCALING_POLICY=ECONOMY for cost optimization.
-
Implement warehouse isolation: Create separate warehouses for different workloads (ETL, reporting, ad-hoc) to prevent resource contention and enable independent scaling.
-
Configure appropriate timeouts: Set STATEMENT_QUEUED_TIMEOUT for queries that shouldn't run indefinitely and STATEMENT_TIMEOUT to kill long-running queries.
-
Leverage resource monitors: Set up alerts for credit consumption to prevent unexpected costs. Use MAX_CREDIT_QUOTA per warehouse for budget control.
-
Optimize auto-suspend: Set AUTO_SUSPEND=60 for development warehouses, AUTO_SUSPEND=300 for production, and AUTO_SUSPEND=60 for infrequently used warehouses.
-
Monitor scaling events: Track warehouse load history to identify scaling patterns and optimize MIN/MAX_CLUSTER_COUNT settings.
-
Use warehouse groups: Organize warehouses by workload type (analytics, ETL, development) to simplify management and cost attribution.
-
Implement cost allocation: Use resource monitors and tags to attribute warehouse costs to specific departments or projects.
-
Regular performance reviews: Analyze warehouse performance metrics weekly to identify optimization opportunities and right-sizing needs.
See Also
- Snowflake Architecture β Warehouse architecture details
- Optimization Techniques β Query optimization strategies
- Access Control β Warehouse access control
- Stored Procedures β Automated warehouse management
- PySpark Iceberg Tables β Compute optimization patterns
- Data Warehouse Concepts β Data warehouse design principles