Monitoring Queries: Query History, Warehouse Utilization & Cost Analysis
Architecture Diagram 1: Monitoring Architecture
Architecture Diagram 2: Query History Analysis
Architecture Diagram 3: Cost Analysis Dashboard
Architecture Diagram 4: Query Performance Pipeline
Account Usage Views are system-defined views providing metadata on queries, sessions, storage, and compute. They retain historical data for up to 365 days and enable monitoring of query performance, resource consumption, and usage patterns across the account.
Warehouse Metering History tracks credit consumption per warehouse over time. It provides hourly granularity for cost attribution, enabling identification of expensive queries, idle warehouses, and optimization opportunities.
Query Performance Metrics
- Queue time: Time waiting for warehouse resources (target < 5s)
- Execution time: Time spent executing on warehouse (varies by query)
- Total time: Queue + Execution + compilation + result transfer
- Compiling time: SQL -> query plan (typically < 1s for simple queries)
- Performance threshold: > 60s execution = potential optimization candidate
Monitor queue time consistently > 5s β indicates warehouse undersizing. Use QUERY_HISTORY to identify top consumers by credits and execution time. Set up alerts for credit consumption exceeding daily budget thresholds.
- Account Usage: Query_history, warehouse_metering_history, login_history for visibility
- Query profiling: EXPLAIN plans + performance metrics for optimization
- Cost tracking: Credit consumption per warehouse, per query, per user
- Alerting: Warehouse errors, long-running queries, high queue times
- Retention: Account Usage views retain 365 days; Information Schema retains 7 days
Detailed Explanation
What is Query History Analysis?
Comprehensive tracking of every query executed β execution times, resource consumption, errors, and performance metrics.
Key Query Metrics
| Metric | Description | Target |
|---|---|---|
| Execution time | Total submission β completion | < 60s |
| Compilation time | SQL parsing + optimization | < 1s |
| Queue time | Waiting for warehouse resources | < 5s |
| Bytes scanned | Data volume processed | Varies |
| Partition pruning | % micro-partitions eliminated | > 80% |
| Spillage | Memory overflow to disk | 0 bytes |
Warehouse Utilization Monitoring
- CPU/Memory utilization β processor and RAM usage
- Concurrent queries β simultaneous executions
- Queue depth β queries waiting to execute
- Cluster scaling events β when auto-scaling occurred
- Auto-suspend/resume patterns β warehouse activity cycles
Cost Analysis and Optimization
- Compute credits = warehouse size Γ execution time
- Storage costs = per TB per month (+ time travel overhead)
- Cost attribution by warehouse, user, and workload
- Resource monitors prevent unexpected overruns
Performance Dashboards
- Query performance trends (execution times over time)
- Warehouse utilization heatmaps (usage by hour)
- Cost breakdown charts (credits by category)
- Error rate monitors (failure tracking)
Alerting and Proactive Monitoring
- Queue depth > 10 β warehouse overload
- Query duration > 600s β slow query
- Credit consumption > 80% budget β budget warning
- Error rate > 5% β system issue
Key Takeaway: Monitor queue time consistently > 5s β indicates warehouse undersizing. Account Usage retains 365 days; Information Schema retains 7 days.
Key Concepts Table
| Metric | Description | Warning Threshold | Critical Threshold |
|---|---|---|---|
| Query Queue Depth | Queries waiting to execute | > 5 | > 10 |
| Query Duration | Total execution time | > 300s | > 600s |
| Partition Pruning | Data elimination efficiency | < 70% | < 50% |
| Credit Consumption | Daily/hourly usage | > 80% budget | > 95% budget |
| Error Rate | Failed queries percentage | > 1% | > 5% |
| Dashboard Component | Refresh Rate | Use Case |
|---|---|---|
| Real-time Status | 1 minute | Current system health |
| Hourly Trends | 1 hour | Short-term pattern analysis |
| Daily Reports | 1 day | Operational management |
| Weekly Analytics | 1 week | Performance optimization |
| Monthly Summaries | 1 month | Capacity planning |
| Alert Type | Trigger Condition | Response |
|---|---|---|
| Queue Depth | > 10 queries | Scale warehouse |
| Query Timeout | > 600 seconds | Investigate query |
| Credit Limit | > 80% budget | Review usage |
| Error Spike | > 5% error rate | Investigate system |
Code Examples
Query History Analysis
-- Example 1: Query history analysis with comprehensive metrics
-- QUERY_HISTORY columns:
-- query_id: Unique identifier for each query execution
-- execution_status: SUCCESS, FAILED, RUNNING, QUEUED
-- compilation_time_ms: Time to parse and optimize SQL
-- execution_time_ms: Time spent executing on warehouse
-- queue_time_ms: Time waiting for warehouse resources
-- bytes_scanned: Total bytes read from storage
-- partitions_scanned: Number of micro-partitions read
-- partitions_total: Total micro-partitions in table
SELECT
query_id,
query_text,
user_name,
warehouse_name,
execution_status,
start_time,
end_time,
DATEDIFF(second, start_time, end_time) as total_seconds,
compilation_time_ms / 1000 as compile_seconds,
execution_time_ms / 1000 as exec_seconds,
queue_time_ms / 1000 as queue_seconds,
bytes_scanned / 1024 / 1024 as mb_scanned,
partitions_scanned,
partitions_total,
ROUND((partitions_scanned / NULLIF(partitions_total, 0)) * 100, 1) as pruning_pct,
spill_bytes / 1024 / 1024 as mb_spilled
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
AND execution_status = 'SUCCESS'
ORDER BY exec_seconds DESC
LIMIT 100;
-- Example 2: Identify slow queries with optimization recommendations
-- Focus on queries with poor pruning or excessive execution time
SELECT
query_id,
LEFT(query_text, 100) AS query_preview,
user_name,
warehouse_name,
DATEDIFF(second, start_time, end_time) as duration_seconds,
bytes_scanned / 1024 / 1024 as mb_scanned,
compilation_time_ms,
execution_time_ms,
queue_time_ms,
CASE
WHEN partitions_scanned / NULLIF(partitions_total, 0) > 0.8 THEN 'Good pruning'
WHEN partitions_scanned / NULLIF(partitions_total, 0) > 0.5 THEN 'Moderate pruning'
ELSE 'Poor pruning - consider clustering'
END as pruning_assessment,
CASE
WHEN execution_time_ms > 600000 THEN 'Critical - optimize query'
WHEN execution_time_ms > 300000 THEN 'Warning - review execution plan'
ELSE 'Acceptable'
END as performance_status
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -1, CURRENT_TIMESTAMP())
AND execution_status = 'SUCCESS'
AND execution_time_ms > 60000 -- Only queries > 1 minute
ORDER BY execution_time_ms DESC;
Warehouse Utilization and Cost Analysis
-- Example 3: Warehouse utilization analysis
-- Tracks hourly credit consumption and query throughput per warehouse
SELECT
warehouse_name,
warehouse_size,
DATE_TRUNC('hour', start_time) as hour_bucket,
SUM(credits_used) as credits,
SUM(queries_completed) as queries,
SUM(queued_overload_queries) as queued,
AVG(execution_time_ms) / 1000 as avg_exec_sec,
CASE
WHEN SUM(queued_overload_queries) > 0 THEN 'Overloaded'
WHEN SUM(queries_completed) = 0 THEN 'Idle'
ELSE 'Normal'
END as status
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY hour_bucket DESC, credits DESC;
-- Example 4: Cost analysis by warehouse and day
-- Cost = credits_used * $3/credit (Enterprise edition pricing)
SELECT
warehouse_name,
warehouse_size,
DATE_TRUNC('day', start_time) as usage_date,
SUM(credits_used) as daily_credits,
SUM(credits_used) * 3.00 as daily_cost_usd,
SUM(queries_completed) as total_queries,
ROUND(SUM(credits_used) / NULLIF(SUM(queries_completed), 0), 4) as credits_per_query,
CASE
WHEN SUM(credits_used) > 500 THEN 'High usage'
WHEN SUM(credits_used) > 100 THEN 'Moderate usage'
ELSE 'Low usage'
END as usage_category
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY usage_date DESC, daily_credits DESC;
-- Example 5: Cost attribution by user
-- Track which users are consuming the most compute resources
SELECT
user_name,
warehouse_name,
COUNT(*) as query_count,
SUM(execution_time_ms) / 1000 as total_exec_seconds,
SUM(bytes_scanned) / 1024 / 1024 / 1024 as total_gb_scanned,
COUNT(DISTINCT query_id) as unique_queries
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY qh
JOIN SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY wm
ON qh.warehouse_name = wm.warehouse_name
AND DATE_TRUNC('hour', qh.start_time) = DATE_TRUNC('hour', wm.start_time)
WHERE qh.start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
AND qh.user_name != 'SYSTEM'
GROUP BY 1, 2
ORDER BY total_exec_seconds DESC;
Active Monitoring and Alerting
-- Example 6: Monitor currently running queries
SELECT
query_id,
LEFT(query_text, 200) AS query_preview,
user_name,
warehouse_name,
session_id,
DATEDIFF(second, start_time, CURRENT_TIMESTAMP()) as running_seconds,
bytes_scanned / 1024 / 1024 as mb_scanned,
partition_scanned_count,
partition_total_count,
ROUND((partition_scanned_count / NULLIF(partition_total_count, 0)) * 100, 1) as progress_pct
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE execution_status = 'RUNNING'
ORDER BY start_time;
-- Example 7: Resource monitor setup
-- Automatically suspend warehouses when credit limits are reached
CREATE RESOURCE MONITOR monthly_budget_monitor
WITH
CREDIT_QUOTA = 20000 -- Monthly credit limit
FREQUENCY = MONTHLY -- Reset monthly
START_TIMESTAMP = IMMEDIATELY -- Start counting now
TRIGGERS
ON 75% DO NOTIFY -- Alert at 75% usage
ON 90% DO SUSPEND -- Suspend at 90%
ON 95% DO SUSPEND_IMMEDIATELY; -- Emergency suspend at 95%
-- Example 8: Create monitoring alert for slow queries
-- Alert fires every 5 minutes, checks for queries > 5 minutes
CREATE OR REPLACE ALERT slow_query_alert
WAREHOUSE = 'monitoring_wh'
SCHEDULE = 'USING CRON */5 * * * * America/New_York' -- Every 5 minutes
IF (
SELECT COUNT(*)
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(minute, -5, CURRENT_TIMESTAMP())
AND execution_status = 'RUNNING'
AND DATEDIFF(second, start_time, CURRENT_TIMESTAMP()) > 300
) > 0
THEN
CALL SYSTEM$SEND_EMAIL(
'alert_channel',
'admin@company.com',
'Slow Query Alert',
'Queries running longer than 5 minutes detected at ' || CURRENT_TIMESTAMP()
);
-- Example 9: Alert for high queue depth
CREATE OR REPLACE ALERT queue_depth_alert
WAREHOUSE = 'monitoring_wh'
SCHEDULE = 'USING CRON */2 * * * * America/New_York' -- Every 2 minutes
IF (
SELECT COUNT(*)
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE execution_status = 'QUEUED'
AND DATEDIFF(second, start_time, CURRENT_TIMESTAMP()) > 30
) > 10
THEN
CALL SYSTEM$SEND_EMAIL(
'alert_channel',
'admin@company.com',
'High Queue Depth Alert',
'More than 10 queries queued for > 30 seconds'
);
Data Quality and Partition Analysis
-- Example 10: Partition pruning analysis
-- Identify tables with poor clustering that need optimization
SELECT
table_name,
partition_key,
total_partitions,
ROUND(avg_partition_depth, 2) as avg_partition_depth,
ROUND(clustering_depth, 2) as clustering_depth,
CASE
WHEN clustering_depth < 1.5 THEN 'Excellent'
WHEN clustering_depth < 2.5 THEN 'Good'
WHEN clustering_depth < 4.0 THEN 'Needs Improvement'
ELSE 'Poor - consider re-clustering'
END as clustering_quality,
ROUND(active_bytes / 1024 / 1024 / 1024, 2) as active_gb,
ROUND(time_travel_bytes / 1024 / 1024 / 1024, 2) as time_travel_gb
FROM TABLE(INFORMATION_SCHEMA.CLUSTERING_INFORMATION())
WHERE table_name IN ('SALES_DATA', 'CUSTOMER_DATA', 'PRODUCT_DATA')
ORDER BY clustering_depth DESC;
-- Example 11: Table storage analysis
-- Identify large and stale tables for optimization
SELECT
table_name,
database_name,
schema_name,
row_count,
ROUND(bytes / 1024 / 1024 / 1024, 2) as size_gb,
created_on,
last_altered,
DATEDIFF(day, last_altered, CURRENT_TIMESTAMP()) as days_since_modified,
CASE
WHEN DATEDIFF(day, last_altered, CURRENT_TIMESTAMP()) > 90 THEN 'Stale - consider archival'
WHEN DATEDIFF(day, last_altered, CURRENT_TIMESTAMP()) > 30 THEN 'Aging - review usage'
ELSE 'Active'
END as freshness_status
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE deleted_on IS NULL
AND bytes > 1024 * 1024 * 1024 -- Tables > 1 GB
ORDER BY bytes DESC
LIMIT 50;
-- Example 12: User activity analysis
SELECT
user_name,
COUNT(*) as total_queries,
ROUND(AVG(execution_time_ms) / 1000, 2) as avg_exec_sec,
ROUND(SUM(bytes_scanned) / 1024 / 1024 / 1024, 2) as total_gb_scanned,
COUNT(DISTINCT warehouse_name) as warehouses_used,
COUNT(DISTINCT DATE(start_time)) as active_days,
MIN(start_time) as first_query,
MAX(start_time) as last_query
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
AND user_name != 'SYSTEM'
GROUP BY 1
ORDER BY total_queries DESC;
-- Example 13: Login history for security monitoring
SELECT
user_name,
event_timestamp,
event_type, -- LOGIN, LOGOUT
client_ip,
user_agent,
ROUND(datediff('second', event_timestamp, LAG(event_timestamp) OVER (
PARTITION BY user_name ORDER BY event_timestamp
)), 0) as session_duration_sec
FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY
WHERE event_timestamp >= DATEADD(day, -7, CURRENT_TIMESTAMP())
AND event_type = 'LOGIN'
ORDER BY event_timestamp DESC;
Performance Metrics
| Metric | Target | Warning | Critical |
|---|---|---|---|
| Query Queue Depth | < 5 | 5-10 | > 10 |
| Average Query Duration | < 60s | 60-300s | > 300s |
| Partition Pruning Efficiency | > 80% | 60-80% | < 60% |
| Daily Credit Consumption | < 80% budget | 80-95% | > 95% |
| Error Rate | < 1% | 1-5% | > 5% |
Best Practices
-
Implement comprehensive monitoring: Track query performance, warehouse utilization, and cost metrics continuously.
-
Set up proactive alerts: Configure alerts for queue depth, query duration, and credit consumption thresholds.
-
Regular performance reviews: Conduct weekly query performance reviews to identify optimization opportunities.
-
Optimize slow queries: Prioritize optimization of queries with poor partition pruning or excessive execution times.
-
Right-size warehouses: Use utilization metrics to right-size warehouses and eliminate over-provisioning.
-
Monitor cost trends: Track daily and monthly cost trends to identify anomalies and optimize spending.
-
Implement resource monitors: Use resource monitors to prevent unexpected cost overruns.
-
Analyze user activity: Track user query patterns to identify training opportunities and resource allocation needs.
-
Capacity planning: Use historical trends to forecast future resource needs and plan capacity upgrades.
-
Document monitoring procedures: Create runbooks for common monitoring tasks and alert responses.
Additional Theory: Cost Attribution Model
Understanding how Snowflake costs are attributed is essential for budget management:
Compute costs:
- Warehouse credits = (warehouse size credits/hour) Γ (execution time in hours)
- XS=1, S=2, M=4, L=8, XL=16, 2XL=32, 4XL=64 credits/hour
- Multi-cluster warehouses multiply by cluster count
- Auto-suspend saves credits when warehouse is idle
Storage costs:
- Monthly rate per TB of compressed data
- Time travel storage adds to base storage
- Fail-safe storage (7 days for Enterprise) is free
Cost optimization levers:
| Lever | Potential Savings | Effort |
|---|---|---|
| Right-sizing warehouses | 20-50% | Low |
| Query optimization | 10-30% | Medium |
| Auto-suspend tuning | 15-40% | Low |
| Clustering optimization | 10-25% | Medium |
| Resource monitors | Prevents overruns | Low |
Additional Theory: Query Performance Metrics Explained
| Metric | What It Measures | Target | Action if Exceeded |
|---|---|---|---|
| Queue time | Waiting for warehouse resources | < 5s | Scale warehouse up |
| Compilation time | SQL parsing and optimization | < 1s | Simplify query |
| Execution time | Data processing on warehouse | < 60s | Optimize query/warehouse |
| Partition pruning | % of micro-partitions eliminated | > 80% | Add clustering key |
| Spillage | Memory overflow to disk | 0 bytes | Scale warehouse up |
| Bytes scanned | Data volume processed | Varies | Improve filtering |
See Also
- 03-Warehouses-and-Auto-Suspend - Warehouse sizing and auto-suspend
- 09-Streams-Tasks - Monitoring task execution
- 05-Access-Control-RBAC - Monitoring access patterns
- PySpark Iceberg - Spark monitoring patterns
- Delta Lake on Databricks - Delta Lake monitoring
- Data Warehouse Concepts - Data warehouse design principles