Snowflake Data Warehouse Automation
Snowflake enables comprehensive automation of data warehouse operations through tasks, streams, alerts, and monitoring capabilities that reduce manual intervention.
What is Warehouse Automation?
- Automates ETL/ELT pipelines with Tasks and Streams
- Monitors performance with Query History and Resource Monitors
- Enables self-service data access and provisioning
Architecture Overview
The warehouse automation architecture flows through four layers:
- Sources β APIs, Files, DBs, Streams
- Ingestion β Snowpipe, COPY INTO, Streams, External Functions
- Processing β Tasks, Stored Procs, UDFs, Pipelines
- Serving β Views, Shares, APIs, Alerts
Monitoring & Governance Layer
- Query History
- Access History
- Resource Monitors
- Alerts
Additional Capabilities
| Capability | Components |
|---|---|
| Task Automation | Cron scheduling, event-driven triggers, task dependencies, error recovery |
| Data Quality | Schema validation, null checks, freshness monitoring, business rules |
| Self-Service | Data catalog, usage analytics, cost allocation, self-provisioning |
Task Automation
Scheduled Tasks
-- Daily ETL task
CREATE OR REPLACE TASK daily_etl_task
WAREHOUSE = compute_wh
SCHEDULE = 'USING CRON 0 2 * * * America/New_York'
ERROR_INTEGRATION = slack_notification
AS
BEGIN
-- Step 1: Load data
COPY INTO raw_sales
FROM @staging_area/sales/
FILE_FORMAT = csv_format
ON_ERROR = 'CONTINUE';
-- Step 2: Transform
INSERT INTO curated_sales
SELECT
id,
TRIM(UPPER(customer_name)) as customer_name,
CAST(amount AS DECIMAL(10,2)) as amount,
order_date,
CURRENT_TIMESTAMP() as processed_at
FROM raw_sales
WHERE order_date >= DATEADD(day, -1, CURRENT_DATE());
-- Step 3: Validate
IF (SELECT COUNT(*) FROM curated_sales WHERE amount < 0) > 0 THEN
RAISE ERROR(10001, 'Negative amounts detected');
END IF;
END;
Event-Driven Tasks
-- Task triggered by stream
CREATE OR REPLACE TASK stream_processor
WAREHOUSE = compute_wh
WHEN SYSTEM$STREAM_HAS_DATA('my_stream')
AS
MERGE INTO target_table t
USING my_stream s
ON t.id = s.id
WHEN MATCHED AND s.METADATA$ACTION = 'DELETE' THEN DELETE
WHEN NOT MATCHED THEN INSERT *;
-- Task dependency chain
CREATE OR REPLACE TASK task_1 WAREHOUSE = wh AS INSERT INTO t1 SELECT 1;
CREATE OR REPLACE TASK task_2 WAREHOUSE = wh AFTER task_1 AS INSERT INTO t2 SELECT 2;
CREATE OR REPLACE TASK task_3 WAREHOUSE = wh AFTER task_2 AS INSERT INTO t3 SELECT 3;
Monitoring Queries
-- Task execution history
SELECT
name,
state,
completed_time,
scheduled_time,
error_code,
error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
WHERE scheduled_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY scheduled_time DESC;
-- Query performance monitoring
SELECT
query_text,
execution_status,
total_elapsed_time,
rows_produced,
bytes_scanned,
warehouse_name
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
))
WHERE total_elapsed_time > 10000
ORDER BY total_elapsed_time DESC;
Resource Monitoring
-- Create resource monitor
CREATE OR REPLACE RESOURCE MONITOR warehouse_monitor
WITH CREDIT_QUOTA = 500
FREQUENCY = MONTHLY
START_TIMESTAMP = IMMEDIATELY
NOTIFY_USERS = ('admin@company.com')
TRIGGERS ON 75% DO NOTIFY
TRIGGERS ON 90% DO SUSPEND
TRIGGERS ON 100% DO SUSPEND_IMMEDIATELY;
-- Apply to warehouse
ALTER WAREHOUSE compute_wh SET RESOURCE_MONITOR = warehouse_monitor;
Use task dependencies to create DAG-like workflows. Snowflake automatically handles ordering and parallel execution. Monitor task runs through INFORMATION_SCHEMA.TASK_HISTORY for debugging and optimization.
Automation Patterns
| Pattern | Implementation | Use Case |
|---|---|---|
| Scheduled ETL | CRON tasks | Daily batch loads |
| Event-driven | Stream triggers | Real-time sync |
| Dependency chains | AFTER tasks | Complex workflows |
| Error recovery | TRY/CATCH | Resilient pipelines |
| Data quality | Validation tasks | Compliance |
- Tasks provide flexible scheduling with CRON and event triggers
- Task dependencies enable complex workflow orchestration
- Resource monitors prevent cost overruns
- Query history enables performance optimization
- Self-service patterns reduce IT bottlenecks