Snowflake Advanced · Interview Prep

Time Travel & Fail-Safe

Difficulty: Hard · Commonly asked at Amazon, Uber, Google

Interview Question

"A production table was accidentally truncated at 2:45 PM. The last backup was from 1:00 AM. How would you recover the data using Snowflake Time Travel? What if the data was modified at 2:00 PM and you need the exact state at 1:59 PM?"

ℹ️

Companies Asking This: Amazon (L5/L6 Data Engineer), Uber (Senior Data Engineer), Google (Cloud Data Architect), Netflix (Staff Data Engineer)

Time Travel Fundamentals

Time Travel allows you to query historical data at any point within the configured retention period. Snowflake maintains historical data automatically without requiring separate backup infrastructure.

Data Retention Configuration

-- Check current retention period for a table
SHOW PARAMETERS LIKE '%DATA_RETENTION_TIME_IN_DAYS%' 
    IN TABLE my_schema.my_table;

-- Set retention period at table level (must be <= account-level max)
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 90;

-- Set retention period at schema level (applies to all tables)
ALTER SCHEMA production SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- Set account-level default
ALTER ACCOUNT SET DATA_RETENTION_TIME_IN_DAYS = 14;

-- Check account-level setting
SHOW PARAMETERS LIKE '%DATA_RETENTION_TIME_IN_DAYS%';

-- Verify retention for all tables
SELECT 
    table_schema,
    table_name,
    retention_time,
    row_count,
    bytes / (1024*1024) AS size_mb
FROM information_schema.tables
WHERE table_catalog = CURRENT_DATABASE()
ORDER BY retention_time DESC;

ℹ️

Key Insight: Standard Edition supports 1 day retention (configurable up to 1 day). Enterprise Edition supports up to 90 days. Business Critical supports up to 90 days. The retention period applies to micro-partitions, not individual rows.

Time Travel Query Patterns

Pattern 1: Query at Specific Timestamp

-- Query data as it was at a specific timestamp
SELECT * FROM orders 
    AT (TIMESTAMP => '2024-01-15 14:44:00'::TIMESTAMP_NTZ);

-- Find the exact state before the truncate
SELECT * FROM orders 
    AT (TIMESTAMP => '2024-01-15 14:44:59'::TIMESTAMP_NTZ);

-- Compare current vs historical data
SELECT 
    'CURRENT' AS data_version,
    COUNT(*) AS row_count,
    SUM(total_amount) AS total_revenue
FROM orders
UNION ALL
SELECT 
    'HISTORICAL_1PM' AS data_version,
    COUNT(*) AS row_count,
    SUM(total_amount) AS total_revenue
FROM orders AT (TIMESTAMP => '2024-01-15 13:00:00'::TIMESTAMP_NTZ);

Pattern 2: Query Before Specific Statement

-- Get the query ID of the problematic statement
SELECT 
    query_id,
    query_text,
    user_name,
    start_time,
    end_time
FROM snowflake.account_usage.query_history
WHERE query_type = 'TRUNCATE_TABLE'
  AND start_time >= '2024-01-15 14:40:00'
  AND start_time <= '2024-01-15 14:50:00';

-- Query data before that specific statement
SELECT * FROM orders 
    BEFORE (STATEMENT => 'abc123-query-id-here');

-- Restore from before the truncate
CREATE TABLE orders_restored AS
SELECT * FROM orders 
    BEFORE (STATEMENT => 'abc123-query-id-here');

-- Verify restored data
SELECT COUNT(*) FROM orders_restored;

Pattern 3: Query at Specific Offset

-- Query data from 1 hour ago
SELECT * FROM orders 
    AT (OFFSET => -3600);  -- 3600 seconds = 1 hour

-- Query data from 5 minutes ago
SELECT * FROM orders 
    AT (OFFSET => -300);

-- Compare last hour's data
SELECT 
    'NOW' AS snapshot,
    COUNT(*) AS row_count
FROM orders
UNION ALL
SELECT 
    '1H_AGO' AS snapshot,
    COUNT(*) AS row_count
FROM orders AT (OFFSET => -3600)
UNION ALL
SELECT 
    '24H_AGO' AS snapshot,
    COUNT(*) AS row_count
FROM orders AT (OFFSET => -86400);

Real-World Scenario: Amazon

Question: "Design a data recovery strategy for a critical financial table that must support point-in-time recovery for up to 30 days."

Solution: Multi-Layer Recovery Strategy

-- 1. Set appropriate retention period
ALTER TABLE financial_transactions 
    SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- 2. Create a daily snapshot table (belt and suspenders approach)
CREATE OR REPLACE PROCEDURE create_daily_snapshot(table_name VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
    snapshot_name VARCHAR;
BEGIN
    snapshot_name := table_name || '_SNAPSHOT_' || 
                     TO_CHAR(CURRENT_DATE(), 'YYYYMMDD');
    
    EXECUTE IMMEDIATE 
        'CREATE TABLE ' || snapshot_name || ' CLONE ' || table_name;
    
    RETURN 'Created snapshot: ' || snapshot_name;
END;
$$;

-- Schedule daily snapshots
CREATE OR REPLACE TASK daily_financial_snapshot
    WAREHOUSE = admin_wh
    SCHEDULE = 'USING CRON 0 2 * * * UTC'  -- 2 AM UTC daily
AS
    CALL create_daily_snapshot('FINANCIAL_TRANSACTIONS');

-- 3. Implement change tracking with Streams
CREATE OR REPLACE STREAM financial_changes
    ON TABLE financial_transactions
    SHOW_INITIAL_ROWS = TRUE
    APPEND_ONLY = FALSE;

-- 4. Create an audit log for all changes
CREATE TABLE financial_audit_log (
    audit_id NUMBER AUTOINCREMENT,
    table_name VARCHAR,
    operation VARCHAR,
    change_timestamp TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    query_id VARCHAR,
    user_name VARCHAR,
    before_data VARIANT,
    after_data VARIANT
);

-- Populate audit log from stream
INSERT INTO financial_audit_log 
    (table_name, operation, query_id, user_name, before_data, after_data)
SELECT 
    'FINANCIAL_TRANSACTIONS',
    METADATA$ACTION,
    METADATA$QUERY_ID,
    CURRENT_USER(),
    CASE WHEN METADATA$ACTION = 'DELETE' THEN TO_VARIANT(OBJECT_CONSTRUCT(*)) END,
    CASE WHEN METADATA$ACTION = 'INSERT' THEN TO_VARIANT(OBJECT_CONSTRUCT(*)) END
FROM financial_changes;

-- 5. Recovery procedure
CREATE OR REPLACE PROCEDURE recover_to_timestamp(
    table_name VARCHAR,
    recovery_timestamp TIMESTAMP_NTZ
)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
    recovered_table VARCHAR;
BEGIN
    recovered_table := table_name || '_RECOVERED_' || 
                       TO_CHAR(recovery_timestamp, 'YYYYMMDD_HH24MI');
    
    EXECUTE IMMEDIATE 
        'CREATE TABLE ' || recovered_table || 
        ' AS SELECT * FROM ' || table_name || 
        ' AT (TIMESTAMP => \'' || 
        TO_CHAR(recovery_timestamp, 'YYYY-MM-DD HH24:MI:SS') || 
        '\'::TIMESTAMP_NTZ)';
    
    RETURN 'Recovered to table: ' || recovered_table;
END;
$$;

Real-World Scenario: Uber

Question: "How do you handle a scenario where data was corrupted by an incorrect UPDATE statement, and you need to identify exactly which rows were affected?"

Solution: Change Analysis with Time Travel

-- 1. First, identify the problematic query
SELECT 
    query_id,
    query_text,
    user_name,
    start_time,
    rows_affected,
    execution_status
FROM snowflake.account_usage.query_history
WHERE query_type = 'UPDATE'
  AND table_name = 'riders'
  AND start_time >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;

-- 2. Compare data before and after the update
-- Get the query ID from above
-- Example: 'xyz-123-query-id'

-- Rows that were changed (inserted, updated, deleted)
SELECT * FROM riders 
    CHANGES(INFORMATION_SCHEMA.APPLY_CHANGES_START => 'abc123-query-id')
    BEFORE (STATEMENT => 'xyz-123-query-id');

-- Alternative: Direct comparison
-- Before the update
CREATE TEMPORARY TABLE riders_before AS
SELECT * FROM riders 
    BEFORE (STATEMENT => 'xyz-123-query-id');

-- After the update (current state)
CREATE TEMPORARY TABLE riders_after AS
SELECT * FROM riders;

-- Find specific rows that changed
SELECT 
    a.rider_id,
    a.name AS current_name,
    b.name AS previous_name,
    a.email AS current_email,
    b.email AS previous_email,
    a.updated_at AS current_updated,
    b.updated_at AS previous_updated
FROM riders_after a
JOIN riders_before b ON a.rider_id = b.rider_id
WHERE a.name != b.name 
   OR a.email != b.email
   OR a.status != b.status;

-- Count affected rows
SELECT 
    COUNT(*) AS rows_affected,
    SUM(CASE WHEN a.status != b.status THEN 1 ELSE 0 END) AS status_changes,
    SUM(CASE WHEN a.email != b.email THEN 1 ELSE 0 END) AS email_changes
FROM riders_after a
JOIN riders_before b ON a.rider_id = b.rider_id
WHERE a.name != b.name 
   OR a.email != b.email
   OR a.status != b.status;

-- 3. Restore only affected rows
MERGE INTO riders a
USING riders_before b
ON a.rider_id = b.rider_id
WHEN MATCHED AND (
    a.name != b.name OR 
    a.email != b.email OR 
    a.status != b.status
) THEN
    UPDATE SET 
        a.name = b.name,
        a.email = b.email,
        a.status = b.status;

Fail-Safe Mechanism

Fail-Safe is the period after Time Travel during which historical data is retained for disaster recovery. This is managed by Snowflake and cannot be queried directly.

⚠️

Critical Distinction: Time Travel data can be queried by users. Fail-Safe data is only accessible by Snowflake Support for disaster recovery. You cannot query Fail-Safe data directly — you must open a support ticket.

Fail-Safe Configuration

-- Fail-Safe periods (not configurable by users)
-- Standard Edition: 7 days after Time Travel retention
-- Enterprise Edition: 7 days after Time Travel retention
-- Business Critical: 7 days after Time Travel retention

-- Check Time Travel + Fail-Safe coverage
SELECT 
    table_name,
    retention_time AS time_travel_days,
    retention_time + 7 AS total_protection_days,
    CASE 
        WHEN retention_time >= 7 THEN 'EXCELLENT'
        WHEN retention_time >= 1 THEN 'GOOD'
        ELSE 'MINIMAL'
    END AS protection_level
FROM information_schema.tables
WHERE table_schema = 'PRODUCTION';

-- Example: If retention_time = 90 days
-- Time Travel: 90 days (queryable)
-- Fail-Safe: 7 additional days (not queryable)
-- Total protection: 97 days

Edge Cases & Advanced Patterns

Pattern: Cross-Table Point-in-Time Recovery

-- Recover multiple related tables to the same point in time
CREATE OR REPLACE PROCEDURE recover_schema_to_timestamp(
    schema_name VARCHAR,
    recovery_timestamp TIMESTAMP_NTZ
)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
    table_cursor CURSOR FOR
        SELECT table_name 
        FROM information_schema.tables 
        WHERE table_schema = :schema_name;
    table_name VARCHAR;
    recovered_count NUMBER DEFAULT 0;
BEGIN
    FOR table_record IN table_cursor DO
        table_name := table_record.table_name;
        
        EXECUTE IMMEDIATE 
            'CREATE TABLE ' || schema_name || '.RECOVERED_' || table_name ||
            ' AS SELECT * FROM ' || schema_name || '.' || table_name ||
            ' AT (TIMESTAMP => \'' || 
            TO_CHAR(recovery_timestamp, 'YYYY-MM-DD HH24:MI:SS') || 
            '\'::TIMESTAMP_NTZ)';
        
        recovered_count := recovered_count + 1;
    END FOR;
    
    RETURN 'Recovered ' || recovered_count || ' tables to ' || 
           TO_CHAR(recovery_timestamp, 'YYYY-MM-DD HH24:MI:SS');
END;
$$;

Pattern: Time Travel for Data Quality Validation

-- Validate data quality by comparing historical snapshots
WITH daily_metrics AS (
    SELECT 
        DATE(start_time) AS metric_date,
        COUNT(*) AS total_rows,
        COUNT(DISTINCT customer_id) AS unique_customers,
        SUM(amount) AS total_amount,
        AVG(amount) AS avg_amount
    FROM orders AT (OFFSET => -86400 * seq)  -- Each day back
    GROUP BY 1
)
SELECT 
    metric_date,
    total_rows,
    unique_customers,
    total_amount,
    avg_amount,
    LAG(total_rows) OVER (ORDER BY metric_date) AS prev_day_rows,
    total_rows - LAG(total_rows) OVER (ORDER BY metric_date) AS row_change,
    ROUND(
        (total_rows - LAG(total_rows) OVER (ORDER BY metric_date)) * 100.0 / 
        NULLIF(LAG(total_rows) OVER (ORDER BY metric_date), 0), 2
    ) AS pct_change
FROM daily_metrics
ORDER BY metric_date DESC;

Best Practices

Practice	Recommendation
Retention period	Set 7-30 days for production tables, 1-3 days for dev
Snapshot strategy	Use Time Travel as primary, CLONE as secondary
Recovery testing	Test point-in-time recovery monthly
Cost awareness	Higher retention = higher storage costs
Fail-safe reliance	Don't plan around Fail-Safe — it's emergency only

⚠️

Cost Impact: Time Travel data counts toward storage costs. A 1TB table with 90-day retention uses up to 90TB of historical storage. Budget accordingly and set retention periods based on actual recovery requirements, not "just in case."

Topic: Time Travel & Fail-Safe