πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Topic: Data Governance & Classification

Snowflake AdvancedData Governance⭐ Premium

Advertisement

Snowflake Advanced Β· Interview Prep

Data Governance & Classification

Difficulty: Hard Β· Commonly asked at Google, Microsoft, Amazon

Interview Question

"How would you implement a comprehensive data governance framework in Snowflake? Walk me through data classification, tagging, access policies, and compliance monitoring."

ℹ️

Companies Asking This: Google (Senior Data Governance Engineer), Microsoft (Principal Data Architect), Amazon (L6 Data Engineer), Meta (Data Platform Engineer)


Governance Framework

Data Governance FrameworkData Classificationβ€’ Tagsβ€’ Categoriesβ€’ SensitivityAccess Controlβ€’ RBACβ€’ Maskingβ€’ Row SecurityCompliance Monitoringβ€’ Auditβ€’ Reportsβ€’ AlertsData Catalogβ€’ Metadataβ€’ Descriptionsβ€’ SearchLineage Trackingβ€’ Dependenciesβ€’ Transformationsβ€’ Impact AnalysisQuality Monitoringβ€’ Rulesβ€’ Metricsβ€’ Alerts

Data Classification with Tags

-- 1. Create tags for data classification
CREATE TAG security_pii VARCHAR;
CREATE TAG data_sensitivity VARCHAR;
CREATE TAG data_owner VARCHAR;
CREATE TAG compliance_region VARCHAR;
CREATE TAG retention_period VARCHAR;

-- 2. Apply tags to tables
ALTER TABLE customers SET TAG security_pii = 'PII';
ALTER TABLE customers SET TAG data_sensitivity = 'HIGH';
ALTER TABLE customers SET TAG data_owner = 'CUSTOMER_SUCCESS';
ALTER TABLE customers SET TAG compliance_region = 'GDPR';

-- 3. Apply tags to columns
ALTER TABLE customers MODIFY COLUMN email SET TAG security_pii = 'EMAIL';
ALTER TABLE customers MODIFY COLUMN ssn SET TAG security_pii = 'SSN';
ALTER TABLE customers MODIFY COLUMN phone SET TAG security_pii = 'PHONE';

-- 4. Query tags
SELECT 
    tag_name,
    tag_value,
    object_name,
    object_domain
FROM information_schema.tag_references
WHERE tag_name = 'SECURITY_PII';

-- 5. Create tag-based policies
CREATE OR REPLACE MASKING POLICY pii_tag_mask 
AS (val VARCHAR) RETURNS VARCHAR ->
    CASE
        WHEN CURRENT_ROLE() IN ('DATA_ADMIN', 'PRIVACY_OFFICER') THEN val
        WHEN CURRENT_ROLE() = 'ANALYST' THEN '***MASKED***'
        ELSE '***REDACTED***'
    END;

-- 6. Auto-apply masking based on tags
ALTER TABLE customers MODIFY COLUMN email 
    SET MASKING POLICY pii_tag_mask 
    TAG (security_pii = 'EMAIL');

-- 7. Query all tagged objects
SELECT 
    tag_name,
    tag_value,
    COUNT(*) AS object_count
FROM information_schema.tag_references
GROUP BY 1, 2
ORDER BY 3 DESC;

Data Quality Framework

-- 1. Create data quality rules table
CREATE TABLE data_quality_rules (
    rule_id NUMBER AUTOINCREMENT,
    table_name VARCHAR(100),
    column_name VARCHAR(100),
    rule_type VARCHAR(50),
    rule_expression VARCHAR(2000),
    severity VARCHAR(20),
    owner VARCHAR(100),
    enabled BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

-- 2. Insert quality rules
INSERT INTO data_quality_rules (table_name, column_name, rule_type, rule_expression, severity, owner)
VALUES 
    ('CUSTOMERS', 'CUSTOMER_ID', 'NOT_NULL', 'CUSTOMER_ID IS NOT NULL', 'CRITICAL', 'DATA_ENGINEERING'),
    ('CUSTOMERS', 'EMAIL', 'FORMAT', 'REGEXP_LIKE(EMAIL, ''^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'')', 'HIGH', 'DATA_ENGINEERING'),
    ('ORDERS', 'AMOUNT', 'RANGE', 'AMOUNT > 0 AND AMOUNT < 1000000', 'HIGH', 'FINANCE'),
    ('ORDERS', 'ORDER_DATE', 'NOT_FUTURE', 'ORDER_DATE <= CURRENT_DATE()', 'CRITICAL', 'DATA_ENGINEERING');

-- 3. Quality check procedure
CREATE OR REPLACE PROCEDURE run_quality_check(table_name VARCHAR)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
DECLARE
    results ARRAY DEFAULT ARRAY_CONSTRUCT();
    rule_record RECORD;
    violation_count NUMBER;
BEGIN
    FOR rule_record IN 
        SELECT rule_id, column_name, rule_type, rule_expression, severity
        FROM data_quality_rules
        WHERE table_name = :table_name AND enabled = TRUE
    DO
        EXECUTE IMMEDIATE 
            'SELECT COUNT(*) FROM ' || :table_name || 
            ' WHERE NOT (' || rule_record.rule_expression || ')'
            INTO violation_count;
        
        results := ARRAY_APPEND(results, OBJECT_CONSTRUCT(
            'rule_id', rule_record.rule_id,
            'column', rule_record.column_name,
            'rule_type', rule_record.rule_type,
            'severity', rule_record.severity,
            'violations', violation_count
        ));
    END FOR;
    
    RETURN OBJECT_CONSTRUCT(
        'table', :table_name,
        'rules_checked', ARRAY_SIZE(:results),
        'results', :results
    );
END;
$$;

-- 4. Schedule quality checks
CREATE OR REPLACE TASK daily_quality_check
    WAREHOUSE = admin_wh
    SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
    CALL run_quality_check('CUSTOMERS');

Real-World Scenario: Google

Question: "How do you implement data lineage tracking in Snowflake to understand how data flows from source to reporting?"

Solution: Lineage Tracking Pattern

-- 1. Create lineage tracking table
CREATE TABLE data_lineage (
    lineage_id NUMBER AUTOINCREMENT,
    source_database VARCHAR(100),
    source_schema VARCHAR(100),
    source_table VARCHAR(100),
    target_database VARCHAR(100),
    target_schema VARCHAR(100),
    target_table VARCHAR(100),
    transformation_type VARCHAR(50),
    transformation_logic VARCHAR(2000),
    owner VARCHAR(100),
    created_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

-- 2. Populate lineage from ETL procedures
INSERT INTO data_lineage (source_database, source_schema, source_table, 
                          target_database, target_schema, target_table,
                          transformation_type, transformation_logic, owner)
VALUES 
    ('RAW', 'PUBLIC', 'RAW_ORDERS', 'PROD', 'ANALYTICS', 'FACT_ORDERS', 'ETL', 'CLEAN AND AGGREGATE', 'DATA_ENGINEERING'),
    ('RAW', 'PUBLIC', 'RAW_CUSTOMERS', 'PROD', 'ANALYTICS', 'DIM_CUSTOMERS', 'ETL', 'DEDUPLICATE AND VALIDATE', 'DATA_ENGINEERING'),
    ('PROD', 'ANALYTICS', 'FACT_ORDERS', 'PROD', 'REPORTING', 'DAILY_SALES', 'VIEW', 'AGGREGATE BY DATE', 'ANALYTICS');

-- 3. Lineage query function
CREATE OR REPLACE FUNCTION get_upstream_tables(target_table VARCHAR)
RETURNS TABLE (source_table VARCHAR, transformation VARCHAR)
LANGUAGE SQL AS
    $$
    WITH RECURSIVE lineage_chain AS (
        SELECT source_database || '.' || source_schema || '.' || source_table AS source_table,
               transformation_logic AS transformation,
               target_database || '.' || target_schema || '.' || target_table AS target_table
        FROM data_lineage
        WHERE target_database || '.' || target_schema || '.' || target_table = target_table
        
        UNION ALL
        
        SELECT l.source_database || '.' || l.source_schema || '.' || l.source_table,
               l.transformation_logic,
               l.target_database || '.' || l.target_schema || '.' || l.target_table
        FROM data_lineage l
        JOIN lineage_chain lc ON l.target_database || '.' || l.target_schema || '.' || l.target_table = lc.source_table
    )
    SELECT source_table, transformation FROM lineage_chain;
    $$;

-- 4. Query lineage
SELECT * FROM TABLE(get_upstream_tables('PROD.REPORTING.DAILY_SALES'));

Best Practices

PracticeImplementation
ClassificationUse tags for PII, sensitivity, ownership
Access controlRBAC with tag-based masking
Quality monitoringAutomated rules with alerts
Lineage trackingDocument all transformations
ComplianceRegular audits and reports
DocumentationMaintain data catalog

⚠️

Governance Anti-Patterns:

  1. No classification β€” Can't protect what you don't know
  2. Over-privileged roles β€” Use least privilege principle
  3. No quality checks β€” Bad data corrupts analytics
  4. Missing lineage β€” Can't trace data issues
  5. No audit trail β€” Compliance requires audit logs

Advertisement