Snowflake Advanced Β· Interview Prep
Data Governance & Classification
Difficulty: Hard Β· Commonly asked at Google, Microsoft, Amazon
Interview Question
"How would you implement a comprehensive data governance framework in Snowflake? Walk me through data classification, tagging, access policies, and compliance monitoring."
βΉοΈ
Companies Asking This: Google (Senior Data Governance Engineer), Microsoft (Principal Data Architect), Amazon (L6 Data Engineer), Meta (Data Platform Engineer)
Governance Framework
Data Classification with Tags
-- 1. Create tags for data classification
CREATE TAG security_pii VARCHAR;
CREATE TAG data_sensitivity VARCHAR;
CREATE TAG data_owner VARCHAR;
CREATE TAG compliance_region VARCHAR;
CREATE TAG retention_period VARCHAR;
-- 2. Apply tags to tables
ALTER TABLE customers SET TAG security_pii = 'PII';
ALTER TABLE customers SET TAG data_sensitivity = 'HIGH';
ALTER TABLE customers SET TAG data_owner = 'CUSTOMER_SUCCESS';
ALTER TABLE customers SET TAG compliance_region = 'GDPR';
-- 3. Apply tags to columns
ALTER TABLE customers MODIFY COLUMN email SET TAG security_pii = 'EMAIL';
ALTER TABLE customers MODIFY COLUMN ssn SET TAG security_pii = 'SSN';
ALTER TABLE customers MODIFY COLUMN phone SET TAG security_pii = 'PHONE';
-- 4. Query tags
SELECT
tag_name,
tag_value,
object_name,
object_domain
FROM information_schema.tag_references
WHERE tag_name = 'SECURITY_PII';
-- 5. Create tag-based policies
CREATE OR REPLACE MASKING POLICY pii_tag_mask
AS (val VARCHAR) RETURNS VARCHAR ->
CASE
WHEN CURRENT_ROLE() IN ('DATA_ADMIN', 'PRIVACY_OFFICER') THEN val
WHEN CURRENT_ROLE() = 'ANALYST' THEN '***MASKED***'
ELSE '***REDACTED***'
END;
-- 6. Auto-apply masking based on tags
ALTER TABLE customers MODIFY COLUMN email
SET MASKING POLICY pii_tag_mask
TAG (security_pii = 'EMAIL');
-- 7. Query all tagged objects
SELECT
tag_name,
tag_value,
COUNT(*) AS object_count
FROM information_schema.tag_references
GROUP BY 1, 2
ORDER BY 3 DESC;
Data Quality Framework
-- 1. Create data quality rules table
CREATE TABLE data_quality_rules (
rule_id NUMBER AUTOINCREMENT,
table_name VARCHAR(100),
column_name VARCHAR(100),
rule_type VARCHAR(50),
rule_expression VARCHAR(2000),
severity VARCHAR(20),
owner VARCHAR(100),
enabled BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
-- 2. Insert quality rules
INSERT INTO data_quality_rules (table_name, column_name, rule_type, rule_expression, severity, owner)
VALUES
('CUSTOMERS', 'CUSTOMER_ID', 'NOT_NULL', 'CUSTOMER_ID IS NOT NULL', 'CRITICAL', 'DATA_ENGINEERING'),
('CUSTOMERS', 'EMAIL', 'FORMAT', 'REGEXP_LIKE(EMAIL, ''^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'')', 'HIGH', 'DATA_ENGINEERING'),
('ORDERS', 'AMOUNT', 'RANGE', 'AMOUNT > 0 AND AMOUNT < 1000000', 'HIGH', 'FINANCE'),
('ORDERS', 'ORDER_DATE', 'NOT_FUTURE', 'ORDER_DATE <= CURRENT_DATE()', 'CRITICAL', 'DATA_ENGINEERING');
-- 3. Quality check procedure
CREATE OR REPLACE PROCEDURE run_quality_check(table_name VARCHAR)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
DECLARE
results ARRAY DEFAULT ARRAY_CONSTRUCT();
rule_record RECORD;
violation_count NUMBER;
BEGIN
FOR rule_record IN
SELECT rule_id, column_name, rule_type, rule_expression, severity
FROM data_quality_rules
WHERE table_name = :table_name AND enabled = TRUE
DO
EXECUTE IMMEDIATE
'SELECT COUNT(*) FROM ' || :table_name ||
' WHERE NOT (' || rule_record.rule_expression || ')'
INTO violation_count;
results := ARRAY_APPEND(results, OBJECT_CONSTRUCT(
'rule_id', rule_record.rule_id,
'column', rule_record.column_name,
'rule_type', rule_record.rule_type,
'severity', rule_record.severity,
'violations', violation_count
));
END FOR;
RETURN OBJECT_CONSTRUCT(
'table', :table_name,
'rules_checked', ARRAY_SIZE(:results),
'results', :results
);
END;
$$;
-- 4. Schedule quality checks
CREATE OR REPLACE TASK daily_quality_check
WAREHOUSE = admin_wh
SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
CALL run_quality_check('CUSTOMERS');
Real-World Scenario: Google
Question: "How do you implement data lineage tracking in Snowflake to understand how data flows from source to reporting?"
Solution: Lineage Tracking Pattern
-- 1. Create lineage tracking table
CREATE TABLE data_lineage (
lineage_id NUMBER AUTOINCREMENT,
source_database VARCHAR(100),
source_schema VARCHAR(100),
source_table VARCHAR(100),
target_database VARCHAR(100),
target_schema VARCHAR(100),
target_table VARCHAR(100),
transformation_type VARCHAR(50),
transformation_logic VARCHAR(2000),
owner VARCHAR(100),
created_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
-- 2. Populate lineage from ETL procedures
INSERT INTO data_lineage (source_database, source_schema, source_table,
target_database, target_schema, target_table,
transformation_type, transformation_logic, owner)
VALUES
('RAW', 'PUBLIC', 'RAW_ORDERS', 'PROD', 'ANALYTICS', 'FACT_ORDERS', 'ETL', 'CLEAN AND AGGREGATE', 'DATA_ENGINEERING'),
('RAW', 'PUBLIC', 'RAW_CUSTOMERS', 'PROD', 'ANALYTICS', 'DIM_CUSTOMERS', 'ETL', 'DEDUPLICATE AND VALIDATE', 'DATA_ENGINEERING'),
('PROD', 'ANALYTICS', 'FACT_ORDERS', 'PROD', 'REPORTING', 'DAILY_SALES', 'VIEW', 'AGGREGATE BY DATE', 'ANALYTICS');
-- 3. Lineage query function
CREATE OR REPLACE FUNCTION get_upstream_tables(target_table VARCHAR)
RETURNS TABLE (source_table VARCHAR, transformation VARCHAR)
LANGUAGE SQL AS
$$
WITH RECURSIVE lineage_chain AS (
SELECT source_database || '.' || source_schema || '.' || source_table AS source_table,
transformation_logic AS transformation,
target_database || '.' || target_schema || '.' || target_table AS target_table
FROM data_lineage
WHERE target_database || '.' || target_schema || '.' || target_table = target_table
UNION ALL
SELECT l.source_database || '.' || l.source_schema || '.' || l.source_table,
l.transformation_logic,
l.target_database || '.' || l.target_schema || '.' || l.target_table
FROM data_lineage l
JOIN lineage_chain lc ON l.target_database || '.' || l.target_schema || '.' || l.target_table = lc.source_table
)
SELECT source_table, transformation FROM lineage_chain;
$$;
-- 4. Query lineage
SELECT * FROM TABLE(get_upstream_tables('PROD.REPORTING.DAILY_SALES'));
Best Practices
| Practice | Implementation |
|---|---|
| Classification | Use tags for PII, sensitivity, ownership |
| Access control | RBAC with tag-based masking |
| Quality monitoring | Automated rules with alerts |
| Lineage tracking | Document all transformations |
| Compliance | Regular audits and reports |
| Documentation | Maintain data catalog |
β οΈ
Governance Anti-Patterns:
- No classification β Can't protect what you don't know
- Over-privileged roles β Use least privilege principle
- No quality checks β Bad data corrupts analytics
- Missing lineage β Can't trace data issues
- No audit trail β Compliance requires audit logs