Dynamic Data Masking in Snowflake
Architecture Overview
Architecture Diagram 2: Masking Policy Types Comparison
Dynamic Data Masking is a security feature that transforms sensitive data at query time based on the executing role. It provides column-level security without modifying stored data β applying role-based transformations (full, partial, hash, null) transparently.
A masking policy is a named object containing SQL expressions that define transformation logic. It accepts the original column value, current role, and session context, returning a transformed value. Multiple policies can exist on a table; one active policy per column.
Masking Performance Impact
- Full masking ("***"): Zero overhead β constant replacement
- Partial masking (XX-XX-1234): Minimal overhead β string operations
- Hash masking (SHA-256): Moderate overhead β cryptographic hash per row
- Conditional masking: Slight overhead β role-based branching
- Overall impact: < 1% query performance degradation (negligible)
Use full masking for PII (SSN, credit card). Use partial masking for phone numbers (show last 4). Use hashing for data matching across systems. Create separate policies per sensitivity level. Audit quarterly for policy coverage gaps.
- Column-level security: Transform data per role without changing storage
- Zero overhead: Original data preserved; masking applied at query layer only
- Multiple types: Full, partial, hash, null, external function transforms
- Role-based: Different roles see different representations of same data
- Audit: Use POLICY_REFERENCES() and ACCESS_HISTORY for compliance reporting
Detailed Explanation
What is Dynamic Data Masking (DDM)?
Security feature that transforms sensitive data at query time based on the executing role β without modifying stored data.
How Masking Policies Work
- Query arrives β column accessed
- Policy lookup (cached in memory)
- CASE expression evaluates role
- Execute branch for matching role β return transformed value
Transformation Functions
| Function | Overhead | Use Case |
|---|---|---|
Full masking (***) | Zero | Highly sensitive data |
| Partial masking (XX-XX-1234) | Minimal | Semi-sensitive data |
| Hash masking (SHA-256) | Moderate | Data matching across systems |
| Conditional masking | Slight | Role-based branching |
- Zero storage overhead β original data preserved
- < 1% query performance degradation (negligible)
Column-Level Security
Different users query the same table but receive different data views:
- Customer service: last 4 digits of credit card
- Fraud analyst: full card number
- Auditor: audit trail of access
Conditional Masking
- Time-based: business hours vs. after-hours
- Network-based: internal vs. external access
- Session-based: warehouse or role context
Compliance Support
| Regulation | Masking Strategy |
|---|---|
| GDPR | Full masking for PII fields |
| CCPA | Partial masking + access controls |
| HIPAA | Full masking + audit trails |
| PCI-DSS | Tokenization + encryption |
| SOX | Role-based masking + audit |
Key Takeaway: Use full masking for PII (SSN, credit card). Use partial masking for phone numbers (show last 4). Audit quarterly for policy coverage gaps.
Key Concepts
| Concept | Description | Use Case |
|---|---|---|
| Masking Policy | SQL object defining masking rules for columns | Apply consistent masking across multiple tables |
| Conditional Masking | Role-based or context-aware data transformation | Different masks for different user roles |
| Column-Level Security | Fine-grained access control at column level | Protect PII while allowing query access |
| Tokenization | Replace sensitive data with non-reversible tokens | PCI-DSS compliance for payment data |
| External Tokenization | Token generation via external services | Integration with enterprise tokenization systems |
| Masking Functions | Built-in functions for data transformation | SHA-256, AES encryption, partial masking |
| Policy Assignment | Attaching masking policies to table columns | Apply policies to existing tables |
| Policy Stacking | Multiple policies on a single column | Layered security controls |
| Session Context | User/session attributes for policy evaluation | Dynamic masking based on runtime context |
| Data Classification | Automated sensitive data detection | Identify columns requiring masking |
Code Examples
1. Creating a Basic Masking Policy
-- Create a masking policy for PII data
-- Parameters:
-- (val STRING): Input column value to mask
-- RETURNS STRING: Output type (must match column type)
-- CURRENT_ROLE(): Built-in function returning the executing role
-- CASE WHEN: Role-based branching logic
CREATE OR REPLACE MASKING POLICY pii_masking_policy AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('ADMIN', 'SECURITY_OFFICER') THEN val -- Full access
WHEN CURRENT_ROLE() = 'ANALYST' THEN REGEXP_REPLACE(val, '.', '*') -- Full mask
WHEN CURRENT_ROLE() = 'SUPPORT' THEN
CONCAT(SUBSTRING(val, 1, 2), REPEAT('*', LENGTH(val) - 4), SUBSTRING(val, -2)) -- Partial mask
ELSE '***MASKED***' -- Default mask for other roles
END;
-- Apply masking policy to a column
-- ALTER TABLE ... MODIFY COLUMN ... SET MASKING POLICY attaches the policy
-- One active policy per column; re-applying replaces the previous policy
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY pii_masking_policy;
-- Apply same policy to multiple columns
ALTER TABLE customers MODIFY COLUMN phone_number SET MASKING POLICY pii_masking_policy;
ALTER TABLE customers MODIFY COLUMN ssn SET MASKING POLICY pii_masking_policy;
-- Remove a masking policy (column returns to original values)
ALTER TABLE customers MODIFY COLUMN email UNSET MASKING POLICY;
2. Conditional Masking Based on Context
-- Create a conditional masking policy with time-based rules
-- HOUR(CURRENT_TIMESTAMP()) enables business-hours vs after-hours masking
-- This is useful for compliance where data access varies by time of day
CREATE OR REPLACE MASKING POLICY conditional_masking_policy AS (val STRING)
RETURNS STRING ->
CASE
-- Full access during business hours for admins
WHEN CURRENT_ROLE() = 'ADMIN'
AND HOUR(CURRENT_TIMESTAMP()) BETWEEN 8 AND 18 THEN val
-- Partial mask during business hours for analysts
WHEN CURRENT_ROLE() = 'ANALYST'
AND HOUR(CURRENT_TIMESTAMP()) BETWEEN 8 AND 18 THEN
CONCAT(SUBSTRING(val, 1, 3), '***', SUBSTRING(val, -3))
-- Full mask outside business hours for everyone
ELSE '***RESTRICTED***'
END;
-- Create a masking policy with warehouse-based conditions
-- Different masking based on which warehouse is being used
-- Useful for external/partner access scenarios
CREATE OR REPLACE MASKING POLICY network_masking_policy AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() = 'ADMIN' THEN val
WHEN CURRENT_WAREHOUSE() IN ('EXTERNAL_WH', 'PARTNER_WH') THEN
CONCAT('EXTERNAL_', HASH(val, 256)) -- Hash for external access
ELSE val
END;
-- Create a masking policy with session context
-- Use CURRENT_SESSION() or custom session variables for more context
CREATE OR REPLACE MASKING POLICY session_masking_policy AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() = 'DATA_ENGINEER' THEN val
WHEN CURRENT_ROLE() = 'ANALYST' THEN
CASE
WHEN CURRENT_WAREHOUSE() = 'PROD_WH' THEN val -- Full access in prod
ELSE CONCAT(SUBSTRING(val, 1, 3), '***') -- Masked in dev
END
ELSE '***MASKED***'
END;
3. Numeric and Date Masking
-- Create a masking policy for numeric data
-- Progressive granularity: Finance sees exact, Analysts see rounded, Others see zero
CREATE OR REPLACE MASKING POLICY numeric_masking_policy AS (val NUMBER)
RETURNS NUMBER ->
CASE
WHEN CURRENT_ROLE() = 'ADMIN' THEN val -- Exact value
WHEN CURRENT_ROLE() = 'FINANCE' THEN ROUND(val, -1) -- Round to nearest 10
WHEN CURRENT_ROLE() = 'ANALYST' THEN ROUND(val, -2) -- Round to nearest 100
ELSE 0 -- Zero for others
END;
-- Create a masking policy for date data
-- Progressive granularity: exact date -> month -> year -> fixed date
CREATE OR REPLACE MASKING POLICY date_masking_policy AS (val DATE)
RETURNS DATE ->
CASE
WHEN CURRENT_ROLE() = 'ADMIN' THEN val -- Exact date
WHEN CURRENT_ROLE() = 'ANALYST' THEN DATE_TRUNC('MONTH', val) -- First day of month
WHEN CURRENT_ROLE() = 'SUPPORT' THEN DATE_TRUNC('YEAR', val) -- First day of year
ELSE '1900-01-01'::DATE -- Fixed date for others
END;
-- Create a masking policy for TIMESTAMP data
CREATE OR REPLACE MASKING POLICY timestamp_masking_policy AS (val TIMESTAMP_NTZ)
RETURNS TIMESTAMP_NTZ ->
CASE
WHEN CURRENT_ROLE() = 'ADMIN' THEN val
WHEN CURRENT_ROLE() = 'ANALYST' THEN DATE_TRUNC('DAY', val) -- Truncate to day
ELSE '1900-01-01'::TIMESTAMP_NTZ
END;
-- Apply policies to financial table
ALTER TABLE financial_transactions MODIFY COLUMN amount SET MASKING POLICY numeric_masking_policy;
ALTER TABLE financial_transactions MODIFY COLUMN transaction_date SET MASKING POLICY date_masking_policy;
ALTER TABLE financial_transactions MODIFY COLUMN created_at SET MASKING POLICY timestamp_masking_policy;
4. Tokenization and Hashing Policies
-- Create a tokenization masking policy (irreversible hash)
-- HASH() function produces a deterministic hash for data matching
-- Useful for joining across systems without exposing raw values
CREATE OR REPLACE MASKING POLICY tokenization_policy AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('ADMIN', 'TOKENIZER') THEN val
ELSE HASH(val, 256) -- SHA-256 hash (irreversible)
END;
-- Create a reversible tokenization policy using AES encryption
-- AES_ENCRYPT produces encrypted values that can be decrypted
-- Note: Key management is critical for security
CREATE OR REPLACE MASKING POLICY aes_tokenization_policy AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('ADMIN', 'DETOKENIZER') THEN val
ELSE AES_ENCRYPT(val, 'your-secret-key-here') -- Encrypted token
END;
-- Create a masking policy for email with partial reveal
CREATE OR REPLACE MASKING POLICY email_masking_policy AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('ADMIN', 'DATA_ENGINEER') THEN val
WHEN CURRENT_ROLE() = 'ANALYST' THEN
CONCAT(SUBSTRING(val, 1, 2), '***@', SPLIT_PART(val, '@', 2))
ELSE '***@***.com'
END;
-- Create a masking policy for phone numbers
CREATE OR REPLACE MASKING POLICY phone_masking_policy AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('ADMIN', 'SUPPORT') THEN val
WHEN CURRENT_ROLE() = 'ANALYST' THEN
CONCAT('(***) ***-', SUBSTRING(val, -4)) -- Show last 4 digits
ELSE '(***) ***-****'
END;
-- Apply tokenization and masking to sensitive columns
ALTER TABLE customer_data MODIFY COLUMN credit_card_number SET MASKING POLICY tokenization_policy;
ALTER TABLE customer_data MODIFY COLUMN ssn SET MASKING POLICY aes_tokenization_policy;
ALTER TABLE customer_data MODIFY COLUMN email SET MASKING POLICY email_masking_policy;
ALTER TABLE customer_data MODIFY COLUMN phone SET MASKING POLICY phone_masking_policy;
5. Policy Management and Auditing
-- List all masking policies in the account
SHOW MASKING POLICIES;
-- View masking policy details
DESC MASKING POLICY pii_masking_policy;
-- Find which columns have masking policies applied
SELECT
policy_name,
ref_database_name,
ref_schema_name,
ref_entity_name,
ref_entity_domain
FROM TABLE(INFORMATION_SCHEMA.POLICY_REFERENCES(
REF_ENTITY_NAME => 'customers',
REF_ENTITY_DOMAIN => 'TABLE'
));
-- Audit masking policy usage via ACCESS_HISTORY
SELECT
query_id,
user_name,
query_text,
direct_objects_accessed
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
WHERE query_start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
AND direct_objects_accessed LIKE '%customers%'
ORDER BY query_start_time DESC;
-- Check policy references for a specific table
SELECT * FROM TABLE(INFORMATION_SCHEMA.POLICY_REFERENCES(
REF_ENTITY_NAME => 'financial_transactions',
REF_ENTITY_DOMAIN => 'TABLE'
));
6. Python Implementation
import snowflake.connector
from snowflake.connector import DictCursor
def create_masking_policies():
"""Create comprehensive masking policies in Snowflake"""
conn = snowflake.connector.connect(
user='your_user',
password='your_password',
account='your_account',
warehouse='COMPUTE_WH',
database='SECURITY_DB',
schema='MASKING'
)
try:
cursor = conn.cursor()
# Create masking policy for email
cursor.execute("""
CREATE OR REPLACE MASKING POLICY email_masking_policy AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('ADMIN', 'DATA_ENGINEER') THEN val
WHEN CURRENT_ROLE() = 'ANALYST' THEN
CONCAT(SUBSTRING(val, 1, 2), '***@', SPLIT_PART(val, '@', 2))
ELSE '***@***.com'
END
""")
# Create masking policy for phone numbers
cursor.execute("""
CREATE OR REPLACE MASKING POLICY phone_masking_policy AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('ADMIN', 'SUPPORT') THEN val
WHEN CURRENT_ROLE() = 'ANALYST' THEN
CONCAT('(***) ***-', SUBSTRING(val, -4))
ELSE '(***) ***-****'
END
""")
# Create masking policy for financial data
cursor.execute("""
CREATE OR REPLACE MASKING POLICY financial_masking_policy AS (val NUMBER)
RETURNS NUMBER ->
CASE
WHEN CURRENT_ROLE() = 'FINANCE_ADMIN' THEN val
WHEN CURRENT_ROLE() = 'FINANCE_ANALYST' THEN ROUND(val, -2)
WHEN CURRENT_ROLE() = 'ANALYST' THEN ROUND(val, -3)
ELSE 0
END
""")
# Apply policies to tables
cursor.execute("""
ALTER TABLE customer_data
MODIFY COLUMN email SET MASKING POLICY email_masking_policy;
""")
cursor.execute("""
ALTER TABLE customer_data
MODIFY COLUMN phone SET MASKING POLICY phone_masking_policy;
""")
cursor.execute("""
ALTER TABLE financial_data
MODIFY COLUMN amount SET MASKING POLICY financial_masking_policy;
""")
print("Masking policies created and applied successfully!")
finally:
conn.close()
def query_masked_data():
"""Demonstrate how different roles see different data"""
conn = snowflake.connector.connect(
user='your_user',
password='your_password',
account='your_account',
warehouse='COMPUTE_WH',
database='SECURITY_DB',
schema='MASKING'
)
try:
cursor = conn.cursor()
# Query as analyst role
cursor.execute("USE ROLE ANALYST")
cursor.execute("""
SELECT
customer_id,
email,
phone,
credit_card_number
FROM customer_data
LIMIT 5
""")
print("Data viewed as ANALYST:")
for row in cursor.fetchall():
print(f" ID: {row[0]}, Email: {row[1]}, Phone: {row[2]}, CC: {row[3]}")
# Query as admin role
cursor.execute("USE ROLE ADMIN")
cursor.execute("""
SELECT
customer_id,
email,
phone,
credit_card_number
FROM customer_data
LIMIT 5
""")
print("\nData viewed as ADMIN:")
for row in cursor.fetchall():
print(f" ID: {row[0]}, Email: {row[1]}, Phone: {row[2]}, CC: {row[3]}")
finally:
conn.close()
if __name__ == "__main__":
create_masking_policies()
query_masked_data()
Performance Metrics
| Metric | Value | Description |
|---|---|---|
| Policy Evaluation Latency | < 1ms per column | Negligible impact on query performance |
| Storage Overhead | 0 bytes | No additional storage for masked data |
| Policy Cache Hit Rate | > 99% | Policies cached in memory for fast access |
| Concurrent Policy Evaluations | 10M+ per second | High throughput for enterprise workloads |
| Policy Deployment Time | < 1 second | Instant policy application |
| Query Impact | < 2% overhead | Minimal performance degradation |
Best Practices
-
Use Role-Based Masking: Always design masking policies around user roles rather than individual users for scalability and maintainability.
-
Implement Least Privilege: Start with the most restrictive masking and gradually grant exceptions based on business need.
-
Test Policy Impact: Measure query performance before and after applying masking policies to ensure they don't introduce unexpected latency.
-
Version Control Policies: Store masking policy definitions in Git and deploy through CI/CD pipelines for audit trail and rollback capability.
-
Monitor Policy Usage: Use
ACCESS_HISTORYto track which policies are being applied and identify potential abuse patterns. -
Combine with Row-Level Security: Masking policies work best when combined with row-level security policies for comprehensive data protection.
-
Use External Tokenization: For PCI-DSS compliance, consider using external tokenization services instead of built-in masking for payment card data.
-
Regular Policy Audits: Review masking policies quarterly to ensure they align with current compliance requirements and data classification standards.
Additional Theory: Compliance Frameworks and Masking
Dynamic data masking supports compliance with major regulations:
| Regulation | Requirement | Masking Strategy |
|---|---|---|
| GDPR | Right to erasure, data minimization | Full masking for PII fields |
| CCPA | Consumer data protection | Partial masking + access controls |
| HIPAA | Protected health information | Full masking + audit trails |
| PCI-DSS | Payment card data protection | Tokenization + encryption |
| SOX | Financial data integrity | Role-based masking + audit |
Data classification for masking:
- Public: No masking required
- Internal: Basic masking for external roles
- Confidential: Full masking for non-authorized roles
- Restricted: Tokenization/encryption + strict access control
Additional Theory: Masking Policy Evaluation Flow
Query arrives β Column accessed β Policy lookup β Role check β Transform β Return value
β β β
Policy metadata CASE expression Execute branch
(cached in memory) evaluates role for matching role
Performance characteristics:
- Policy evaluation: < 1ms per row (negligible overhead)
- Cache hit rate: > 99% (policies cached in cloud services)
- No storage overhead: Original data preserved; transformation is query-time only
- No data duplication: Single copy of data with role-based views
See Also
- 05-Access-Control-RBAC - Role-based access control
- 10-Data-Sharing - Securing shared data
- 14-Monitoring-Queries - Auditing data access patterns
- PySpark Iceberg - Data lake security patterns
- Delta Lake on Databricks - Delta Lake security model
- Data Warehouse Concepts - Data warehouse design principles