Interview Question
ℹ️ Interview Context
Company: Microsoft / Amazon
Role: Senior Data Engineer / Security Engineer
Difficulty: Advanced
Time: 45-60 minutes
Question: "How do you secure an Airflow deployment? Explain RBAC, audit logging, encryption, and compliance considerations. What security best practices would you implement?"
Detailed Theory
Security Fundamentals
# security_fundamentals.py
"""
Airflow Security Layers:

1. Authentication:
   - LDAP/OAuth integration
   - Multi-factor authentication
   - Service accounts

2. Authorization (RBAC):
   - Role-based access control
   - DAG-level permissions
   - Task-level permissions

3. Encryption:
   - Data at rest
   - Data in transit
   - Secret management

4. Audit Logging:
   - Action tracking
   - Compliance logging
   - Security monitoring

5. Network Security:
   - Network policies
   - TLS/SSL
   - Firewall rules
"""
1. RBAC Configuration
# rbac_configuration.py
"""
Role-Based Access Control (RBAC):
Control who can do what in Airflow.
"""
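# RBAC Configuration
RBAC_CONFIG = """
[webserver]
# Airflow 1.10.x only: enable the RBAC UI.
# In Airflow 2.x the RBAC UI is always on and this option no longer exists.
rbac = True
# Do not expose the contents of airflow.cfg in the web UI
expose_config = False
# NOTE: the two keys below are illustrative, not built-in airflow.cfg options.
# In Airflow 2.x, express the same intent through roles and DAG-level permissions.
# Deny blanket read access to all DAGs for authenticated users
allow_read_on_all_dags = False
# Reserve access to all DAGs for the Admin role
allow_all_dags = True
"""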
# Custom roles
CUSTOM_ROLES = """
from airflow.www.security import AirflowSecurityManager


class CustomSecurityManager(AirflowSecurityManager):
    # Sketch for Airflow 2.x. Actions are can_read / can_edit / can_create /
    # can_delete / menu_access; resources are named 'DAGs', 'Connections',
    # 'Variables'. Viewer and Admin are built-in roles; bulk_sync_roles()
    # only adds any listed permissions they are missing.
    def create_custom_roles(self):
        self.bulk_sync_roles([
            {
                # Data Engineer role
                'role': 'DataEngineer',
                'perms': [
                    ('can_read', 'DAGs'),
                    ('can_edit', 'DAGs'),
                    ('can_read', 'Connections'),
                    ('can_edit', 'Connections'),
                    ('can_read', 'Variables'),
                    ('menu_access', 'DAGs'),
                    ('menu_access', 'Connections'),
                    ('menu_access', 'Variables'),
                ],
            },
            {
                # Viewer role
                'role': 'Viewer',
                'perms': [
                    ('can_read', 'DAGs'),
                    ('menu_access', 'DAGs'),
                ],
            },
            {
                # Admin role
                'role': 'Admin',
                'perms': [
                    ('can_read', 'DAGs'),
                    ('can_edit', 'DAGs'),
                    ('can_read', 'Connections'),
                    ('can_edit', 'Connections'),
                    ('can_read', 'Variables'),
                    ('can_edit', 'Variables'),
                    ('menu_access', 'Admin'),
                    ('menu_access', 'DAGs'),
                    ('menu_access', 'Connections'),
                    ('menu_access', 'Variables'),
                ],
            },
        ])
"""
# DAG-level permissions
DAG_PERMISSIONS = """
# Airflow 2.x DAG-level permissions
# Automatically created when DAGs are parsed
# Each DAG has its own resource, named DAG:<dag_id>
# Permission format (action on resource):
# can_read on DAG:<dag_id>
# can_edit on DAG:<dag_id>
# can_delete on DAG:<dag_id>
# Example:
# can_read on DAG:my_dag
# can_edit on DAG:my_dag
"""
ℹ️ Pro Tip
Use the principle of least privilege. Give users only the permissions they need to perform their job functions.
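To make the tip concrete, here is a minimal sketch of two checks under a simplified permission model: whether a set of roles may act on a given DAG (through the global `DAGs` resource or the DAG-specific `DAG:<dag_id>` resource), and which grants a role holds but never used. `ROLE_PERMS`, `can_access_dag`, and `unused_grants` are hypothetical names, not Airflow APIs, and the usage data is made up; in practice it would come from audit records.

```python
# Simplified model of role permissions; not Airflow's implementation.
# Role names and the ROLE_PERMS table are hypothetical.
ROLE_PERMS = {
    "Viewer": {("can_read", "DAG:my_dag")},
    "DataEngineer": {
        ("can_read", "DAGs"),
        ("can_edit", "DAG:my_dag"),
        ("can_edit", "Connections"),
    },
}


def can_access_dag(roles, action, dag_id):
    """True if any role grants `action` on all DAGs or on this specific DAG."""
    wanted = {(action, "DAGs"), (action, f"DAG:{dag_id}")}
    return any(wanted & ROLE_PERMS.get(role, set()) for role in roles)


def unused_grants(role, used):
    """Permissions the role holds but never exercised: candidates for removal."""
    return sorted(ROLE_PERMS.get(role, set()) - set(used))


# Permissions the DataEngineer role was actually seen using (made-up data).
used = [("can_read", "DAGs"), ("can_edit", "DAG:my_dag")]
print(can_access_dag(["Viewer"], "can_edit", "my_dag"))
print(unused_grants("DataEngineer", used))
```

Running a review like `unused_grants` periodically gives access reviews a starting list instead of relying on memory of who needs what.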
2. Audit Logging
# audit_logging.py
"""
Audit Logging:
Track all actions in Airflow for compliance and security.
"""
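# Audit log configuration
# NOTE: the keys below are illustrative, not built-in airflow.cfg options.
# Airflow itself records UI and CLI events in the `log` table of the
# metadata database (Browse > Audit Logs in the web UI).
AUDIT_CONFIG = """
[logging]
# Enable audit logging
enable_audit_logging = True
# Log filename
audit_log_filename = /var/log/airflow/audit.log
# Log format
audit_log_format = %%(asctime)s %%(user)s %%(action)s %%(resource)s %%(details)s
"""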
# Custom audit logging
import json
from datetime import datetime, timezone
from typing import Optional

from airflow.utils.log.logging_mixin import LoggingMixin


class AuditLogger(LoggingMixin):
    """Custom audit logger"""

    def __init__(self):
        super().__init__()
        # In-memory copy of entries; grows without bound, so cap or drop it
        # in a long-running process
        self.audit_log = []

    def log_action(
        self,
        user: str,
        action: str,
        resource: str,
        details: Optional[dict] = None,
    ):
        """Log an audit action"""
        entry = {
            # Timezone-aware UTC timestamp so entries from different hosts line up
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'user': user,
            'action': action,
            'resource': resource,
            'details': details or {},
        }
        self.audit_log.append(entry)
        # Log to file; use entry['details'] so a missing dict logs {} rather than null
        self.log.info(
            "AUDIT: %s %s %s %s",
            user, action, resource, json.dumps(entry['details']),
        )
        # Send to external system
        self._send_to_siem(entry)

    def _send_to_siem(self, entry: dict):
        """Send audit log to SIEM system"""
        # Integration with SIEM (Splunk, ELK, etc.)
        pass


# Usage
audit_logger = AuditLogger()

# Log user login
audit_logger.log_action(
    user='john@example.com',
    action='LOGIN',
    resource='webserver',
)

# Log DAG trigger
audit_logger.log_action(
    user='john@example.com',
    action='TRIGGER',
    resource='dag:my_dag',
    details={'run_id': 'manual__2024-01-01'},
)

# Log connection access
audit_logger.log_action(
    user='john@example.com',
    action='READ',
    resource='connection:my_database',
)
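SIEM tools generally ingest structured records more reliably than free-form text. As a minimal sketch using only the standard library, the formatter below renders each audit event as one JSON line. `JsonAuditFormatter`, `make_audit_logger`, and the logger name `audit_sketch` are hypothetical, and the in-memory stream stands in for a real file or network handler.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonAuditFormatter(logging.Formatter):
    """Render each audit record as a single JSON line."""

    def format(self, record):
        entry = {
            # record.created is a Unix timestamp set by the logging module
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            # The fields below arrive through the `extra` argument
            "user": getattr(record, "user", None),
            "action": getattr(record, "action", None),
            "resource": getattr(record, "resource", None),
            "details": getattr(record, "details", {}),
        }
        return json.dumps(entry, sort_keys=True)


def make_audit_logger(handler):
    """Attach the JSON formatter to `handler` and return a dedicated logger."""
    logger = logging.getLogger("audit_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit lines out of the application log
    handler.setFormatter(JsonAuditFormatter())
    logger.addHandler(handler)
    return logger


# Usage: write one event to an in-memory stream and read the line back
stream = io.StringIO()
audit = make_audit_logger(logging.StreamHandler(stream))
audit.info(
    "TRIGGER",
    extra={
        "user": "john@example.com",
        "action": "TRIGGER",
        "resource": "dag:my_dag",
        "details": {"run_id": "manual__2024-01-01"},
    },
)
line = stream.getvalue().strip()
print(line)
```

Keeping the audit logger separate from the application logger makes it easier to give the two streams different retention and access rules.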
3. Encryption
# encryption.py
"""
Encryption:
Protect data at rest and in transit.
"""
# Encryption configuration
ENCRYPTION_CONFIG = """
[core]
# Fernet key for encryption (inject from a secret store; never commit it)
fernet_key = {{ secrets.FERNET_KEY }}
# Connection passwords and extras, and Variable values, are encrypted with
# this key automatically once it is set; there is no separate switch
[webserver]
# Enable HTTPS
web_server_ssl_cert = /etc/ssl/certs/airflow.crt
web_server_ssl_key = /etc/ssl/private/airflow.key
# Secure cookies: send the session cookie over HTTPS only
# (Flask already marks the session cookie HttpOnly by default)
cookie_secure = True
cookie_samesite = Lax
"""
# Encryption utilities
from cryptography.fernet import Fernet

from airflow.configuration import conf


# Generate Fernet key
def generate_fernet_key():
    """Generate a new Fernet key"""
    return Fernet.generate_key().decode()


class EncryptionUtils:
    """Encryption utilities for Airflow"""

    def __init__(self):
        self.fernet_key = conf.get('core', 'fernet_key')
        self.fernet = Fernet(self.fernet_key.encode())

    def encrypt(self, data: str) -> str:
        """Encrypt string data"""
        return self.fernet.encrypt(data.encode()).decode()

    def decrypt(self, encrypted_data: str) -> str:
        """Decrypt string data"""
        return self.fernet.decrypt(encrypted_data.encode()).decode()

    def encrypt_connection(self, connection_dict: dict) -> dict:
        """Encrypt connection password (modifies the dict in place)"""
        if 'password' in connection_dict:
            connection_dict['password'] = self.encrypt(
                connection_dict['password']
            )
        return connection_dict

    def decrypt_connection(self, connection_dict: dict) -> dict:
        """Decrypt connection password (modifies the dict in place)"""
        if 'password' in connection_dict:
            connection_dict['password'] = self.decrypt(
                connection_dict['password']
            )
        return connection_dict


# Usage
encryption_utils = EncryptionUtils()

# Encrypt sensitive data
encrypted = encryption_utils.encrypt('sensitive_data')
decrypted = encryption_utils.decrypt(encrypted)
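A Fernet key is 32 random bytes encoded as URL-safe base64. As a minimal sketch, the standard-library check below validates that format at deploy time without importing `cryptography`. The helper names `generate_key` and `is_valid_fernet_key` are hypothetical, and the check covers the format only, not whether the key matches the one that encrypted existing data.

```python
import base64
import os


def generate_key() -> str:
    """Produce a key in the Fernet format: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


def is_valid_fernet_key(key: str) -> bool:
    """Check that `key` decodes to exactly 32 bytes of URL-safe base64."""
    try:
        return len(base64.urlsafe_b64decode(key.encode())) == 32
    except (ValueError, TypeError):
        # binascii.Error (bad padding) is a subclass of ValueError
        return False


# Usage: fail fast on a malformed key before starting any Airflow component
candidate = generate_key()
print(is_valid_fernet_key(candidate))
print(is_valid_fernet_key("not-a-key"))
```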
4. Network Security
# network_security.py
"""
Network Security:
Protect Airflow components with network policies.
"""
# Network policies (Kubernetes)
NETWORK_POLICIES = """
# Restrict Airflow component communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: airflow-network-policy
  namespace: airflow
spec:
  podSelector:
    matchLabels:
      app: airflow
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: airflow-webserver
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: airflow-metadata-db
      ports:
        - protocol: TCP
          port: 5432
"""
# TLS configuration
TLS_CONFIG = """
# Nginx reverse proxy with TLS
server {
    listen 443 ssl;
    server_name airflow.example.com;

    ssl_certificate /etc/ssl/certs/airflow.crt;
    ssl_certificate_key /etc/ssl/private/airflow.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    location / {
        proxy_pass http://airflow-webserver:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
"""
⚠️ Important
Always use TLS for production deployments. Never transmit credentials or sensitive data over unencrypted connections.
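The same rule applies to code that calls Airflow or other services from tasks and scripts. As a minimal sketch, the helper below builds a client-side TLS context with Python's standard `ssl` module that verifies certificates and refuses anything older than TLS 1.2. The function name `strict_client_context` is hypothetical.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client-side TLS context that verifies certificates and requires TLS 1.2+."""
    # create_default_context() turns on certificate and hostname verification
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_client_context()
print(ctx.minimum_version, ctx.verify_mode, ctx.check_hostname)
```

The context can be passed to standard clients, for example through the `context` argument of `urllib.request.urlopen`.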
Real-World Scenarios
Scenario 1: Microsoft's Enterprise Security
# microsoft_security.py
"""
Microsoft-style enterprise security:
- Azure AD integration
- Conditional access policies
- Compliance logging
"""
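# Azure AD integration
# Flask-AppBuilder reads these settings from webserver_config.py (Python),
# not from the [webserver] section of airflow.cfg. Endpoint URLs and scopes
# below are assumptions; check them against your tenant and FAB version.
AZURE_AD_CONFIG = """
import os
from flask_appbuilder.security.manager import AUTH_OAUTH

# Azure AD authentication
AUTH_TYPE = AUTH_OAUTH
# OAuth settings
AUTHORITY = 'https://login.microsoftonline.com/' + os.environ['AZURE_TENANT_ID']
OAUTH_PROVIDERS = [{
    'name': 'azure',
    'token_key': 'access_token',
    'remote_app': {
        'client_id': os.environ['AZURE_CLIENT_ID'],
        'client_secret': os.environ['AZURE_CLIENT_SECRET'],
        'api_base_url': AUTHORITY + '/oauth2',
        'access_token_url': AUTHORITY + '/oauth2/token',
        'authorize_url': AUTHORITY + '/oauth2/authorize',
        'client_kwargs': {'scope': 'User.read name preferred_username email profile upn'},
    },
}]
# Redirect URI to register in Azure AD:
# https://airflow.example.com/oauth-authorized/azure
# Azure AD roles mapping: Azure AD group -> Airflow roles
AUTH_ROLES_SYNC_AT_LOGIN = True
AUTH_ROLES_MAPPING = {
    'Data Engineer Group': ['DataEngineer'],
    'Viewer Group': ['Viewer'],
    'Airflow Admins Group': ['Admin'],
}
"""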
# Conditional access
CONDITIONAL_ACCESS = """
# Implement conditional access policies
1. Require MFA for admin access
2. Limit access by IP range
3. Enforce session timeouts
4. Block risky sign-ins
"""
# Compliance logging
COMPLIANCE_LOGGING = """
# Log all access for compliance
1. User login/logout
2. DAG access
3. Connection access
4. Variable access
5. Configuration changes
"""
Scenario 2: Amazon's Security Practices
# amazon_security.py
"""
Amazon-style security practices:
- IAM integration
- VPC configuration
- Secrets rotation
"""
# IAM integration
IAM_CONFIG = """
# Use IAM roles for AWS services
# ECS task role: trust policy that lets ECS tasks assume the role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
# Attach policies for S3, RDS, etc.
"""
# VPC configuration
VPC_CONFIG = """
# Deploy Airflow in private subnets
1. ALB in public subnet, forwarding to the webserver in a private subnet
2. Workers in private subnet
3. Metadata DB in private subnet
4. Use NAT Gateway for outbound access
"""
# Secrets rotation
SECRETS_ROTATION = """
# Implement automatic secrets rotation
1. Use AWS Secrets Manager
2. Rotate database credentials
3. Rotate API keys
4. Update Airflow connections
"""
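For secrets rotation, a scheduled job first has to decide which secrets are due. A minimal sketch of that check, assuming you track a last-rotated timestamp per secret: the 90-day `MAX_AGE`, the secret names, and the `due_for_rotation` helper are all hypothetical, and the actual rotation call to your secrets backend is not shown.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)  # hypothetical rotation policy


def due_for_rotation(last_rotated: dict, now: datetime) -> list:
    """Return the names of secrets whose last rotation is older than MAX_AGE."""
    return sorted(
        name for name, rotated_at in last_rotated.items()
        if now - rotated_at > MAX_AGE
    )


# Made-up inventory of secrets and when each was last rotated
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
inventory = {
    "metadata_db_password": datetime(2024, 1, 15, tzinfo=timezone.utc),
    "partner_api_key": datetime(2024, 5, 20, tzinfo=timezone.utc),
}
print(due_for_rotation(inventory, now))
```

After a secret is rotated, the matching Airflow connection has to be updated as well, otherwise tasks keep using the old credential.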
Best Practices
# best_practices.py
"""
Security Best Practices:

1. Authentication:
   - Use enterprise authentication (LDAP/OAuth)
   - Enable multi-factor authentication
   - Use service accounts for automation

2. Authorization:
   - Implement RBAC
   - Use least privilege principle
   - Regular access reviews

3. Encryption:
   - Encrypt data at rest
   - Use TLS for data in transit
   - Rotate encryption keys

4. Audit Logging:
   - Log all actions
   - Monitor for suspicious activity
   - Retain logs for compliance

5. Network Security:
   - Use network policies
   - Deploy in private subnets
   - Implement firewall rules
"""
ℹ️ Microsoft Interview Tip
Microsoft interviewers emphasize defense in depth. When discussing security, highlight multiple layers of protection, compliance requirements, and security monitoring. Also be ready to explain how you would handle incident response and security audits.
Summary
Security is critical for production Airflow deployments. Key takeaways:
- RBAC for access control
- Audit logging for compliance
- Encryption for data protection
- Network security for isolation
- Secrets management for credentials
For Microsoft and Amazon interviews, focus on:
- Enterprise authentication
- Compliance requirements
- Security monitoring
- Incident response
- Defense in depth
This question is part of the Apache Airflow Advanced interview preparation series. Practice explaining these concepts before your interview.