# Security Best Practices
## Architecture Diagram
## Formal Definitions
Definition: Role-Based Access Control (RBAC)
RBAC is a security paradigm in which access to resources is determined by the roles assigned to a user. In Airflow, RBAC defines a permission set $P_{\text{role}}$ of triples $(r, o, a)$, where role $r$ grants action $a$ on object $o$.
Definition: Secrets Backend
A Secrets Backend is an external service (HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager) that stores and retrieves sensitive credentials. It replaces storing secrets in the metadata database with secure, auditable, and rotatable storage.
Definition: Encryption at Rest
Encryption at Rest ensures data is encrypted when stored. In Airflow, this applies to the metadata database, connection passwords, variables, and XCom data. The encryption follows $C = E_K(M)$, where $K$ is the encryption key, $M$ is the plaintext, and $C$ is the ciphertext.
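As a concrete instance of the key $K$, Airflow's Fernet key is 32 random bytes encoded as URL-safe base64. The sketch below generates and sanity-checks a key of that shape using only the standard library; the function names are illustrative, and in practice `cryptography.fernet.Fernet.generate_key()` is the usual way to produce one.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a Fernet-format key: 32 random bytes, URL-safe base64-encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def looks_like_fernet_key(key: str) -> bool:
    """Check that a string decodes to exactly 32 bytes, as Fernet requires."""
    try:
        return len(base64.urlsafe_b64decode(key.encode("ascii"))) == 32
    except (ValueError, UnicodeEncodeError):
        # Covers malformed base64 (binascii.Error is a ValueError) and non-ASCII input.
        return False


if __name__ == "__main__":
    key = generate_fernet_key()
    print(looks_like_fernet_key(key))
```

A check like `looks_like_fernet_key` can run at deployment time to catch an empty or truncated key before Airflow starts writing data with it.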
## Detailed Explanation
### RBAC Configuration
Role-Based Access Control restricts what each user can view and change in the Airflow UI and REST API.
Default Roles:
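| Role | Permissions |
|---|---|
| Admin | All permissions, including granting and revoking permissions for other users |
| Op | User permissions plus configuration resources (connections, variables, pools) |
| User | Viewer permissions plus editing DAGs and creating or deleting DAG runs and task instances |
| Viewer | Read-only access to DAGs, DAG runs, and task logs |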
Custom Roles:
```python
from airflow.security import permissions

ROLES = {
    'DataEngineer': [
        (permissions.ACTION_CAN_READ, permissions.RESOURCE_DAG),
        (permissions.ACTION_CAN_EDIT, permissions.RESOURCE_DAG),
    ],
    'DataAnalyst': [
        (permissions.ACTION_CAN_READ, permissions.RESOURCE_DAG),
    ],
}
```
Authentication Integration: OAuth providers (Google, GitHub, and custom providers) and LDAP are supported.
### Secrets Backend Setup
Store credentials in external secret management systems.
Supported Backends:
| Backend | Provider | Configuration |
|---|---|---|
| Vault | HashiCorp | backend = airflow.providers.hashicorp.secrets.vault.VaultBackend |
| AWS SM | AWS | backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend |
| GCP SM | Google Cloud | backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend |
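```ini
# airflow.cfg
[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
# backend_kwargs must be valid JSON (no trailing comma); add the Vault url and auth settings for your deployment
backend_kwargs = {"connections_path": "airflow/connections", "variables_path": "airflow/variables"}
```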
Note: Hooks resolve connections automatically. Airflow checks the configured secrets backend first, then environment variables, then the metadata database.
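Connection lookup tries the configured secrets backend first, then environment variables named `AIRFLOW_CONN_<CONN_ID>`, then the metadata database. The following is a minimal sketch of that priority chain, not Airflow's own implementation; `resolve_connection`, `fake_vault`, and `fake_metadata_db` are hypothetical names, with the two dictionaries standing in for real stores.

```python
import os
from typing import Callable, Optional, Sequence

# A source maps a conn_id to a connection URI, or returns None when it has no entry.
Lookup = Callable[[str], Optional[str]]


def env_lookup(conn_id: str) -> Optional[str]:
    """Environment-variable source: AIRFLOW_CONN_<CONN_ID in upper case>."""
    return os.environ.get(f"AIRFLOW_CONN_{conn_id.upper()}")


def resolve_connection(conn_id: str, sources: Sequence[Lookup]) -> Optional[str]:
    """Return the first non-None result, trying the sources in priority order."""
    for lookup in sources:
        uri = lookup(conn_id)
        if uri is not None:
            return uri
    return None


# Hypothetical stand-ins for an external secrets backend and the metadata database.
fake_vault = {"production_db": "postgresql://db.example.com/from_vault"}
fake_metadata_db = {
    "production_db": "postgresql://db.example.com/from_metadata_db",
    "legacy_db": "postgresql://legacy.example.com/from_metadata_db",
}

sources = [fake_vault.get, env_lookup, fake_metadata_db.get]

if __name__ == "__main__":
    print(resolve_connection("production_db", sources))
```

Because the first match wins, a stale copy of a connection left in the metadata database is silently shadowed once the same `conn_id` exists in the secrets backend, which is why old entries should be deleted after migration.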
### Connection Encryption
Encrypt connections in transit and at rest.
Security Checklist:
| Practice | Implementation |
|---|---|
| SSL/TLS | sslmode: require in connection extras |
| Fernet | Set AIRFLOW__CORE__FERNET_KEY to encrypt connection passwords, extras, and variables in the metadata database |
| HTTPS | Use HTTPS for web server |
| Network | Deploy in private VPC/subnet |
```python
# Encrypted connection example
from airflow.models import Connection

conn = Connection(
    conn_id='production_db',
    conn_type='postgres',
    host='db.example.com',
    extra='{"sslmode": "require", "sslrootcert": "/path/to/ca.pem"}',
)
```
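```python
from airflow.providers.postgres.hooks.postgres import PostgresHook


def get_ssl_conn_params():  # name assumed; the original definition line is missing
    """Return libpq SSL parameters for a connection with client certificates."""
    conn_params = {
        'sslmode': 'require',  # Enforce SSL
        'sslcert': '/path/to/client-cert.pem',
        'sslkey': '/path/to/client-key.pem',
        'sslrootcert': '/path/to/ca-cert.pem',
    }
    return conn_params


def verify_encryption():
    """Verify connection is encrypted."""
    hook = PostgresHook(postgres_conn_id='encrypted_db')
    # pg_stat_ssl exposes the columns ssl, version, and cipher per backend;
    # filtering on pg_backend_pid() inspects only the current session.
    result = hook.get_first("""
        SELECT pgssl.ssl AS ssl_enabled,
               pgssl.version AS ssl_version,
               pgssl.cipher AS cipher
        FROM pg_stat_ssl pgssl
        WHERE pgssl.pid = pg_backend_pid()
    """)
    return {
        'ssl_enabled': result[0],
        'ssl_version': result[1],
        'cipher': result[2],
    }
```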
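The `sslmode` setting in a connection's extras can also be checked statically, before any connection is opened. The sketch below is one possible check, assuming the extras are stored as a JSON string as in the example above; `tls_findings` is an illustrative name, not an Airflow API. It relies on libpq semantics: `require`, `verify-ca`, and `verify-full` all refuse unencrypted connections, but only `verify-full` also checks the server hostname.

```python
import json
from typing import List

# libpq sslmode values that refuse to fall back to an unencrypted connection.
ENCRYPTED_SSLMODES = {"require", "verify-ca", "verify-full"}


def tls_findings(extra_json: str) -> List[str]:
    """Return a list of problems with the TLS settings in a connection's extras."""
    extras = json.loads(extra_json) if extra_json else {}
    sslmode = extras.get("sslmode")
    findings = []
    if sslmode not in ENCRYPTED_SSLMODES:
        findings.append(f"sslmode={sslmode!r} does not enforce encryption")
    elif sslmode != "verify-full":
        findings.append(f"sslmode={sslmode!r} encrypts but does not verify the server hostname")
    return findings


if __name__ == "__main__":
    for problem in tls_findings('{"sslmode": "require", "sslrootcert": "/path/to/ca.pem"}'):
        print(problem)
```

Whether the hostname finding should block a deployment is a policy decision; inside a private network some teams accept `require`, while connections crossing untrusted networks generally warrant `verify-full`.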
<MathKeyFormula
title="RBAC Permission Check"
tex={`\\text{Access}(r, o, a) = \\begin{cases} \\text{granted} & \\text{if } (r, o, a) \\\\ & \\in P_{\\text{role}} \\\\ \\text{denied} & \\text{otherwise} \\end{cases}`}
variables={[
{ symbol: "r", description: "User role" },
{ symbol: "o", description: "Resource object" },
{ symbol: "a", description: "Action (read, write, delete)" },
{ symbol: "P_{\\text{role}}", description: "Set of permissions for role" }
]}
/>
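The permission check above amounts to a set-membership test. A minimal sketch follows, assuming the two custom roles from the RBAC section; the plain strings `"can_read"`, `"can_edit"`, and `"DAGs"` stand in for the `permissions.ACTION_*` and `permissions.RESOURCE_DAG` constants, and `access` is an illustrative function, not part of Airflow.

```python
from typing import Dict, FrozenSet, Tuple

Permission = Tuple[str, str]  # (resource object o, action a)

# P_role for each role r; string values are stand-ins for Airflow's constants.
ROLE_PERMISSIONS: Dict[str, FrozenSet[Permission]] = {
    "DataEngineer": frozenset({("DAGs", "can_read"), ("DAGs", "can_edit")}),
    "DataAnalyst": frozenset({("DAGs", "can_read")}),
}


def access(role: str, resource: str, action: str) -> str:
    """Return 'granted' if (role, resource, action) is in the permission set, else 'denied'."""
    allowed = ROLE_PERMISSIONS.get(role, frozenset())
    return "granted" if (resource, action) in allowed else "denied"


if __name__ == "__main__":
    print(access("DataAnalyst", "DAGs", "can_read"))
    print(access("DataAnalyst", "DAGs", "can_edit"))
```

Unknown roles map to the empty set, so the check fails closed: anything not explicitly granted is denied, matching the `otherwise` branch of the formula.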
<MathFormula
title="Secret Rotation Interval"
tex={`T_{\\text{rotation}} \\leq \\min(T_{\\text{policy}}, T_{\\text{compromise\\_risk}})`}
variables={[
{ symbol: "T_{\\text{rotation}}", description: "Time between secret rotations" },
{ symbol: "T_{\\text{policy}}", description: "Compliance policy requirement" },
    { symbol: "T_{\\text{compromise\\_risk}}", description: "Risk-based rotation interval" }
]}
/>
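The rotation bound can be evaluated directly with `timedelta` values. The sketch below is illustrative only: the function names are hypothetical, and the 90-day and 30-day intervals are example inputs rather than recommendations.

```python
from datetime import datetime, timedelta


def max_rotation_interval(policy: timedelta, compromise_risk: timedelta) -> timedelta:
    """Upper bound on T_rotation: the stricter of the policy and risk-based intervals."""
    return min(policy, compromise_risk)


def rotation_due(last_rotated: datetime, now: datetime,
                 policy: timedelta, compromise_risk: timedelta) -> bool:
    """True once the secret's age reaches the maximum allowed rotation interval."""
    return now - last_rotated >= max_rotation_interval(policy, compromise_risk)


if __name__ == "__main__":
    policy = timedelta(days=90)           # example compliance requirement
    compromise_risk = timedelta(days=30)  # example risk-based interval
    last_rotated = datetime(2024, 1, 1)
    print(max_rotation_interval(policy, compromise_risk))
    print(rotation_due(last_rotated, datetime(2024, 2, 15), policy, compromise_risk))
```

With these inputs the risk-based interval is the binding constraint, so a secret last rotated on 1 January is overdue by mid-February even though the 90-day policy window is still open.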
<MathNote type="info">
Never store secrets in DAG files or environment variables in plaintext. Use Airflow's Secrets Backend (Vault, AWS Secrets Manager, GCP Secret Manager) for all credentials.
</MathNote>
<MathNote type="tip">
Enable audit logging to track who accessed which secrets and when. Auditors commonly request these access records as evidence for SOC 2, and they help demonstrate accountability under GDPR.
</MathNote>
## Key Concepts Table
| Security Layer | Component | Implementation | Priority |
|----------------|-----------|----------------|----------|
| **Authentication** | Webserver | OAuth, LDAP, SAML | P0 |
| **Authorization** | RBAC | Role-based permissions | P0 |
| **Encryption** | Database | TLS, AES-256 | P0 |
| **Secrets** | Backend | Vault, AWS SM | P0 |
| **Network** | Infrastructure | Firewalls, VPN | P1 |
| **Audit** | Logging | Activity logs | P1 |
| **Compliance** | Policies | SOC2, GDPR | P2 |
## Code Examples
### Security Audit Script
```python
# security_audit.py
import json
import os

from airflow import settings
from airflow.models import Connection, Variable


def audit_security_posture():
    """Comprehensive security audit of Airflow deployment."""
    session = settings.Session()
    findings = []

    # Check for unencrypted passwords in the metadata DB. Connection.password
    # returns the decrypted value, so test the is_encrypted flag instead of
    # inspecting the password text.
    conns = session.query(Connection).filter(
        Connection.conn_type.notin_(['aws', 'google_cloud_platform'])
    ).all()
    for conn in conns:
        if conn.password and not conn.is_encrypted:
            findings.append({
                'severity': 'HIGH',
                'category': 'Secrets',
                'finding': f'Plaintext password in connection: {conn.conn_id}',
                'recommendation': 'Move to secrets backend',
            })

    # Check for variables with sensitive data
    sensitive_patterns = ['password', 'secret', 'key', 'token']
    variables = session.query(Variable).all()
    for var in variables:
        if any(pattern in var.key.lower() for pattern in sensitive_patterns):
            findings.append({
                'severity': 'MEDIUM',
                'category': 'Secrets',
                'finding': f'Potentially sensitive variable: {var.key}',
                'recommendation': 'Move to secrets backend',
            })
    session.close()

    # Check for DAGs with hardcoded secrets
    dags_folder = '/opt/airflow/dags'
    for root, dirs, files in os.walk(dags_folder):
        for file in files:
            if file.endswith('.py'):
                filepath = os.path.join(root, file)
                with open(filepath, 'r') as f:
                    content = f.read()
                # Simple pattern matching
                for pattern in ['password=', 'secret=', 'api_key=']:
                    if pattern in content:
                        findings.append({
                            'severity': 'HIGH',
                            'category': 'Code',
                            'finding': f'Potential hardcoded secret in {filepath}',
                            'recommendation': 'Use variables or secrets backend',
                        })
    return findings


def generate_security_report(findings):
    """Generate security audit report."""
    report = {
        'total_findings': len(findings),
        'high_severity': len([f for f in findings if f['severity'] == 'HIGH']),
        'medium_severity': len([f for f in findings if f['severity'] == 'MEDIUM']),
        'low_severity': len([f for f in findings if f['severity'] == 'LOW']),
        'findings': findings,
    }
    print(json.dumps(report, indent=2))
    return report


if __name__ == "__main__":
    findings = audit_security_posture()
    generate_security_report(findings)
```
### Network Security Configuration
```yaml
# docker-compose-security.yml
version: '3.8'

services:
  airflow-webserver:
    image: apache/airflow:2.8.0
    command: webserver
    networks:
      - airflow-internal
    environment:
      - AIRFLOW__CORE__FERNET_KEY=${FERNET_KEY}
      - AIRFLOW__WEBSERVER__EXPOSE_CONFIG=False
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 2G
    # Security context
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /tmp

  airflow-scheduler:
    image: apache/airflow:2.8.0
    command: scheduler
    networks:
      - airflow-internal
    environment:
      - AIRFLOW__CORE__FERNET_KEY=${FERNET_KEY}
    security_opt:
      - no-new-privileges:true

  postgres:
    image: postgres:15
    networks:
      - airflow-internal
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    # Disable network access except from Airflow
    # Use internal network only

  vault:
    image: hashicorp/vault:1.15
    networks:
      - airflow-internal
    cap_add:
      - IPC_LOCK
    environment:
      VAULT_ADDR: 'http://0.0.0.0:8200'
    volumes:
      - vault_data:/vault/file
    # Dev mode runs unsealed with in-memory storage and a root token: local testing only.
    # Production needs a configured server with TLS.
    command: server -dev

networks:
  airflow-internal:
    driver: bridge
    internal: true  # No external access

secrets:
  db_password:
    file: ./secrets/db_password.txt

volumes:
  postgres_data:
  vault_data:
```
## Performance Metrics
### Security Posture Score
| Metric | Score | Weight | Status |
|---|---|---|---|
| Secrets in Backend | 100% | 30% | PASS |
| TLS Enabled | 100% | 25% | PASS |
| RBAC Configured | 80% | 20% | PASS |
| Audit Logging | 90% | 15% | PASS |
| Network Segmentation | 70% | 10% | WARNING |
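The rows above can be combined into a single weighted score. The sketch below assumes the overall score is the weight-averaged sum of the per-metric scores, which the table implies but does not state; `weighted_score` is an illustrative name.

```python
# (metric, score in percent, weight in percent), copied from the table above
METRICS = [
    ("Secrets in Backend", 100, 30),
    ("TLS Enabled", 100, 25),
    ("RBAC Configured", 80, 20),
    ("Audit Logging", 90, 15),
    ("Network Segmentation", 70, 10),
]


def weighted_score(metrics):
    """Weighted average of the scores; the weights must sum to 100 percent."""
    total_weight = sum(weight for _, _, weight in metrics)
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}, expected 100")
    return sum(score * weight for _, score, weight in metrics) / total_weight


if __name__ == "__main__":
    print(weighted_score(METRICS))  # 91.5
```

The contributions are 30 + 25 + 16 + 13.5 + 7 = 91.5, so the low-weight Network Segmentation row costs only 3 points despite its WARNING status.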
### Compliance Checklist
| Requirement | Status | Evidence | Owner |
|---|---|---|---|
| SOC2 - Access Controls | PASS | RBAC configured | Security |
| SOC2 - Encryption | PASS | TLS 1.3 enabled | Platform |
| GDPR - Data Protection | PASS | Encryption at rest | Platform |
| GDPR - Audit Trail | PASS | Logging enabled | Security |
| HIPAA - PHI Handling | N/A | No PHI processed | - |
Key Takeaways:
- Use OAuth/LDAP/SAML for authentication; never use default credentials
- Implement RBAC with least-privilege principles
- Store all secrets in a Secrets Backend (Vault, AWS SM, GCP SM)
- Enable TLS for all connections; encrypt data at rest
- Enable audit logging for compliance (SOC2, GDPR)
- Segment networks; use internal networks for Airflow components
- Rotate secrets regularly; automate rotation where possible
## See Also
- Connection Management: Secure connection configuration
- Variable Management: Secure variable storage
- Error Handling: Secure error logging
- Monitoring and Alerting: Security event monitoring