Security Architecture
Customer-Managed Encryption Keys (CMEK)
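import time
from google.cloud import kms_v1

client = kms_v1.KeyManagementServiceClient()

# Create key ring
key_ring = client.create_key_ring(
    request={
        "parent": "projects/my-project/locations/us-central1",
        "key_ring_id": "data-engineering-keyring",
        "key_ring": {},
    }
)

# Create encryption key (key settings are nested under "crypto_key")
key = client.create_crypto_key(
    request={
        "parent": key_ring.name,
        "crypto_key_id": "bigquery-encryption-key",
        "crypto_key": {
            "purpose": kms_v1.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT,
            "version_template": {
                "algorithm": kms_v1.CryptoKeyVersion.CryptoKeyVersionAlgorithm.GOOGLE_SYMMETRIC_ENCRYPTION,
                "protection_level": kms_v1.ProtectionLevel.HSM,
            },
        },
    }
)

# Enable automatic rotation: set the period and the first rotation time, and name both in the update mask
rotation_seconds = 7776000  # 90 days
client.update_crypto_key(
    request={
        "crypto_key": {
            "name": key.name,
            "rotation_period": {"seconds": rotation_seconds},
            "next_rotation_time": {"seconds": int(time.time()) + rotation_seconds},
        },
        "update_mask": {"paths": ["rotation_period", "next_rotation_time"]},
    }
)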
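The 7776000 value used for rotation is simply 90 days expressed in seconds. The following is a minimal standard-library sketch of deriving that value and of assembling the fully qualified key name that services such as BigQuery expect when a CMEK key is attached. The helper names `rotation_seconds_for` and `crypto_key_name` are hypothetical and are not part of the Cloud KMS client.

```python
from datetime import timedelta


def rotation_seconds_for(days: int) -> int:
    """Convert a rotation interval in days to whole seconds."""
    if days < 1:
        # Assumption: treat anything under one day as invalid for this sketch.
        raise ValueError("rotation interval must be at least 1 day")
    return int(timedelta(days=days).total_seconds())


def crypto_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Build the Cloud KMS resource name for a crypto key."""
    return (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


print(rotation_seconds_for(90))
print(crypto_key_name("my-project", "us-central1",
                      "data-engineering-keyring", "bigquery-encryption-key"))
```

Keeping the interval in days and converting at the edge makes the rotation policy easier to review than a bare seconds literal.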
VPC Service Controls
from google.cloud import accesscontextmanager_v1

client = accesscontextmanager_v1.AccessContextManagerClient()

# Create access policy (long-running operation; .result() waits for completion)
# Note: the API expects scopes and resources in projects/<project number> form
policy = client.create_access_policy(
    request={
        "parent": "organizations/123456789",
        "title": "Data Engineering Access Policy",
        "scopes": ["projects/my-project"],
    }
).result()

# Create service perimeter; the perimeter ID is the last segment of "name"
# (letters, digits, and underscores only)
perimeter = client.create_service_perimeter(
    request={
        "parent": policy.name,
        "service_perimeter": {
            "name": f"{policy.name}/servicePerimeters/data_engineering_perimeter",
            "title": "Data Engineering Perimeter",
            "status": {
                "resources": ["projects/my-project"],
                "restricted_services": [
                    "bigquery.googleapis.com",
                    "storage.googleapis.com",
                    "dataflow.googleapis.com",
                ],
                "vpc_accessible_services": {
                    "enable_restriction": True,
                    "allowed_services": ["bigquery.googleapis.com"],
                },
            },
        },
    }
).result()
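The `vpc_accessible_services` block is easy to misread, so here is a simplified local model of what it means for networks inside the perimeter. This is only a sketch: the function `vpc_can_reach` is hypothetical, it ignores access levels and ingress/egress rules, and actual enforcement happens in Google's infrastructure, not in client code.

```python
def vpc_can_reach(status: dict, service: str) -> bool:
    """Return True if a network inside the perimeter may call `service`.

    Simplified model: with enable_restriction off, every service is reachable;
    with it on, only services in allowed_services are reachable. The special
    value "RESTRICTED-SERVICES" stands for the perimeter's restricted_services.
    """
    vpc = status.get("vpc_accessible_services", {})
    if not vpc.get("enable_restriction", False):
        return True
    allowed = set(vpc.get("allowed_services", []))
    if "RESTRICTED-SERVICES" in allowed:
        allowed |= set(status.get("restricted_services", []))
    return service in allowed


# Same status block as in the perimeter definition above.
status = {
    "resources": ["projects/my-project"],
    "restricted_services": [
        "bigquery.googleapis.com",
        "storage.googleapis.com",
        "dataflow.googleapis.com",
    ],
    "vpc_accessible_services": {
        "enable_restriction": True,
        "allowed_services": ["bigquery.googleapis.com"],
    },
}

for svc in status["restricted_services"]:
    print(svc, vpc_can_reach(status, svc))
```

Under this model, the configuration shown lets in-perimeter networks reach BigQuery but not Cloud Storage or Dataflow, which is worth checking against what the pipeline actually needs.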
Best Practice: Use CMEK for all sensitive data in BigQuery and GCS. Implement VPC Service Controls to prevent data exfiltration. Enable audit logging for data access. Use Workload Identity Federation instead of service account keys. Review security configurations quarterly.
Common Interview Questions
Q1: What is the difference between Google-managed keys and CMEK?
Answer: With Google-managed keys, Google creates, stores, and rotates the encryption keys automatically, and the customer has no control over them. With CMEK (Customer-Managed Encryption Keys), the customer creates and manages the keys in Cloud KMS and therefore controls the rotation schedule, key destruction, and the access policies that decide who can use each key. Some regulations and internal compliance regimes require this level of control.
Q2: What are VPC Service Controls?
Answer: VPC Service Controls create security perimeters around GCP resources to prevent data exfiltration. They restrict which services can be accessed from within the perimeter and control egress to services and projects outside it. They are essential for protecting sensitive data in data lakes.
Q3: What types of audit logs should be enabled?
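Answer: 1) Admin Activity logs (always on), 2) Data Access logs (BigQuery, GCS; off by default for most services, with BigQuery the main exception), 3) System Event logs (always on), 4) Policy Denied logs (on by default). Data Access logs are critical for compliance but can be high-volume and incur logging costs, so enable them at least for sensitive datasets.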
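Data Access logs are switched on through the `auditConfigs` section of an IAM policy. The sketch below builds that fragment as a plain dictionary so it can be reviewed or merged into a policy file. The function name `data_access_audit_config` is hypothetical; the `auditConfigs`, `auditLogConfigs`, and `logType` field names follow the IAM policy format, and how the fragment is applied (console, gcloud, or Terraform) is left open.

```python
import json

# Log types accepted in an IAM auditLogConfigs entry.
VALID_LOG_TYPES = {"ADMIN_READ", "DATA_READ", "DATA_WRITE"}


def data_access_audit_config(services, log_types=("DATA_READ", "DATA_WRITE")):
    """Build an IAM policy fragment enabling Data Access logs per service."""
    unknown = set(log_types) - VALID_LOG_TYPES
    if unknown:
        raise ValueError(f"unknown log types: {sorted(unknown)}")
    return {
        "auditConfigs": [
            {
                "service": service,
                "auditLogConfigs": [{"logType": t} for t in log_types],
            }
            for service in services
        ]
    }


fragment = data_access_audit_config(
    ["bigquery.googleapis.com", "storage.googleapis.com"]
)
print(json.dumps(fragment, indent=2))
```

Listing services explicitly, rather than enabling everything, keeps log volume and cost proportional to the datasets that actually need auditing.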
Q4: How do you secure a data pipeline?
Answer: 1) Use service accounts with minimal permissions, 2) Enable CMEK for encryption, 3) Implement VPC Service Controls, 4) Use Private Google Access, 5) Enable audit logging, 6) Implement data masking for non-production, 7) Monitor for anomalies.
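Of the steps above, data masking for non-production is the one most often done in pipeline code itself. One possible approach, shown here as a sketch and not as the only way to mask data, is keyed hashing: each sensitive value is replaced by a truncated HMAC-SHA256 digest, so the same input always maps to the same token (joins still work) while the original value is not exposed. The names `mask_value` and `mask_record`, the sample record, and the hard-coded secret are all illustrative assumptions; a real secret would come from a secret manager.

```python
import hashlib
import hmac


def mask_value(value: str, secret: bytes, length: int = 16) -> str:
    """Replace a value with a deterministic, truncated HMAC-SHA256 token."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_record(record: dict, sensitive_fields: set, secret: bytes) -> dict:
    """Return a copy of the record with sensitive string fields masked."""
    return {
        key: mask_value(str(val), secret) if key in sensitive_fields else val
        for key, val in record.items()
    }


secret = b"demo-only-secret"  # assumption: placeholder, never hard-code in practice
row = {"user_id": 42, "email": "alice@example.com", "country": "DE"}
masked = mask_record(row, {"email"}, secret)
print(masked)
```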
Q5: What is Workload Identity Federation?
Answer: Workload Identity Federation lets workloads that run outside GCP (for example on AWS, Azure, or GitHub) access GCP resources by exchanging a credential from their own identity provider, such as an OIDC token, for short-lived GCP access tokens instead of using service account keys. It eliminates long-lived credentials and is recommended for CI/CD and multi-cloud scenarios.
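In practice the external workload is given a credential configuration file that tells Google's auth libraries where to find the external token and how to exchange it. The sketch below assembles such a file for an OIDC token read from disk. The function name `external_account_config` and all argument values are hypothetical placeholders; the field layout follows the `external_account` credential format, and the file is normally generated with gcloud, not by hand.

```python
import json


def external_account_config(project_number, pool_id, provider_id,
                            service_account_email, token_file):
    """Build an external_account credential config for an OIDC token file."""
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global"
        f"/workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    return {
        "type": "external_account",
        "audience": audience,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "token_url": "https://sts.googleapis.com/v1/token",
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
            f"{service_account_email}:generateAccessToken"
        ),
        # The workload's own identity provider writes its OIDC token here.
        "credential_source": {"file": token_file},
    }


config = external_account_config(
    project_number="123456789",           # placeholder
    pool_id="ci-pool",                    # placeholder
    provider_id="github-provider",        # placeholder
    service_account_email="pipeline@my-project.iam.gserviceaccount.com",
    token_file="/var/run/oidc/token",     # placeholder path
)
print(json.dumps(config, indent=2))
```

Note that the file contains no secret: it only describes where the short-lived external token lives and which pool, provider, and service account are involved.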