Infrastructure as Code for Data Platforms
Automating data platform provisioning and management
Interview Question
"Design an Infrastructure as Code solution for a data platform that: (1) provisions Snowflake, Kafka, and Spark clusters, (2) manages environment separation (dev/staging/prod), (3) implements RBAC, (4) handles secrets management, (5) includes CI/CD pipeline. Include Terraform code and best practices."
Difficulty: Hard | Frequently asked at HashiCorp, Netflix, Uber, Datadog
Theoretical Foundation
What is Infrastructure as Code (IaC)?
IaC is the practice of managing infrastructure through code rather than manual processes.
Traditional infrastructure management:
- Manual provisioning
- Configuration drift
- Inconsistent environments
- Slow disaster recovery

IaC:
- Automated provisioning
- Version controlled
- Consistent environments
- Fast disaster recovery

IaC tools:
- Terraform (HashiCorp)
- CloudFormation (AWS)
- Pulumi (multi-cloud)
- Ansible (configuration management)
Terraform Architecture
Terraform State Management
Local state:
- Stored on the local machine
- Not shared
- Risk of loss

Remote state:
- Stored in cloud storage (S3, GCS)
- Shared across the team
- State locking (a DynamoDB table for the S3 backend; built into the GCS backend)
- Versioning

State locking:
- Prevents concurrent modifications of the same state
- Uses a DynamoDB table with the S3 backend (AWS); the GCS backend (GCP) locks natively
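The remote-state properties listed above (shared storage, versioning, encryption, locking) can be provisioned with Terraform itself. The following is a minimal sketch, not a definitive setup: it assumes the bucket name `company-terraform-state` and lock table `terraform-locks` that the backend blocks later in this article refer to, and it assumes these resources live in a small bootstrap configuration of their own, since a backend cannot store state in a bucket that does not exist yet.

```hcl
# Bootstrap resources for an S3 backend (apply once, from a separate configuration).
resource "aws_s3_bucket" "terraform_state" {
  bucket = "company-terraform-state"
}

# Versioning makes it possible to recover an earlier state file.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest; state can contain secrets in plain text.
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State should never be publicly readable.
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table for the S3 backend; the hash key must be named LockID.
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```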
Environment Separation
Approach 1: Separate state files (one directory per environment)

dev/
  main.tf
  variables.tf
  terraform.tfstate
staging/
  main.tf
  variables.tf
  terraform.tfstate
prod/
  main.tf
  variables.tf
  terraform.tfstate

Approach 2: Workspaces (one configuration, one state per workspace)

terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

terraform workspace select dev
terraform apply
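With the workspace approach, a single configuration has to vary its settings by environment. One common way to do that is to key a lookup map on `terraform.workspace`. The sketch below is illustrative only: the per-environment warehouse sizes and the `_DEV`/`_PROD` name suffix are assumptions, not values taken from this article.

```hcl
locals {
  # The selected workspace name doubles as the environment name.
  environment = terraform.workspace

  # Hypothetical sizing per environment.
  warehouse_size_by_env = {
    dev     = "XSMALL"
    staging = "SMALL"
    prod    = "MEDIUM"
  }

  # Fall back to the smallest size for any unexpected workspace (e.g. "default").
  warehouse_size = lookup(local.warehouse_size_by_env, local.environment, "XSMALL")
}

resource "snowflake_warehouse" "analytics" {
  name           = "ANALYTICS_WH_${upper(local.environment)}"
  warehouse_size = local.warehouse_size
  auto_suspend   = 60
  auto_resume    = true
}
```

The trade-off is that all workspaces share one backend and one set of code, so a mistake in the configuration reaches every environment; separate directories cost more duplication but isolate production more strongly.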
Code Implementation
Terraform Project Structure
data-platform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       ├── outputs.tf
│       └── terraform.tfvars
├── modules/
│   ├── snowflake/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── kafka/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── spark/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── networking/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── .gitignore
└── README.md
Provider Configuration
# environments/prod/providers.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    snowflake = {
      source  = "Snowflake-Labs/snowflake"
      version = "~> 0.89"
    }
    kafka = {
      source  = "Mongey/kafka"
      version = "~> 0.6"
    }
  }

  # Remote state backend
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "data-platform/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# AWS Provider
provider "aws" {
  region = var.aws_region

  # Tags applied to every taggable AWS resource this provider creates
  default_tags {
    tags = {
      Environment = var.environment
      Team        = "data-engineering"
      ManagedBy   = "terraform"
    }
  }
}

# Snowflake Provider
provider "snowflake" {
  # ACCOUNTADMIN is shown for brevity; a dedicated, narrower role is safer for automation.
  role    = "ACCOUNTADMIN"
  account = var.snowflake_account
  user    = var.snowflake_user

  # Key-pair authentication. The 0.x provider series names this value "JWT";
  # provider 1.x renames it to "SNOWFLAKE_JWT", so check the docs for your pinned version.
  authenticator = "JWT"
  private_key   = var.snowflake_private_key
}
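The provider configuration above reads several input variables that have to be declared somewhere in the environment directory. A possible `variables.tf` is sketched below; the variable names come from the provider blocks, while the default region and the validation rule are assumptions added for illustration.

```hcl
# environments/prod/variables.tf (sketch)

variable "aws_region" {
  type    = string
  default = "us-east-1" # assumed default, matching the backend region
}

variable "environment" {
  type = string

  # Reject typos such as "production" before any resource is planned.
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

variable "snowflake_account" {
  type = string
}

variable "snowflake_user" {
  type = string
}

# Marked sensitive so Terraform redacts the value in plan and apply output.
variable "snowflake_private_key" {
  type      = string
  sensitive = true
}
```

Sensitive values such as the private key are better supplied through `TF_VAR_snowflake_private_key` in the CI environment than through a committed `terraform.tfvars`.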
Snowflake Module
# modules/snowflake/main.tf

# Snowflake Warehouse
resource "snowflake_warehouse" "analytics" {
  name              = "ANALYTICS_WH"
  comment           = "Analytics warehouse"
  warehouse_size    = "MEDIUM"
  auto_suspend      = 60 # seconds of inactivity before suspending
  auto_resume       = true
  min_cluster_count = 1
  max_cluster_count = 5
  scaling_policy    = "ECONOMY"
}

# Snowflake Database
resource "snowflake_database" "analytics" {
  name    = "ANALYTICS_DB"
  comment = "Analytics database"
}

# Snowflake Schemas
resource "snowflake_schema" "staging" {
  database = snowflake_database.analytics.name
  name     = "STAGING"
  comment  = "Staging schema"
}

resource "snowflake_schema" "marts" {
  database = snowflake_database.analytics.name
  name     = "MARTS"
  comment  = "Marts schema"
}

# Snowflake Role
resource "snowflake_role" "analyst" {
  name    = "ANALYST"
  comment = "Analyst role"
}

# Grant privileges: warehouse usage
resource "snowflake_grant_privileges_to_account_role" "analyst_warehouse" {
  account_role_name = snowflake_role.analyst.name
  privileges        = ["USAGE"]

  on_account_object {
    object_type = "WAREHOUSE"
    object_name = snowflake_warehouse.analytics.name
  }
}

# Database usage
resource "snowflake_grant_privileges_to_account_role" "analyst_database" {
  account_role_name = snowflake_role.analyst.name
  privileges        = ["USAGE"]

  on_account_object {
    object_type = "DATABASE"
    object_name = snowflake_database.analytics.name
  }
}

# Schema usage (STAGING only)
resource "snowflake_grant_privileges_to_account_role" "analyst_schemas" {
  account_role_name = snowflake_role.analyst.name
  privileges        = ["USAGE"]

  on_schema {
    schema_name = "${snowflake_database.analytics.name}.${snowflake_schema.staging.name}"
  }
}

# SELECT on all existing tables in STAGING.
# A "DB.SCHEMA.*" wildcard is not a valid object_name; bulk grants use the all block.
resource "snowflake_grant_privileges_to_account_role" "analyst_tables" {
  account_role_name = snowflake_role.analyst.name
  privileges        = ["SELECT"]

  on_schema_object {
    all {
      object_type_plural = "TABLES"
      in_schema          = "${snowflake_database.analytics.name}.${snowflake_schema.staging.name}"
    }
  }
}
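Grants on existing tables do not cover tables created later, which matters for RBAC in schemas that pipelines write to continuously. A hedged sketch of a future grant on the MARTS schema follows; it reuses the role, database, and schema resources from the module above, and the choice of MARTS as the analyst-facing schema is an assumption.

```hcl
# Let analysts use the MARTS schema.
resource "snowflake_grant_privileges_to_account_role" "analyst_marts_usage" {
  account_role_name = snowflake_role.analyst.name
  privileges        = ["USAGE"]

  on_schema {
    schema_name = "${snowflake_database.analytics.name}.${snowflake_schema.marts.name}"
  }
}

# SELECT on tables that will be created in MARTS after this grant is applied.
resource "snowflake_grant_privileges_to_account_role" "analyst_marts_future_tables" {
  account_role_name = snowflake_role.analyst.name
  privileges        = ["SELECT"]

  on_schema_object {
    future {
      object_type_plural = "TABLES"
      in_schema          = "${snowflake_database.analytics.name}.${snowflake_schema.marts.name}"
    }
  }
}
```

Depending on the provider version, schema identifiers may need to be written fully qualified with escaped double quotes; the provider documentation for the pinned version is the authority here.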
Kafka Module
# modules/kafka/main.tf

# MSK Cluster
resource "aws_msk_cluster" "kafka" {
  cluster_name           = "data-platform-kafka"
  kafka_version          = "3.5.1"
  number_of_broker_nodes = 3

  broker_node_group_info {
    instance_type   = "kafka.m5.large"
    client_subnets  = var.subnet_ids
    security_groups = [aws_security_group.kafka.id]

    storage_info {
      ebs_storage_info {
        volume_size = 1000 # GiB per broker
      }
    }
  }

  encryption_info {
    encryption_in_transit {
      client_broker = "TLS"
      in_cluster    = true
    }
    encryption_at_rest_kms_key_arn = aws_kms_key.kafka.arn
  }

  configuration_info {
    arn      = aws_msk_configuration.kafka.arn
    revision = aws_msk_configuration.kafka.latest_revision
  }

  tags = {
    Name = "data-platform-kafka"
  }
}

# MSK Configuration
resource "aws_msk_configuration" "kafka" {
  name           = "data-platform-config"
  kafka_versions = ["3.5.1"]

  server_properties = <<PROPERTIES
auto.create.topics.enable=true
delete.topic.enable=true
num.partitions=100
default.replication.factor=3
min.insync.replicas=2
log.retention.hours=168
PROPERTIES
}

# Security Group
resource "aws_security_group" "kafka" {
  name        = "kafka-security-group"
  description = "Security group for Kafka cluster"

  # Without vpc_id the group lands in the default VPC and cannot be attached to
  # brokers in var.subnet_ids. var.vpc_id is an assumed module input.
  vpc_id = var.vpc_id

  # TLS listener. The plaintext port 9092 is not opened because the cluster
  # sets client_broker = "TLS", which disables plaintext client traffic.
  ingress {
    from_port   = 9094
    to_port     = 9094
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
}

# KMS Key for encryption at rest
resource "aws_kms_key" "kafka" {
  description = "KMS key for Kafka encryption"
}
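The provider list earlier in this article includes `Mongey/kafka`, which manages topics rather than brokers. Declaring topics explicitly is usually preferable to relying on `auto.create.topics.enable`. The sketch below assumes a hypothetical variable `kafka_bootstrap_brokers_tls` (for example fed from the cluster's `bootstrap_brokers_tls` attribute) and a hypothetical topic name; it also assumes the machine running Terraform can reach the brokers over the network.

```hcl
# Hypothetical input: comma-separated TLS broker endpoints of the MSK cluster.
variable "kafka_bootstrap_brokers_tls" {
  type = string
}

provider "kafka" {
  bootstrap_servers = split(",", var.kafka_bootstrap_brokers_tls)
  tls_enabled       = true
}

resource "kafka_topic" "events" {
  name               = "platform.events" # hypothetical topic name
  partitions         = 12
  replication_factor = 3

  config = {
    "retention.ms"        = "604800000" # 7 days, same as log.retention.hours=168
    "min.insync.replicas" = "2"
    "cleanup.policy"      = "delete"
  }
}
```

Because the provider needs broker addresses before it can plan, topic management is often kept in a separate configuration that is applied after the cluster exists.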
Spark Module
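# modules/spark/main.tf

# EMR Cluster
resource "aws_emr_cluster" "spark" {
  name          = "data-platform-spark"
  release_label = "emr-6.15.0"
  applications  = ["Spark", "Hive", "JupyterEnterpriseGateway"]
  service_role  = aws_iam_role.emr_service.arn
  log_uri       = "s3://${aws_s3_bucket.spark_logs.bucket}/logs/"

  # Role EMR assumes to run the autoscaling policy below.
  # var.emr_autoscaling_role_arn is an assumed module input.
  autoscaling_role = var.emr_autoscaling_role_arn

  master_instance_group {
    instance_type  = "m5.xlarge"
    instance_count = 1
  }

  core_instance_group {
    instance_type  = "m5.2xlarge"
    instance_count = 3

    # autoscaling_policy takes a JSON document, not nested blocks.
    autoscaling_policy = jsonencode({
      Constraints = {
        MinCapacity = 3
        MaxCapacity = 10
      }
      Rules = [
        {
          Name = "ScaleOutOnLowMemory"
          Action = {
            SimpleScalingPolicyConfiguration = {
              AdjustmentType    = "CHANGE_IN_CAPACITY"
              ScalingAdjustment = 1
              CoolDown          = 300
            }
          }
          Trigger = {
            CloudWatchAlarmDefinition = {
              ComparisonOperator = "LESS_THAN"
              EvaluationPeriods  = 1
              MetricName         = "YARNMemoryAvailablePercentage"
              Namespace          = "AWS/ElasticMapReduce"
              Period             = 300
              Statistic          = "AVERAGE"
              Threshold          = 15 # illustrative value; the original gave no threshold
              Unit               = "PERCENT"
            }
          }
        }
      ]
    })
  }

  ec2_attributes {
    key_name  = var.ssh_key_name
    subnet_id = var.subnet_id

    # Required by EMR; var.emr_instance_profile_arn is an assumed module input.
    instance_profile = var.emr_instance_profile_arn

    # Both security groups are assumed to be defined elsewhere in this module.
    emr_managed_master_security_group = aws_security_group.emr_master.id
    emr_managed_slave_security_group  = aws_security_group.emr_slave.id
  }

  tags = {
    Name = "data-platform-spark"
  }
}

# S3 Bucket for Spark logs
resource "aws_s3_bucket" "spark_logs" {
  bucket = "data-platform-spark-logs-${var.environment}"
}

# IAM Role for EMR
resource "aws_iam_role" "emr_service" {
  name = "emr-service-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "emr.amazonaws.com"
        }
      }
    ]
  })
}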
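Besides the service role, EMR cluster nodes need an EC2 instance profile of their own, supplied through `instance_profile` in `ec2_attributes`. A minimal sketch is shown below. The role and profile names are hypothetical, and the AWS managed policy attached here is broad; a production setup would more likely use a scoped-down custom policy.

```hcl
# Role assumed by the EC2 instances that make up the EMR cluster.
resource "aws_iam_role" "emr_ec2" {
  name = "emr-ec2-role" # hypothetical name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

# AWS managed policy for EMR nodes; broad by design, tighten for production.
resource "aws_iam_role_policy_attachment" "emr_ec2" {
  role       = aws_iam_role.emr_ec2.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"
}

# Instance profile wrapping the role so EC2 can use it.
resource "aws_iam_instance_profile" "emr_ec2" {
  name = "emr-ec2-instance-profile" # hypothetical name
  role = aws_iam_role.emr_ec2.name
}
```

The profile's ARN (`aws_iam_instance_profile.emr_ec2.arn`) is the value the cluster's `instance_profile` argument expects.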
Secrets Management
# modules/secrets/main.tf

# AWS Secrets Manager
resource "aws_secretsmanager_secret" "snowflake" {
  name = "data-platform/snowflake"
}

# Note: anything written through secret_string is also stored in Terraform state,
# so the state backend must be encrypted and access-controlled.
resource "aws_secretsmanager_secret_version" "snowflake" {
  secret_id = aws_secretsmanager_secret.snowflake.id

  secret_string = jsonencode({
    account  = var.snowflake_account
    user     = var.snowflake_user
    password = var.snowflake_password
  })
}

# HashiCorp Vault (alternative)
# resource "vault_generic_secret" "snowflake" {
#   path = "secret/data-platform/snowflake"
#
#   data_json = jsonencode({
#     account  = var.snowflake_account
#     user     = var.snowflake_user
#     password = var.snowflake_password
#   })
# }
CI/CD Pipeline
# .github/workflows/terraform.yml
name: Terraform

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  # Manual runs can choose the environment directory to operate on.
  workflow_dispatch:
    inputs:
      environment:
        description: Environment directory under environments/
        type: choice
        options: [dev, staging, prod]
        default: dev

env:
  TF_VERSION: "1.6.0"
  # github.event.inputs is empty on push and pull_request events,
  # so those runs fall back to dev.
  TF_ENVIRONMENT: ${{ github.event.inputs.environment || 'dev' }}

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Terraform Init
        run: terraform init -input=false
        working-directory: environments/${{ env.TF_ENVIRONMENT }}

      - name: Terraform Plan
        run: terraform plan -input=false -out=tfplan
        working-directory: environments/${{ env.TF_ENVIRONMENT }}

      # Pull requests only plan; pushes and manual runs on main apply the saved plan.
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request'
        run: terraform apply -auto-approve tfplan
        working-directory: environments/${{ env.TF_ENVIRONMENT }}
Usage
# Run from the environment directory; its terraform.tfvars is loaded automatically
cd environments/prod

# Initialize Terraform
terraform init

# Plan changes
terraform plan

# Apply changes
terraform apply

# Destroy infrastructure
terraform destroy
Production Tip: Always use remote state with state locking. Store state in encrypted S3 with versioning. Use separate state files for each environment. Never commit state files to Git.
Common Follow-Up Questions
Q1: How do you handle secrets in Terraform?
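# Declare secret inputs as sensitive so Terraform redacts them in plan and apply output
variable "snowflake_password" {
  type      = string
  sensitive = true
}

# Read the secret from AWS Secrets Manager instead of passing it in
data "aws_secretsmanager_secret_version" "snowflake" {
  secret_id = "data-platform/snowflake"
}

# Shell, not HCL: supply the variable through the environment rather than a .tfvars file
export TF_VAR_snowflake_password="mysecretpassword"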
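A secret read through the data source arrives as a JSON string and has to be decoded before use. The sketch below assumes the secret holds the `account`, `user`, and `password` keys written by the secrets module earlier in this article, and it shows password authentication purely for illustration; the local name `snowflake_credentials` is hypothetical.

```hcl
data "aws_secretsmanager_secret_version" "snowflake" {
  secret_id = "data-platform/snowflake"
}

locals {
  # Decode the JSON payload; sensitive() keeps the values out of CLI output.
  snowflake_credentials = sensitive(
    jsondecode(data.aws_secretsmanager_secret_version.snowflake.secret_string)
  )
}

provider "snowflake" {
  account  = local.snowflake_credentials["account"]
  user     = local.snowflake_credentials["user"]
  password = local.snowflake_credentials["password"]
}
```

Values read this way are still recorded in state, so this approach complements an encrypted, access-controlled backend rather than replacing it.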
Q2: How do you manage Terraform state?
# Remote state with S3 backend
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "data-platform/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
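Splitting state by component as well as by environment keeps each state file small, but components then need a way to read each other's outputs. One option is the `terraform_remote_state` data source. In the sketch below, the networking state key and the output names `private_subnet_ids` and `vpc_cidr` are assumptions for illustration; the Kafka module inputs match the variables that module uses in this article.

```hcl
# Read outputs published by a separately managed networking stack.
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "company-terraform-state"
    key    = "networking/prod/terraform.tfstate" # hypothetical key
    region = "us-east-1"
  }
}

module "kafka" {
  source = "../../modules/kafka"

  # Hypothetical output names exposed by the networking stack.
  subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
  vpc_cidr   = data.terraform_remote_state.networking.outputs.vpc_cidr
}
```

Only root-level outputs of the other configuration are visible this way, and the reader needs read access to that state object.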
Q3: How do you handle Terraform modules?
# Use modules for reusable components
module "snowflake" {
  source = "../../modules/snowflake"

  environment = var.environment
  account     = var.snowflake_account
}
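A module call like the one above only works if the module declares matching inputs, and callers can only consume values the module exposes as outputs. A possible interface for the Snowflake module is sketched here. The `environment` and `account` inputs come from the call above and the `snowflake_warehouse` output name matches the one the test in Q4 reads; the `warehouse_size` input and `analyst_role_name` output are hypothetical additions.

```hcl
# modules/snowflake/variables.tf (sketch)
variable "environment" {
  type        = string
  description = "Deployment environment, e.g. dev, staging, or prod."
}

variable "account" {
  type        = string
  description = "Snowflake account identifier."
}

variable "warehouse_size" {
  type        = string
  description = "Hypothetical input letting each environment pick its own size."
  default     = "XSMALL"
}

# modules/snowflake/outputs.tf (sketch)
output "snowflake_warehouse" {
  description = "Name of the analytics warehouse."
  value       = snowflake_warehouse.analytics.name
}

output "analyst_role_name" {
  description = "Name of the analyst role, for use in grants defined elsewhere."
  value       = snowflake_role.analyst.name
}
```

A caller would then reference `module.snowflake.snowflake_warehouse` instead of hard-coding the warehouse name.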
Q4: How do you implement Terraform testing?
// Use Terratest for integration tests
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestTerraform(t *testing.T) {
	terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
		TerraformDir: "../examples/simple",
	})

	// Tear everything down when the test finishes, even if an assertion fails
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	// Assertions
	output := terraform.Output(t, terraformOptions, "snowflake_warehouse")
	assert.Equal(t, "ANALYTICS_WH", output)
}
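Terraform 1.6 and later also ship a native test framework, which avoids a separate Go toolchain for simple checks. The sketch below is an assumption-laden example: it presumes a file such as `tests/warehouse.tftest.hcl` (hypothetical path) inside the Snowflake module, and it presumes working provider credentials, since a plan still has to configure the provider.

```hcl
# tests/warehouse.tftest.hcl (hypothetical path inside modules/snowflake)

# Inputs for the module under test.
variables {
  environment = "dev"
  account     = "example-account" # placeholder value
}

run "warehouse_settings" {
  # plan only: nothing is created, so the check is fast and has no cleanup step.
  command = plan

  assert {
    condition     = snowflake_warehouse.analytics.name == "ANALYTICS_WH"
    error_message = "Warehouse name should be ANALYTICS_WH."
  }

  assert {
    condition     = snowflake_warehouse.analytics.auto_suspend == 60
    error_message = "Warehouse should auto-suspend after 60 seconds."
  }
}
```

Running `terraform test` from the module directory executes every `run` block. Plan-level assertions catch configuration mistakes cheaply, while Terratest remains useful for checks that need real infrastructure.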
Critical Consideration: Never commit secrets to Git. Use environment variables, AWS Secrets Manager, or HashiCorp Vault. Always encrypt state files and use state locking.
Company-Specific Tips
HashiCorp Interview Tips
- Discuss Terraform best practices
- Explain state management strategies
- Mention modules and workspaces
- Talk about Terraform Cloud features
Netflix Interview Tips
- Focus on multi-cloud Terraform
- Explain environment separation strategies
- Mention secrets management at scale
- Talk about CI/CD for infrastructure
Uber Interview Tips
- Discuss Terraform for Kubernetes
- Explain Helm charts for applications
- Mention ArgoCD for GitOps
- Talk about infrastructure testing
Final Takeaway: Infrastructure as Code is essential for managing modern data platforms. Use Terraform for provisioning, modules for reusability, and remote state for collaboration. Always implement proper secrets management, testing, and CI/CD.