Azure Active Directory, IAM & Managed Identities

Azure AD, IAM & Managed Identities

Mastering identity management and access control for secure data engineering pipelines

Identity Architecture for Data Engineering

Architecture Diagram
┌───────────────────────────────────────────────────────────────────┐
│                    IDENTITY & ACCESS MANAGEMENT                   │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│  IDENTITY PROVIDERS        AUTHENTICATION        AUTHORIZATION    │
│  ┌──────────────┐          ┌──────────────┐      ┌──────────────┐ │
│  │ Azure AD     │──────────│ OAuth 2.0    │─────>│ RBAC         │ │
│  │ (Entra ID)   │          │ OpenID Conn  │      │ Roles        │ │
│  └──────────────┘          └──────────────┘      └──────────────┘ │
│         │                         │                     │         │
│         ▼                         ▼                     ▼         │
│  ┌──────────────┐          ┌──────────────┐      ┌──────────────┐ │
│  │ Managed      │          │ Token        │      │ Data         │ │
│  │ Identities   │          │ Exchange     │      │ Factory      │ │
│  │              │          │ Service      │      │ RBAC         │ │
│  │ ┌──────────┐ │          └──────────────┘      └──────────────┘ │
│  │ │ System   │ │                                                 │
│  │ │ Assigned │ │          ┌──────────────┐      ┌──────────────┐ │
│  │ └──────────┘ │          │ Conditional  │      │ Synapse      │ │
│  │ ┌──────────┐ │          │ Access       │      │ RBAC         │ │
│  │ │ User     │ │          │ Policies     │      │              │ │
│  │ │ Assigned │ │          └──────────────┘      └──────────────┘ │
│  │ └──────────┘ │                                                 │
│  └──────────────┘          ┌──────────────┐      ┌──────────────┐ │
│                            │ Key Vault    │      │ Storage      │ │
│  ┌──────────────┐          │ Access       │      │ RBAC         │ │
│  │ Service      │──────────│ Policies     │      │              │ │
│  │ Principals   │          └──────────────┘      └──────────────┘ │
│  └──────────────┘                                                 │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘

Managed Identities Deep Dive

System-Assigned vs User-Assigned

Feature     System-Assigned       User-Assigned
Lifecycle   Tied to resource      Independent of any resource
Sharing     Single resource       Multiple resources
Cleanup     Auto-deleted          Manual cleanup
Use Case    Single-service auth   Multi-service scenarios
Maximum     1 per resource        Multiple per resource (service limits apply)
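
The practical difference between the two identity types shows up when a workload asks for a token. The sketch below builds, but does not send, the token request a VM-hosted workload would make to the Azure Instance Metadata Service (IMDS). It assumes a VM-style host (App Service and Functions expose a different local endpoint), and the client ID shown is a hypothetical placeholder. With a system-assigned identity no identifier is needed; a user-assigned identity is typically selected by passing its client ID.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# IMDS token endpoint; only reachable from inside an Azure VM.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_token_request(resource: str, client_id: Optional[str] = None) -> Request:
    """Build (but do not send) a managed identity token request."""
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id is not None:
        # Only needed to select a specific user-assigned identity.
        params["client_id"] = client_id
    # IMDS rejects requests that lack the "Metadata: true" header.
    return Request(f"{IMDS_TOKEN_URL}?{urlencode(params)}", headers={"Metadata": "true"})


# System-assigned: the resource has exactly one identity, so nothing to select.
system_req = build_imds_token_request("https://storage.azure.com/")

# User-assigned: name the identity explicitly (hypothetical client ID).
user_req = build_imds_token_request(
    "https://storage.azure.com/",
    client_id="00000000-0000-0000-0000-000000000000",
)

print(system_req.full_url)
print(user_req.full_url)
```

In SDK code this detail is normally hidden by the credential classes in azure.identity; the sketch only illustrates what differs on the wire.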

Managed Identity Configuration for Data Engineering

# Python: Using Managed Identity with Azure SDKs
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
from azure.synapse.artifacts import ArtifactsClient

# DefaultAzureCredential tries multiple auth methods automatically
credential = DefaultAzureCredential()

# ADLS Gen2 access with Managed Identity
datalake_client = DataLakeServiceClient(
    account_url="https://stdatalake001.dfs.core.windows.net",
    credential=credential
)

# List files in data lake
file_system_client = datalake_client.get_file_system_client("raw")
paths = list(file_system_client.list_paths(path="2024/01/"))
for path in paths:
    # PathProperties exposes the size as content_length
    print(f"Path: {path.name}, Size: {path.content_length}")

# Synapse Artifacts access
artifacts_client = ArtifactsClient(
    credential=credential,
    endpoint="https://syn-workspace.dev.azuresynapse.net"
)

# Fetch a single pipeline definition by name
pipeline = artifacts_client.pipeline.get_pipeline("etl_pipeline")

Service Principal Configuration

{
  "appId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "displayName": "sp-dataengineering-prod",
  "password": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
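
# Service Principal Authentication
from azure.identity import ClientSecretCredential

# client_id is the application (client) ID, i.e. the "appId" GUID above,
# not the display name. The literals here are placeholders: in a real
# pipeline, retrieve the secret from Key Vault instead of hard-coding it.
credential = ClientSecretCredential(
    tenant_id="your-tenant-id",
    client_id="your-app-id",
    client_secret="your-client-secret"
)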

# Use with Azure Storage
from azure.storage.filedatalake import DataLakeServiceClient

client = DataLakeServiceClient(
    account_url="https://stdatalake001.dfs.core.windows.net",
    credential=credential
)
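
Under the hood, a client-secret credential performs the OAuth 2.0 client credentials grant against the tenant's token endpoint. The following is a minimal sketch of that request, built with the standard library and deliberately never sent. The helper name and all argument values are illustrative placeholders, not values from a real tenant.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_client_credentials_request(
    tenant_id: str, client_id: str, client_secret: str, resource: str
) -> Request:
    """Build (but do not send) a client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,          # application (client) ID, a GUID
        "client_secret": client_secret,  # placeholder; never hard-code a real secret
        # ".default" requests whatever permissions are already granted on the resource.
        "scope": f"{resource}/.default",
    }
    return Request(url, data=urlencode(form).encode("utf-8"), method="POST")


token_request = build_client_credentials_request(
    tenant_id="your-tenant-id",
    client_id="your-app-id",
    client_secret="your-client-secret",
    resource="https://storage.azure.com",
)
form_fields = parse_qs(token_request.data.decode("utf-8"))
print(token_request.full_url)
print(form_fields["scope"][0])
```

The response to such a request carries the access token that the SDK then attaches to storage calls as a bearer token.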

RBAC Roles for Data Engineering

Built-in Roles

Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│                    RBAC ROLE HIERARCHY                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  MANAGEMENT GROUP                                               │
│  └── Subscription                                               │
│      └── Resource Group                                         │
│          └── Resource                                           │
│                                                                 │
│  SCOPE LEVELS (top to bottom):                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ MG > Sub > RG > Resource                                │    │
│  │                                                         │    │
│  │ Roles assigned at higher scope inherit downward         │    │
│  │ Assignments are additive; lower scopes only add access  │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  DATA ENGINEERING ROLES:                                        │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Storage Blob Data Contributor   - ADLS read/write       │    │
│  │ Storage Blob Data Reader        - ADLS read-only        │    │
│  │ Synapse Administrator           - Full Synapse access   │    │
│  │ Synapse SQL Administrator       - SQL pool management   │    │
│  │ Key Vault Secrets User          - Read secrets          │    │
│  │ Data Factory Contributor        - ADF management        │    │
│  │ Contributor                     - General management    │    │
│  │ Reader                          - View-only access      │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
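
Scope inheritance can be pictured as prefix matching on resource IDs. The sketch below is a simplified model, not how Azure evaluates access internally: it covers subscription, resource group, and resource scopes only (management group IDs use a different path form), and every ID in it is a hypothetical placeholder. It also shows that role assignments in Azure RBAC are additive: the effective roles on a resource are the union of everything assigned at or above it.

```python
def scope_applies(assignment_scope: str, resource_id: str) -> bool:
    """True if an assignment at assignment_scope is inherited by resource_id."""
    scope = assignment_scope.rstrip("/").lower()
    target = resource_id.rstrip("/").lower()
    # Match the scope itself or any child path; the trailing "/" stops
    # "rg-datalake" from matching "rg-datalake-prod".
    return target == scope or target.startswith(scope + "/")


def effective_roles(assignments, resource_id):
    """Union of all roles whose scope covers the resource (additive model)."""
    return {role for scope, role in assignments if scope_applies(scope, resource_id)}


SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
RG = SUB + "/resourceGroups/rg-datalake-prod"
STORAGE = RG + "/providers/Microsoft.Storage/storageAccounts/stdatalake001"

assignments = [
    (SUB, "Reader"),                               # inherited by everything below
    (RG, "Storage Blob Data Contributor"),         # inherited by the storage account
    (SUB + "/resourceGroups/rg-other", "Contributor"),  # unrelated resource group
]

roles = effective_roles(assignments, STORAGE)
print(sorted(roles))
```

Because the model is additive, narrowing access means removing or re-scoping the broader assignment rather than adding a weaker one lower down.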

Custom Role for Data Engineers

{
  "Name": "Data Engineer Custom Role",
  "Description": "Custom role for data engineering operations",
  "AssignableScopes": [
    "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  ],
  "Actions": [
    "Microsoft.Storage/storageAccounts/read",
    "Microsoft.Storage/storageAccounts/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/write",
    "Microsoft.Synapse/workspaces/read",
    "Microsoft.Synapse/workspaces/sqlPools/read",
    "Microsoft.Synapse/workspaces/sqlPools/write",
    "Microsoft.Synapse/workspaces/notebooks/read",
    "Microsoft.Synapse/workspaces/notebooks/write",
    "Microsoft.DataFactory/factories/pipelines/read",
    "Microsoft.DataFactory/factories/pipelines/write",
    "Microsoft.DataFactory/factories/read",
    "Microsoft.DataFactory/factories/write",
    "Microsoft.KeyVault/vaults/secrets/read"
  ],
  "NotActions": [
    "Microsoft.Authorization/*/Delete",
    "Microsoft.Authorization/*/Write",
    "Microsoft.Authorization/elevateAccess/Action"
  ]
}
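
A role definition's effective permissions are its Actions minus its NotActions, with `*` acting as a wildcard. The following sketch models that subtraction for a single role using `fnmatch`; it is an illustration of the rule, not Azure's evaluation engine, and the helper name is hypothetical. Note that NotActions is not a deny: another role assigned to the same principal can still grant an excluded action.

```python
from fnmatch import fnmatchcase


def is_action_allowed(action: str, actions: list, not_actions: list) -> bool:
    """Single-role check: allowed if it matches Actions and no NotActions."""
    candidate = action.lower()  # operation names are compared case-insensitively

    def matches(patterns):
        return any(fnmatchcase(candidate, p.lower()) for p in patterns)

    return matches(actions) and not matches(not_actions)


NOT_ACTIONS = [
    "Microsoft.Authorization/*/Delete",
    "Microsoft.Authorization/*/Write",
    "Microsoft.Authorization/elevateAccess/Action",
]

# Explicit action list: only what is named is granted.
explicit = ["Microsoft.Storage/storageAccounts/read"]
print(is_action_allowed("Microsoft.Storage/storageAccounts/read", explicit, NOT_ACTIONS))

# Wildcard action list: NotActions carve out the authorization operations.
wildcard = ["*"]
print(is_action_allowed("Microsoft.Authorization/roleAssignments/write", wildcard, NOT_ACTIONS))
print(is_action_allowed("Microsoft.Storage/storageAccounts/delete", wildcard, NOT_ACTIONS))
```

With an explicit Actions list such as the one in the role above, the NotActions entries mainly serve as a guard in case a wildcard is added later.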

⚠️

Security Critical: Never store Service Principal secrets in code, environment variables, or configuration files. Always use Azure Key Vault with Managed Identities for secret retrieval.

Azure AD Authentication Flow for Data Services

Architecture Diagram
┌─────────────────────────────────────────────────────────────────────┐
│                    AUTHENTICATION FLOW                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. REQUEST TOKEN                                                   │
│  ┌────────────┐     ┌────────────┐     ┌────────────┐               │
│  │ ADF/       │────>│ Azure AD   │────>│ Token      │               │
│  │ Databricks │     │ Endpoint   │     │ Service    │               │
│  └────────────┘     └────────────┘     └──────┬─────┘               │
│                                               │                     │
│  2. VALIDATE & ISSUE                          │                     │
│                                               ▼                     │
│  ┌────────────────────────────────────────────────────────────┐     │
│  │ Azure AD validates:                                        │     │
│  │ • Service Principal exists                                 │     │
│  │ • Credentials are valid                                    │     │
│  │ • SP has required permissions                              │     │
│  │ • Conditional Access policies pass                         │     │
│  └────────────────────────────────────────────────────────────┘     │
│                                               │                     │
│  3. RECEIVE TOKEN                             ▼                     │
│  ┌────────────┐     ┌────────────┐     ┌────────────┐               │
│  │ ADF/       │<────│ JWT        │<────│ Response   │               │
│  │ Databricks │     │ Token      │     │            │               │
│  └────┬───────┘     └────────────┘     └────────────┘               │
│       │                                                             │
│  4. ACCESS RESOURCE                                                 │
│       ▼                                                             │
│  ┌────────────┐     ┌────────────┐                                  │
│  │ ADLS/      │<────│ Use JWT    │                                  │
│  │ Synapse    │     │ as Bearer  │                                  │
│  └────────────┘     │ Token      │                                  │
│                     └────────────┘                                  │
└─────────────────────────────────────────────────────────────────────┘
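
When step 4 fails, it often helps to look inside the JWT from step 3. The sketch below decodes the payload of a token for inspection only; it does not verify the signature, so it must never be used to make trust decisions. The sample token is fabricated locally so the example runs offline, and its claim values are hypothetical placeholders.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def _b64url_encode(obj: dict) -> str:
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))


# Locally built stand-in for a real access token (not signed, not valid).
sample_token = ".".join([
    _b64url_encode({"alg": "none", "typ": "JWT"}),
    _b64url_encode({
        "aud": "https://storage.azure.com/",            # resource the token is for
        "tid": "11111111-1111-1111-1111-111111111111",  # hypothetical tenant ID
        "oid": "22222222-2222-2222-2222-222222222222",  # hypothetical principal object ID
        "exp": 4102444800,                              # expiry as a Unix timestamp
    }),
    "signature-placeholder",
])

claims = decode_jwt_claims(sample_token)
print(claims["aud"], claims["tid"], claims["oid"])
```

For a 403, the claims worth checking first are usually `aud` (is the token for the right resource?), `tid` (right tenant?), and `oid` (is this the principal that holds the role assignment?).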

Bicep Template for IAM Setup

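// Deployment location and existing resources that the role assignments target.
// The Key Vault name is passed in by the caller.
param location string = resourceGroup().location
param keyVaultName string

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
  name: 'stdatalake001'
}

resource keyVault 'Microsoft.KeyVault/vaults@2023-07-01' existing = {
  name: keyVaultName
}

// Managed Identity for ADF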
resource managedIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'mi-datafactory-prod'
  location: location
  tags: {
    Environment: 'Production'
    Project: 'DataEngineering'
  }
}

// Role Assignment for ADF on ADLS
resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(resourceGroup().id, 'Storage Blob Data Contributor', managedIdentity.id)
  scope: storageAccount
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe')
    principalId: managedIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}

// Role Assignment for Key Vault access
resource keyVaultRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(resourceGroup().id, 'Key Vault Secrets User', managedIdentity.id)
  scope: keyVault
  properties: {
    // Built-in role ID for Key Vault Secrets User
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '4633458b-17de-408a-b874-0445c86b69e6')
    principalId: managedIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}

// Output the Managed Identity Client ID
output managedIdentityClientId string = managedIdentity.properties.clientId
output managedIdentityPrincipalId string = managedIdentity.properties.principalId

ℹ️

Best Practice: Use User-Assigned Managed Identities when multiple services (ADF, Databricks, Synapse) need to access the same resources. This simplifies RBAC management and avoids role assignment proliferation.

Conditional Access Policies for Data Engineering

Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│              CONDITIONAL ACCESS POLICY FLOW                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐     ┌──────────────┐     ┌──────────────┐         │
│  │ User/SP  │────>│ Sign-in      │────>│ Policy       │         │
│  │ Request  │     │ Request      │     │ Evaluation   │         │
│  └──────────┘     └──────────────┘     └──────┬───────┘         │
│                                               │                 │
│                    ┌──────────────────────────┤                 │
│                    │                          │                 │
│                    ▼                          ▼                 │
│            ┌──────────────┐           ┌──────────────┐          │
│            │ Block:       │           │ Grant:       │          │
│            │ • Outside IP │           │ • MFA        │          │
│            │ • No MFA     │           │ • Compliant  │          │
│            │ • Non-Comply │           │   Device     │          │
│            └──────────────┘           │ • App Cond.  │          │
│                                       └──────────────┘          │
│                                                                 │
│  POLICY EXAMPLES FOR DATA ENGINEERING:                          │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ 1. Block sign-ins from outside corporate network        │    │
│  │ 2. Require MFA for all admin operations                 │    │
│  │ 3. Require compliant device for Synapse access          │    │
│  │ 4. Block legacy authentication protocols                │    │
│  │ 5. Session timeout for production portal access         │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
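
The block/grant split in the diagram can be illustrated with a toy evaluator. This is a teaching model only: real Conditional Access policies are configured in the tenant, not written in application code, and the network range, class, and function names below are all hypothetical. The model checks block conditions first, then reports which grant controls are still unsatisfied.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical corporate egress range (a documentation-only address block).
TRUSTED_NETWORKS = [ip_network("203.0.113.0/24")]


@dataclass
class SignIn:
    ip: str
    mfa_satisfied: bool
    device_compliant: bool
    legacy_auth: bool = False


def evaluate(sign_in: SignIn) -> str:
    """Toy policy evaluation: block conditions win, then grant controls apply."""
    if sign_in.legacy_auth:
        return "block: legacy authentication"
    if not any(ip_address(sign_in.ip) in net for net in TRUSTED_NETWORKS):
        return "block: outside corporate network"

    # Grant controls: every required control must be satisfied.
    missing = []
    if not sign_in.mfa_satisfied:
        missing.append("MFA")
    if not sign_in.device_compliant:
        missing.append("compliant device")
    if missing:
        return "grant requires: " + ", ".join(missing)
    return "grant"


print(evaluate(SignIn("203.0.113.10", mfa_satisfied=True, device_compliant=True)))
print(evaluate(SignIn("198.51.100.7", mfa_satisfied=True, device_compliant=True)))
print(evaluate(SignIn("203.0.113.10", mfa_satisfied=False, device_compliant=True)))
```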

Python SDK for IAM Management

import uuid

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

# Placeholders: replace with your subscription ID and the object (principal) ID
# of the ADF managed identity
subscription_id = "your-subscription-id"
adf_managed_identity_principal_id = "your-adf-principal-id"

credential = DefaultAzureCredential()
auth_client = AuthorizationManagementClient(credential, subscription_id)

# Assign 'Storage Blob Data Contributor' to ADF Managed Identity
role_assignment_params = RoleAssignmentCreateParameters(
    role_definition_id=f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization/roleDefinitions/ba92f5b4-2d11-453d-a403-e96b0029c9fe",
    principal_id=adf_managed_identity_principal_id,
    principal_type="ServicePrincipal"
)

auth_client.role_assignments.create(
    scope=f"/subscriptions/{subscription_id}/resourceGroups/rg-datalake-prod/providers/Microsoft.Storage/storageAccounts/stdatalake001",
    role_assignment_name=str(uuid.uuid4()),  # the name must be a GUID, not a friendly string
    parameters=role_assignment_params
)

# List all role assignments at resource group scope
assignments = auth_client.role_assignments.list_for_scope(
    scope=f"/subscriptions/{subscription_id}/resourceGroups/rg-datalake-prod"
)
for assignment in assignments:
    print(f"Role: {assignment.role_definition_id}")
    print(f"Principal: {assignment.principal_id}")
    print(f"Type: {assignment.principal_type}")
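
Role assignment names must be GUIDs, and a script that is re-run benefits from producing the same name for the same assignment. One possible approach, sketched below, derives the name from the scope, role, and principal with `uuid.uuid5`. This mirrors the idea behind Bicep's `guid()` function but does not produce the same values, and the helper name and sample IDs are hypothetical.

```python
import uuid


def role_assignment_name(scope: str, role_definition_id: str, principal_id: str) -> str:
    """Derive a stable GUID from the three values that identify an assignment."""
    # Lower-case the parts so casing differences in resource IDs do not
    # change the resulting name.
    key = "|".join(part.lower() for part in (scope, role_definition_id, principal_id))
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


scope = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-datalake-prod"
    "/providers/Microsoft.Storage/storageAccounts/stdatalake001"
)
role_id = "ba92f5b4-2d11-453d-a403-e96b0029c9fe"  # Storage Blob Data Contributor
principal = "33333333-3333-3333-3333-333333333333"  # hypothetical principal ID

first = role_assignment_name(scope, role_id, principal)
second = role_assignment_name(scope, role_id, principal)
print(first == second)  # same inputs give the same name on every run
```

A deterministic name lets a rerun target the existing assignment instead of attempting to create a second one for the same principal, role, and scope.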

Interview Questions

Q1: Why should you never use Storage Account Keys for data engineering pipelines?
A: Storage Account Keys provide full access to the storage account and are long-lived credentials that can be compromised. Managed Identities eliminate credential management, provide automatic rotation, and enable granular RBAC. If keys must be used, store them in Key Vault and rotate them regularly.

Q2: Explain the difference between RBAC at the Storage Account level vs Container level.
A: Storage Account-level RBAC applies to all containers and blobs in the account. Container-level RBAC (assigning the role at the container's resource scope) provides more granular control. For example, grant a service principal access to only the "raw" container but not "curated."

Q3: How do you troubleshoot a 403 Forbidden error when ADF tries to access ADLS?
A: Check: 1) Managed Identity is enabled on ADF, 2) Correct RBAC role is assigned at the right scope, 3) No Deny assignments override the role, 4) Private Endpoints/Firewall rules allow traffic, 5) Azure AD tenant matches between resources.
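
For the container-level scoping discussed in Q2, the scope string passed to a role assignment is simply a longer resource ID. A small sketch of how the account and container scopes relate follows; the helper names and the subscription ID are placeholders chosen for illustration.

```python
def storage_account_scope(subscription_id: str, resource_group: str, account: str) -> str:
    """Scope covering every container and blob in the account."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def container_scope(subscription_id: str, resource_group: str, account: str, container: str) -> str:
    """Scope covering a single blob container within the account."""
    account_scope = storage_account_scope(subscription_id, resource_group, account)
    return f"{account_scope}/blobServices/default/containers/{container}"


SUBSCRIPTION = "00000000-0000-0000-0000-000000000000"  # placeholder subscription ID

raw_scope = container_scope(SUBSCRIPTION, "rg-datalake-prod", "stdatalake001", "raw")
curated_scope = container_scope(SUBSCRIPTION, "rg-datalake-prod", "stdatalake001", "curated")
print(raw_scope)
```

Assigning Storage Blob Data Reader at `raw_scope` leaves `curated_scope` untouched, which is the granularity Q2 describes.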
