Databricks SQL: Serverless, Alerts & SQL Warehouses
SQL analytics with Databricks SQL warehouses, serverless compute, and query optimization
Databricks SQL Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DATABRICKS SQL ARCHITECTURE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β SQL WAREHOUSES β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β SERVERLESS SQL WAREHOUSE β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β β’ Instant startup (seconds) β β β
β β β β’ Auto-scale 1-16 slots β β β
β β β β’ No infrastructure management β β β
β β β β’ Pay per query execution β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β β
β β CLASSIC SQL WAREHOUSE β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β β’ Shared/Prophet/Enterprise β β β
β β β β’ Size: 2X-Small to 2X-Large β β β
β β β β’ Auto-stop: Configurable β β β
β β β β’ Persistent connection support β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β FEATURES β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β’ SQL Queries (T-SQL compatible) β β
β β β’ Dashboards (visualizations) β β
β β β’ Alerts (threshold-based notifications) β β
β β β’ Query History β β
β β β’ Result Caching β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
SQL Warehouse Operations
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import sql
w = WorkspaceClient()
# Create SQL Warehouse
warehouse = w.sql_warehouses.create(
name="Analytics Warehouse",
cluster_size="2X-Small",
max_num_clusters=4,
min_num_clusters=1,
auto_stop_mins=10,
enable_serverless_compute=True,
channel="CHANNEL_10_4"
)
# Execute query
result = w.statement_execution.execute_statement(
warehouse_id=warehouse.id,
statement="""
SELECT
sale_date,
SUM(total_amount) AS revenue,
COUNT(*) AS transactions
FROM sales.fact_sales
WHERE sale_date >= '2024-01-01'
GROUP BY sale_date
ORDER BY sale_date
"""
)
for row in result.result.data_array:
print(row)
Alert Configuration
# Create alert
alert = w.alerts.create(
name="Revenue Drop Alert",
query="""
SELECT
SUM(total_amount) as daily_revenue
FROM sales.fact_sales
WHERE sale_date = current_date()
""",
condition="daily_revenue < 100000",
state="ACTIVE",
subscribers=[
{"user_id": "user@company.com", "alert_type": "EMAIL"}
]
)
Parameterized Queries
-- Dashboard query with parameters
SELECT
sale_date,
product_category,
SUM(total_amount) AS revenue,
COUNT(*) AS transactions
FROM sales.fact_sales
WHERE sale_date BETWEEN :start_date AND :end_date
AND product_category = :category
GROUP BY sale_date, product_category
ORDER BY sale_date;
βΉοΈ
Pro Tip: Use Serverless SQL Warehouses for dashboards (instant startup) and Classic Warehouses for ETL workloads (persistent connections). Configure auto-stop to minimize costs.
Interview Questions
Q1: When would you use Serverless vs Classic SQL Warehouses? A: Serverless for dashboards and ad-hoc queries (instant startup). Classic for ETL workloads requiring persistent connections. Serverless for pay-per-use; Classic for predictable costs.
Q2: How do you handle query performance issues in Databricks SQL? A: 1) Check query plan (EXPLAIN), 2) Update table statistics, 3) Use Z-ORDER on filtered columns, 4) Increase warehouse size, 5) Use result caching, 6) Optimize SQL queries.
Q3: What are the cost components of Databricks SQL? A: 1) SQL Warehouse compute (DBU/hour), 2) Storage (ADLS), 3) Serverless overhead. Use auto-stop, right-sizing, and reserved capacity to optimize costs.