Always monitor your BigQuery costs using INFORMATION_SCHEMA. Set up budget alerts at 50%, 80%, and 100% thresholds.
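As a rough illustration of monitoring through INFORMATION_SCHEMA, the sketch below builds a per-user spend query against the `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view. It assumes jobs run in the `region-us` location and that the $5.00 per TB on-demand rate quoted in this guide applies. The helper name `estimate_on_demand_cost` is hypothetical.

```python
# Sketch: estimated on-demand spend per user over the last 30 days.
# Assumptions: jobs run in the US multi-region, and ON_DEMAND_USD_PER_TB is
# the rate quoted in this guide, which may differ from your actual price.
ON_DEMAND_USD_PER_TB = 5.00
BYTES_PER_TB = 1024 ** 4

jobs_cost_query = f"""
SELECT
  user_email,
  COUNT(*) AS query_count,
  SUM(total_bytes_billed) AS bytes_billed,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * {ON_DEMAND_USD_PER_TB}, 2) AS est_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY user_email
ORDER BY est_cost_usd DESC
"""


def estimate_on_demand_cost(bytes_billed: int) -> float:
    """Convert billed bytes into an estimated on-demand cost in USD."""
    return round(bytes_billed / BYTES_PER_TB * ON_DEMAND_USD_PER_TB, 2)
```

The query string can be pasted into the BigQuery console or passed to a client library. The Python helper applies the same conversion to byte counts you already have.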
Cost Optimization Framework
BigQuery Pricing Models
# BigQuery pricing comparison
pricing = {
    "on_demand": {
        "query_cost": "$5.00 per TB scanned",
        "free_tier": "1 TB/month",
        "best_for": "Ad-hoc queries, <100 queries/day",
        "example": "100 queries × 10 GB = $5.00/day = $150/month"
    },
    "flat_rate_100_slots": {
        "monthly_cost": "$2,000/month (1yr CUD)",
        "commitment": "100 slots guaranteed",
        "best_for": "Predictable workloads, >100 queries/day",
        "savings": "40-55% vs on-demand for heavy usage"
    },
    "autoscale": {
        "minimum": "100 slots required",
        "cost": "$0.04 per slot-hour",
        "best_for": "Variable workloads, batch processing",
        "example": "100 slots × 730 hours = $2,920/month"
    }
}
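To compare the three models for a given workload, the figures in the dictionary above can be turned into a small calculator. This is only a sketch: the function names are hypothetical, the rates are the ones quoted in this guide, and real bills depend on your region, edition, and contract.

```python
# Sketch: monthly cost under each pricing model, using the rates quoted above.
HOURS_PER_MONTH = 730
FLAT_RATE_100_SLOTS_MONTHLY = 2000.00  # 1-year commitment, 100 slots


def on_demand_monthly(tb_scanned: float, usd_per_tb: float = 5.00,
                      free_tb: float = 1.0) -> float:
    """On-demand cost after subtracting the monthly free tier."""
    billable_tb = max(tb_scanned - free_tb, 0.0)
    return billable_tb * usd_per_tb


def autoscale_monthly(avg_slots: float, usd_per_slot_hour: float = 0.04,
                      hours: float = HOURS_PER_MONTH) -> float:
    """Autoscale cost for an average number of slots held over the month."""
    return avg_slots * hours * usd_per_slot_hour


def cheapest_model(tb_scanned: float, avg_slots: float):
    """Return the cheapest model name and the full cost breakdown."""
    costs = {
        "on_demand": on_demand_monthly(tb_scanned),
        "flat_rate_100_slots": FLAT_RATE_100_SLOTS_MONTHLY,
        "autoscale": autoscale_monthly(avg_slots),
    }
    return min(costs, key=costs.get), costs
```

For example, `cheapest_model(30, 100)` picks on-demand, while `cheapest_model(1000, 100)` picks the flat-rate commitment.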
Cost Monitoring
from google.cloud import billing_v1

client = billing_v1.CloudBillingClient()

# Look up the billing account (replace the placeholder ID with your own)
billing_account = client.get_billing_account(
    name="billingAccounts/XXXXXX-XXXXXX-XXXXXX"
)

# Per-service cost for the last 30 days, read from the Cloud Billing
# export table in BigQuery (the export must already be enabled)
cost_query = """
SELECT
  service.description AS service,
  SUM(cost) AS total_cost,
  SUM(usage.amount) AS usage_amount,
  usage.unit AS usage_unit
FROM `project.dataset.gcp_billing_export`
WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY 1, 4
ORDER BY 2 DESC
"""
Cost Tip: Start with on-demand pricing for exploration, then switch to committed slots for predictable workloads. Use autoscale for variable loads. Always enable budget alerts and review costs monthly. Use preemptible VMs for batch processing to save up to 91%.
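To make the monthly review concrete, the billing export query from the Cost Monitoring section could be parameterized and run with the BigQuery client library. The sketch below assumes `google-cloud-bigquery` is installed and Application Default Credentials are configured. The function names and the table name passed in are placeholders.

```python
# Sketch: build and run the per-service billing query for a chosen window.
def build_billing_query(table: str, days: int = 30) -> str:
    """Return the per-service cost query for a billing export table."""
    return f"""
    SELECT
      service.description AS service,
      SUM(cost) AS total_cost,
      SUM(usage.amount) AS usage_amount,
      usage.unit AS usage_unit
    FROM `{table}`
    WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(days)} DAY)
    GROUP BY 1, 4
    ORDER BY 2 DESC
    """


def run_billing_query(table: str, days: int = 30) -> list:
    """Execute the query and return one dict per (service, unit) row."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses Application Default Credentials
    rows = client.query(build_billing_query(table, days)).result()
    return [dict(row.items()) for row in rows]
```

A call such as `run_billing_query("project.dataset.gcp_billing_export", days=30)` would return the services ordered by cost, ready for a report or a spreadsheet export.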
Common Interview Questions
Q1: When should you use committed vs. on-demand BigQuery slots?
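Answer: Use committed slots (1-year or 3-year CUD) for predictable, steady-state workloads with heavy daily query volume. Use on-demand for ad-hoc, variable, or light workloads. At the rates quoted above ($5.00 per TB on-demand, $2,000/month for 100 committed slots), 100 queries/day at 10 GB each costs only about $150/month on-demand, so the break-even sits well above that volume, at roughly 400 TB scanned per month.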
Q2: How much can you save with preemptible VMs?
Answer: Preemptible VMs provide up to 91% savings compared to on-demand. They're ideal for fault-tolerant batch workloads. Master nodes should always be on-demand. Workers can be preemptible for Dataproc and Dataflow batch jobs.
Q3: What is FlexRS and when should you use it?
Answer: FlexRS (Flexible Resource Scheduling) is a Dataflow option for batch pipelines that provides up to 50% savings by using a mix of preemptible and on-demand VMs. The trade-off is a delayed start: Dataflow may queue the job for up to 6 hours before running it. Use it for non-urgent batch jobs like daily aggregations and backfills.
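For a Python Dataflow pipeline, FlexRS is requested through the `--flexrs_goal=COST_OPTIMIZED` pipeline option. The sketch below only assembles the argument list. The helper name and the project, region, and bucket values are hypothetical placeholders.

```python
# Sketch: pipeline arguments for a FlexRS batch job on Dataflow.
def flexrs_pipeline_args(project: str, region: str, temp_location: str) -> list:
    """Return Dataflow arguments that request cost-optimized scheduling."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Ask Dataflow to schedule the job for cost rather than start time.
        "--flexrs_goal=COST_OPTIMIZED",
    ]


args = flexrs_pipeline_args("my-project", "us-central1", "gs://my-bucket/tmp")
```

The resulting list can be passed to Apache Beam's `PipelineOptions` when constructing the pipeline.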
Q4: How do you monitor GCP costs effectively?
Answer: 1) Set up budget alerts at 50%, 75%, 90%, and 100%, 2) Use the BigQuery billing export for analysis, 3) Review costs monthly, 4) Apply labels to resources for cost allocation, 5) Use Cloud Billing reports for trends, 6) Set up alerts for unusual spending.
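The threshold logic behind budget alerts can be illustrated with a few lines of Python. This is a local sketch of the idea, not the Cloud Billing Budgets API. The function name is hypothetical and the default thresholds are the ones listed in the answer above.

```python
# Sketch: which alert thresholds has the current spend crossed?
def crossed_thresholds(spend: float, budget: float,
                       thresholds=(0.50, 0.75, 0.90, 1.00)) -> list:
    """Return every threshold (as a fraction of budget) that spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in thresholds if ratio >= t]
```

With a $1,000 budget, `crossed_thresholds(800, 1000)` reports the 50% and 75% thresholds, so the 90% alert is the next one to expect.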
Q5: What is the break-even point for BigQuery committed slots?
Answer: The break-even depends on query volume and size. At $5.00 per TB on-demand, 100 slots at $2,000/month break even at about 400 TB scanned per month ($2,000 ÷ $5.00 per TB). That is roughly 13 TB/day, or about 1,300 queries/day at 10 GB each. Below that volume, on-demand is the cheaper option.
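Working through the arithmetic with the rates quoted in this guide ($5.00 per TB on-demand, $2,000/month for 100 committed slots) gives the break-even directly. The function names are hypothetical, a 30-day month is assumed, and 1 TB is treated as 1,000 GB to match the guide's "100 queries × 10 GB = $5.00" example.

```python
# Sketch: break-even volume between on-demand and a monthly slot commitment.
def break_even_tb_per_month(monthly_commit_usd: float = 2000.0,
                            usd_per_tb: float = 5.0) -> float:
    """TB scanned per month at which on-demand cost equals the commitment."""
    return monthly_commit_usd / usd_per_tb


def break_even_queries_per_day(monthly_commit_usd: float = 2000.0,
                               usd_per_tb: float = 5.0,
                               gb_per_query: float = 10.0,
                               days_per_month: int = 30) -> float:
    """Queries per day of a given size needed to reach the break-even volume."""
    tb_per_month = break_even_tb_per_month(monthly_commit_usd, usd_per_tb)
    gb_per_day = tb_per_month * 1000 / days_per_month
    return gb_per_day / gb_per_query
```

With the defaults, the break-even is 400 TB per month, or a little over 1,300 queries per day at 10 GB each. Changing `gb_per_query` shows how quickly the threshold drops for larger scans.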