Schema Design Patterns
Implementation
from google.cloud import bigtable
from google.cloud.bigtable import column_family
import datetime
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")
# Create table with multiple column families
table = instance.table("iot_data")
table.create(column_families={
    "readings": column_family.MaxVersionsGCPolicy(1),
    "metadata": column_family.MaxAgeGCPolicy(datetime.timedelta(days=365)),
    "alerts": column_family.MaxVersionsGCPolicy(5),
})
# Write time-series data; subtracting the timestamp from a fixed maximum
# makes the newest reading for a sensor sort first
timestamp = datetime.datetime.now()
max_ts = 9999999999999
reversed_ts = max_ts - int(timestamp.timestamp() * 1000)
row_key = f"sensor_123#{reversed_ts:013d}"
row = table.direct_row(row_key)
row.set_cell("readings", "temperature", "23.5")
row.set_cell("readings", "humidity", "65.2")
row.set_cell("metadata", "location", "warehouse_a")
row.commit()
# Read with time range
start_time = datetime.datetime.now() - datetime.timedelta(hours=1)
end_time = datetime.datetime.now()
start_reversed = max_ts - int(end_time.timestamp() * 1000)
end_reversed = max_ts - int(start_time.timestamp() * 1000)
# read_rows accepts start_key/end_key directly; end_key is exclusive by default
rows = table.read_rows(
    start_key=f"sensor_123#{start_reversed:013d}".encode(),
    end_key=f"sensor_123#{end_reversed:013d}".encode(),
)
for result_row in rows:
    print(f"Row: {result_row.row_key.decode()}")
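The reversed-timestamp key logic above can be pulled into small helpers that are easy to check without a Bigtable connection. This is a minimal sketch: the function names `reversed_ts_key` and `scan_bounds` are hypothetical, and the 13-digit ceiling is the same `max_ts` value used in the example.

```python
import datetime

MAX_TS_MS = 9999999999999  # same 13-digit ceiling as max_ts in the example


def reversed_ts_key(sensor_id: str, when: datetime.datetime) -> bytes:
    """Build '<sensor_id>#<reversed millis>' so newer readings sort first."""
    millis = int(when.timestamp() * 1000)
    return f"{sensor_id}#{MAX_TS_MS - millis:013d}".encode()


def scan_bounds(sensor_id: str, start: datetime.datetime, end: datetime.datetime):
    """Return (start_key, end_key) covering readings between start and end.

    Timestamps are reversed, so the later time produces the smaller key
    and therefore becomes the start of the scan.
    """
    return reversed_ts_key(sensor_id, end), reversed_ts_key(sensor_id, start)


utc = datetime.timezone.utc
older = datetime.datetime(2024, 1, 1, 12, 0, tzinfo=utc)
newer = datetime.datetime(2024, 1, 1, 13, 0, tzinfo=utc)
start_key, end_key = scan_bounds("sensor_123", older, newer)
```

The resulting pair can be passed as `start_key` and `end_key` to `table.read_rows`. Zero-padding to a fixed width matters because Bigtable compares row keys as bytes, not as numbers.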
Best Practice: Design row keys so that reads and writes spread evenly across nodes. Use reversed timestamps when time-series queries mostly need the newest data first. Keep the number of column families small (well under 100 per table). Configure GC policies to match data retention requirements. Use replication for disaster recovery.
Common Interview Questions
Q1: How do you prevent hotspots in Bigtable?
Answer: 1) Use hash prefixes on row keys, 2) Avoid monotonically increasing keys, such as a raw timestamp or sequential ID at the start of the key, 3) Distribute writes across many row-key prefixes instead of a single one, 4) Use salted row keys for high-throughput entities.
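Salting, as mentioned in the answer above, might look like the following. This is a sketch under stated assumptions: the bucket count of 8, the `NN#` prefix layout, and the helper names `salted_row_key` and `scan_prefixes` are all illustrative choices, not a Bigtable API.

```python
import hashlib

NUM_SALT_BUCKETS = 8  # assumed value; tune to the cluster size and write rate


def salted_row_key(entity_id: str, suffix: str,
                   buckets: int = NUM_SALT_BUCKETS) -> bytes:
    """Prefix the key with a bucket derived from a hash of the whole key.

    Hashing entity_id together with the suffix spreads one busy entity's
    writes across all buckets instead of pinning it to a single prefix.
    """
    unsalted = f"{entity_id}#{suffix}"
    digest = hashlib.sha256(unsalted.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}#{unsalted}".encode()


def scan_prefixes(entity_id: str, buckets: int = NUM_SALT_BUCKETS) -> list:
    """Row-key prefixes a reader must scan to find all rows for an entity."""
    return [f"{b:02d}#{entity_id}#".encode() for b in range(buckets)]


key = salted_row_key("sensor_123", "8295889599999")
prefixes = scan_prefixes("sensor_123")
```

The trade-off is on the read side: fetching one entity's rows now takes one scan per bucket, so salting fits best where write throughput matters more than cheap range scans.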
Q2: What is the difference between SSD and HDD in Bigtable?
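Answer: SSD storage provides low-latency access (<10ms) for hot data. HDD storage costs less for cold data but has higher latency (>10ms), particularly for random reads. Use SSD for frequently accessed data and HDD for large, rarely read archival data. The storage type is chosen when the instance is created, so decide it per instance rather than per table.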
Q3: How does Bigtable replication work?
Answer: When an instance has more than one cluster, Bigtable automatically replicates data between the clusters for high availability. Replication is eventually consistent (typically <10 seconds). Use an app profile with multi-cluster routing for automatic failover.
Q4: When should you use Bigtable vs. Firestore?
Answer: Bigtable for high-throughput time-series/IoT data (millions of ops/sec). Firestore for document-based data with complex queries and real-time updates. Bigtable for analytics; Firestore for applications.
Q5: How do you optimize Bigtable costs?
Answer: 1) Use SSD for hot data, HDD for cold, 2) Configure aggressive GC policies so expired data stops consuming storage, 3) Right-size nodes based on throughput, 4) Add replicated clusters only where availability requires them, 5) Monitor and optimize row key design.
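Right-sizing nodes by throughput comes down to simple arithmetic, sketched below. The helper `nodes_needed` is hypothetical, and both the per-node throughput and the 30% headroom are assumed inputs: take real per-node figures from current Bigtable documentation or from your own load tests.

```python
import math


def nodes_needed(target_qps: float, per_node_qps: float,
                 headroom: float = 0.3, min_nodes: int = 1) -> int:
    """Estimate node count for a target request rate.

    per_node_qps is an assumed per-node capacity; headroom is the fraction
    of that capacity kept free for spikes and background work.
    """
    usable_per_node = per_node_qps * (1 - headroom)
    return max(min_nodes, math.ceil(target_qps / usable_per_node))


# Illustrative numbers only: 50,000 requests/sec against an assumed
# 10,000 requests/sec per node with 30% headroom.
estimate = nodes_needed(50_000, 10_000, headroom=0.3)
```

An estimate like this is only a starting point; actual CPU utilization under load should drive the final node count.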