Data Systems
NewSQL and Distributed SQL
NewSQL databases combine the best of SQL and NoSQL: strong consistency, ACID transactions, and horizontal scalability. Master the architecture of Spanner, CockroachDB, and YugabyteDB.
- Consistency β Serializable transactions across distributed nodes
- Scalability β Horizontal scaling with automatic sharding
- Global β Multi-region deployment with low-latency reads
NewSQL proves you don't have to sacrifice consistency for scalability.
The NewSQL Promise
NewSQL databases aim to provide the scalability of NoSQL with the ACID guarantees and SQL interface of traditional relational databases.
DfNewSQL
NewSQL is a class of relational database management systems that combine the SQL interface and ACID transaction guarantees of traditional databases with the horizontal scalability and high availability of NoSQL systems. They achieve this through distributed architectures, consensus protocols, and optimized query execution across multiple nodes.
The CAP Theorem in Practice
DfConsensus Protocol
A consensus protocol ensures that distributed nodes agree on a single value or order of operations, even in the presence of failures. Protocols like Paxos and Raft provide strong consistency guarantees by requiring a majority of nodes to agree before committing changes.
Google Spanner
DfGoogle Spanner
Google Spanner is a globally distributed, strongly consistent database that combines SQL with horizontal scalability. It uses TrueTime (atomic clocks + GPS) for external consistency across data centers, enabling serializable transactions across the globe with high availability.
TrueTime and External Consistency
TrueTime Uncertainty
Here,
- =TrueTime interval after an event
- =TrueTime interval before an event
- =The later event
- =The earlier event
Spanner's TrueTime provides a time API that returns an uncertainty interval [earliest, latest]. When committing a transaction, Spanner waits out the uncertainty interval to guarantee that commits are ordered correctly across the globe. This is the key innovation that enables global strong consistency.
Spanner Architecture
| Component | Purpose |
|---|---|
| Spanserver | Stores data, serves queries |
| Zone Master | Assigns data to spanservers |
| Directory Manager | Moves data between zones |
| Placement Driver | Manages cross-zone replication |
CockroachDB
DfCockroachDB
CockroachDB is a distributed SQL database inspired by Spanner. It provides serializable transactions, automatic sharding, and multi-region deployment using the Raft consensus protocol. Unlike Spanner, it runs on commodity hardware without requiring atomic clocks.
Raft Consensus
CockroachDB vs Spanner
| Feature | CockroachDB | Spanner |
|---|---|---|
| Consensus | Raft | Paxos |
| Time Source | Hybrid logical clocks | TrueTime (atomic clocks) |
| Deployment | Self-hosted, cloud | GCP only |
| Consistency | Serializable | External consistency |
| Latency | Higher (no atomic clocks) | Lower (TrueTime) |
| Cost | Lower | Higher (GCP pricing) |
CockroachDB is the best choice when you need Spanner-like guarantees without being locked into GCP. It runs on any cloud or on-premises and provides serializable transactions across distributed nodes.
Distributed SQL Internals
Automatic Sharding
DfAutomatic Sharding
Automatic sharding is the process of distributing data across multiple nodes without manual intervention. The database automatically splits data into ranges, assigns ranges to nodes, and rebalances when nodes are added or removed. This enables transparent horizontal scaling.
Transaction Coordination
Distributed Transaction Latency
Here,
- =Total transaction latency
- =Coordinator processing time
- =Replica processing time
- =Commit synchronization time
Distributed transactions have higher latency than single-node transactions because they require coordination across nodes. For latency-critical workloads, consider whether distributed transactions are necessary or if eventual consistency would suffice.
When to Use NewSQL
| Use Case | Justification |
|---|---|
| Global financial systems | Strong consistency across regions |
| Multi-region applications | Low-latency reads worldwide |
| ACID at scale | Transactions across sharded data |
| Regulatory compliance | Data residency with consistency |
| Migration from monolith | Keep SQL while gaining scalability |
Practice Exercises
-
Architecture Comparison: Compare Spanner and CockroachDB for a global e-commerce platform. What are the trade-offs in terms of consistency, latency, cost, and operational complexity?
-
Transaction Design: Design a distributed transaction for a banking system that transfers money between accounts in different regions. How do you handle the coordination?
-
Sharding Strategy: If you were designing CockroachDB's automatic sharding, what algorithm would you use to distribute data? How do you handle hot spots?
-
Migration Planning: Your team is migrating from PostgreSQL to CockroachDB. What are the main challenges, and how would you approach the migration?
Key Takeaways:
- NewSQL combines SQL semantics with horizontal scalability and strong consistency
- Spanner uses TrueTime (atomic clocks) for global external consistency
- CockroachDB uses Raft consensus without requiring specialized hardware
- Distributed transactions have higher latency than single-node transactions
- NewSQL is ideal for global applications requiring ACID at scale
What to Learn Next
-> Spanner and CockroachDB Deep dive into specific NewSQL implementations.
-> SQL Deep Dive PostgreSQL, MySQL, indexing strategies, and query optimization.
-> Distributed Consensus Paxos, Raft, and consensus protocols.
-> CAP Theorem Consistency models, availability, and partition tolerance.
-> Data Partitioning Sharding strategies, consistent hashing, and partition keys.
-> Data Replication Sync vs async replication, leader election, and consistency.