System Design Problems
Design a Unique ID Generator
Generating unique identifiers across distributed systems is a fundamental challenge. IDs must be globally unique, ideally time-ordered, and generated without coordination between nodes. Twitter's Snowflake, UUIDs, and database sequences are common approaches.
- Globally Unique β No two nodes ever produce the same ID
- Time-Ordered β IDs sort chronologically for efficient range queries
- Scalable β Generate millions of IDs per second without bottlenecks
The tension is between uniqueness (no coordination), ordering (time-component), and compactness (short IDs for URLs).
Requirements
Functional Requirements
- Generate unique 64-bit integer IDs
- IDs must be globally unique across all nodes
- IDs should be roughly time-ordered (sortable by creation time)
- IDs should be compact (fit in 64 bits)
- Support 10K+ ID generation requests per second per node
Non-Functional Requirements
- Latency: Generate ID in < 1ms
- Availability: ID generation must never be a single point of failure
- Scalability: Support 1000+ nodes generating IDs concurrently
- Durability: Generated IDs must never be lost or reused
ID generation is a critical infrastructure component. If the ID generator goes down, the entire system stops. Design for high availability with no single point of failure.
Back-of-the-Envelope Estimation
ID Space Estimation
64-bit integer space: 2^64 β 18.4 Γ 10^18
At 10 million IDs/second:
- Time until exhaustion: 18.4 Γ 10^18 / (10^7 Γ 86400 Γ 365) β 58,000 years
A 64-bit ID is more than sufficient for any practical application.
ID Generation Strategies
Snowflake ID Structure
DfSnowflake ID
A Snowflake ID is a 64-bit integer composed of: 1 sign bit + 41 timestamp bits + 10 machine ID bits + 12 sequence bits. It produces time-ordered, globally unique IDs without coordination.
Snowflake ID Generation
Here,
- =Milliseconds since custom epoch
- =Unique 10-bit machine identifier
- =12-bit counter (0-4095) per millisecond
Snowflake ID Decoding
ID: 1234567890123456789
Binary breakdown:
- Sign: 0 (positive)
- Timestamp: 01000110... (1467268800000 ms β 2016-06-30 12:00:00 UTC)
- Machine ID: 0000000101 (5)
- Sequence: 000000110010 (50)
The ID encodes creation time, machine, and sequence in 64 bits.
UUID Structure
DfUUID (Universally Unique Identifier)
A UUID is a 128-bit identifier that is globally unique without coordination. UUID v4 uses random numbers with a 122-bit random space, providing a collision probability of approximately 2^-61 for 1 billion IDs.
UUID Collision Probability
Here,
- =Number of UUIDs generated
UUID Collision Probability
For 1 billion UUIDs: P(collision) β (10^9)^2 / (2 Γ 2^122) β 10^18 / (2 Γ 5.2 Γ 10^36) β 10^-19
The probability of collision is negligible for any practical application.
Database Auto-Increment with Range Allocation
DfRange Allocation
A range allocation ID generator assigns each node a range of IDs from a central authority. The node generates IDs within its range without further coordination. When the range is exhausted, the node requests a new range.
Range Allocation
Here,
- =Start of allocated range
- =Local counter within range
- =Number of IDs in the range
Allocate ranges of 10,000+ IDs per node to minimize coordination frequency. Use a separate "range allocator" service with database backing for persistence.
Twitter Snowflake Implementation
class SnowflakeGenerator:
EPOCH = 1288834974657 # Custom epoch (Nov 4, 2010)
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1 # 4095
MAX_MACHINE = (1 << MACHINE_BITS) - 1 # 1023
def __init__(self, machine_id):
self.machine_id = machine_id & self.MAX_MACHINE
self.sequence = 0
self.last_timestamp = -1
def next_id(self):
timestamp = self._current_millis()
if timestamp == self.last_timestamp:
self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
if self.sequence == 0:
timestamp = self._wait_next_millis()
else:
self.sequence = 0
self.last_timestamp = timestamp
return ((timestamp - self.EPOCH) << (self.MACHINE_BITS + self.SEQUENCE_BITS)) | \
(self.machine_id << self.SEQUENCE_BITS) | \
self.sequence
Practice Exercises
-
Design: How would you handle clock synchronization issues in a Snowflake ID generator? What happens if a machine's clock goes backward?
-
Scale: If you need to generate 10 million IDs per second, how many machines are needed with Snowflake (12-bit sequence)? What if you use 8-bit sequence instead?
-
Trade-offs: Compare Snowflake IDs and UUID v4 for a URL shortener. What are the trade-offs in terms of ID length, ordering, and uniqueness guarantees?
-
Edge Case: How would you ensure uniqueness if a Snowflake node crashes and restarts with the same machine ID? Design a recovery mechanism.
Key Takeaways:
- Snowflake IDs provide time-ordered, globally unique 64-bit IDs without coordination
- UUID v4 is simple and coordination-free but produces 128-bit non-sortable IDs
- Database auto-increment is simple but creates a single point of failure
- Range allocation avoids runtime coordination while maintaining uniqueness
- The 12-bit sequence counter allows 4096 IDs per millisecond per machine
What to Learn Next
-> Design URL Shortener Base62 encoding of Snowflake IDs for compact URLs.
-> Consistent Hashing Distributing work across nodes using hash rings.
-> Data Replication Replicating generated IDs across multiple nodes.
-> Databases Database sequences, auto-increment, and distributed ID schemes.
-> Distributed Consensus Raft, Paxos, and coordination in distributed systems.
-> Design Key-Value Store Distributed storage for ID allocation and deduplication.