πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Design a Unique ID Generator

System Design ProblemsDistributed ID Generation🟒 Free Lesson

Advertisement

System Design Problems

Design a Unique ID Generator

Generating unique identifiers across distributed systems is a fundamental challenge. IDs must be globally unique, ideally time-ordered, and generated without coordination between nodes. Twitter's Snowflake, UUIDs, and database sequences are common approaches.

  • Globally Unique β€” No two nodes ever produce the same ID
  • Time-Ordered β€” IDs sort chronologically for efficient range queries
  • Scalable β€” Generate millions of IDs per second without bottlenecks

The tension is between uniqueness (no coordination), ordering (time-component), and compactness (short IDs for URLs).

Requirements

Functional Requirements

  • Generate unique 64-bit integer IDs
  • IDs must be globally unique across all nodes
  • IDs should be roughly time-ordered (sortable by creation time)
  • IDs should be compact (fit in 64 bits)
  • Support 10K+ ID generation requests per second per node

Non-Functional Requirements

  • Latency: Generate ID in < 1ms
  • Availability: ID generation must never be a single point of failure
  • Scalability: Support 1000+ nodes generating IDs concurrently
  • Durability: Generated IDs must never be lost or reused

ID generation is a critical infrastructure component. If the ID generator goes down, the entire system stops. Design for high availability with no single point of failure.

Back-of-the-Envelope Estimation

ID Space Estimation

64-bit integer space: 2^64 β‰ˆ 18.4 Γ— 10^18

At 10 million IDs/second:

  • Time until exhaustion: 18.4 Γ— 10^18 / (10^7 Γ— 86400 Γ— 365) β‰ˆ 58,000 years

A 64-bit ID is more than sufficient for any practical application.

ID Generation Strategies

UUID128-bit randomNo coordinationNot sortable36-char stringDB Auto-IncrementSequential IDsSimple to implementSingle point of failureDB bottleneckSnowflake64-bit structuredTime + machine + seqTime-orderedBest for most cases βœ“Pre-generatedID ranges per nodeNo runtime coordMay waste IDsFast lookup

Snowflake ID Structure

DfSnowflake ID

A Snowflake ID is a 64-bit integer composed of: 1 sign bit + 41 timestamp bits + 10 machine ID bits + 12 sequence bits. It produces time-ordered, globally unique IDs without coordination.

Sign1 bitTimestamp (ms since epoch)41 bits β†’ 69 yearsMachine ID10 bits β†’ 1024 nodesSequence12 bits β†’ 4096/msSnowflake 64-bit ID Layout

Snowflake ID Generation

id=(timestamp<<22)∣(machine_id<<12)∣sequenceid = (timestamp << 22) | (machine\_id << 12) | sequence

Here,

  • timestamptimestamp=Milliseconds since custom epoch
  • machineidmachine_id=Unique 10-bit machine identifier
  • sequencesequence=12-bit counter (0-4095) per millisecond

Snowflake ID Decoding

ID: 1234567890123456789

Binary breakdown:

  • Sign: 0 (positive)
  • Timestamp: 01000110... (1467268800000 ms β†’ 2016-06-30 12:00:00 UTC)
  • Machine ID: 0000000101 (5)
  • Sequence: 000000110010 (50)

The ID encodes creation time, machine, and sequence in 64 bits.

UUID Structure

DfUUID (Universally Unique Identifier)

A UUID is a 128-bit identifier that is globally unique without coordination. UUID v4 uses random numbers with a 122-bit random space, providing a collision probability of approximately 2^-61 for 1 billion IDs.

UUID Collision Probability

P(collision)β‰ˆn22Γ—2122P(collision) \approx \frac{n^2}{2 \times 2^{122}}

Here,

  • nn=Number of UUIDs generated

UUID Collision Probability

For 1 billion UUIDs: P(collision) β‰ˆ (10^9)^2 / (2 Γ— 2^122) β‰ˆ 10^18 / (2 Γ— 5.2 Γ— 10^36) β‰ˆ 10^-19

The probability of collision is negligible for any practical application.

Database Auto-Increment with Range Allocation

DfRange Allocation

A range allocation ID generator assigns each node a range of IDs from a central authority. The node generates IDs within its range without further coordination. When the range is exhausted, the node requests a new range.

Range Allocation

id=base+offsetmod  range_sizeid = base + offset \mod range\_size

Here,

  • basebase=Start of allocated range
  • offsetoffset=Local counter within range
  • rangesizerange_size=Number of IDs in the range

Allocate ranges of 10,000+ IDs per node to minimize coordination frequency. Use a separate "range allocator" service with database backing for persistence.

Twitter Snowflake Implementation

class SnowflakeGenerator:
    EPOCH = 1288834974657  # Custom epoch (Nov 4, 2010)
    
    TIMESTAMP_BITS = 41
    MACHINE_BITS = 10
    SEQUENCE_BITS = 12
    
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095
    MAX_MACHINE = (1 << MACHINE_BITS) - 1    # 1023
    
    def __init__(self, machine_id):
        self.machine_id = machine_id & self.MAX_MACHINE
        self.sequence = 0
        self.last_timestamp = -1
    
    def next_id(self):
        timestamp = self._current_millis()
        
        if timestamp == self.last_timestamp:
            self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
            if self.sequence == 0:
                timestamp = self._wait_next_millis()
        else:
            self.sequence = 0
        
        self.last_timestamp = timestamp
        
        return ((timestamp - self.EPOCH) << (self.MACHINE_BITS + self.SEQUENCE_BITS)) | \
               (self.machine_id << self.SEQUENCE_BITS) | \
               self.sequence

Practice Exercises

  1. Design: How would you handle clock synchronization issues in a Snowflake ID generator? What happens if a machine's clock goes backward?

  2. Scale: If you need to generate 10 million IDs per second, how many machines are needed with Snowflake (12-bit sequence)? What if you use 8-bit sequence instead?

  3. Trade-offs: Compare Snowflake IDs and UUID v4 for a URL shortener. What are the trade-offs in terms of ID length, ordering, and uniqueness guarantees?

  4. Edge Case: How would you ensure uniqueness if a Snowflake node crashes and restarts with the same machine ID? Design a recovery mechanism.

Key Takeaways:

  • Snowflake IDs provide time-ordered, globally unique 64-bit IDs without coordination
  • UUID v4 is simple and coordination-free but produces 128-bit non-sortable IDs
  • Database auto-increment is simple but creates a single point of failure
  • Range allocation avoids runtime coordination while maintaining uniqueness
  • The 12-bit sequence counter allows 4096 IDs per millisecond per machine

What to Learn Next

-> Design URL Shortener Base62 encoding of Snowflake IDs for compact URLs.

-> Consistent Hashing Distributing work across nodes using hash rings.

-> Data Replication Replicating generated IDs across multiple nodes.

-> Databases Database sequences, auto-increment, and distributed ID schemes.

-> Distributed Consensus Raft, Paxos, and coordination in distributed systems.

-> Design Key-Value Store Distributed storage for ID allocation and deduplication.

⭐

Premium Content

Design a Unique ID Generator

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert System Design Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement