πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Design Google Docs

System Design ProblemsCollaborative Editing🟒 Free Lesson

Advertisement

System Design Problems

Design Google Docs

Google Docs enables multiple users to edit the same document simultaneously with changes visible in real-time. The system must resolve concurrent edits without conflicts, maintain document consistency, and handle hundreds of simultaneous collaborators.

  • Real-time Collaboration β€” Changes visible to all users within 100ms
  • Conflict Resolution β€” Operational Transformation ensures consistency
  • Offline Support β€” Continue editing offline; sync when reconnected

The core challenge is Operational Transformation (OT): transforming concurrent operations so they produce the same result regardless of execution order.

Requirements

Functional Requirements

  • Multiple users can edit the same document simultaneously
  • Changes appear in real-time (within 100ms)
  • Cursor position visible for all collaborators
  • Edit history with ability to undo/redo
  • Comments and suggestions
  • Offline editing with sync on reconnect
  • Document sharing with permissions

Non-Functional Requirements

  • Latency: Changes visible in < 100ms
  • Consistency: All users see the same document state
  • Scale: 100 concurrent editors per document
  • Durability: Never lose edits
  • Availability: 99.99%

Google Docs was originally built on Operational Transformation (OT). Modern systems like Figma use CRDTs (Conflict-free Replicated Data Types) which don't require a central server for conflict resolution.

Back-of-the-Envelope Estimation

Google Docs Capacity

  • 2 billion documents, 1 billion active users
  • Average document: 20 KB text
  • Total storage: 2B Γ— 20 KB = 40 TB
  • Concurrent editing sessions: 10M (1% of users)
  • Operations per second: 10M Γ— 10 ops/sec = 100M ops/sec
  • WebSocket connections: 10M concurrent

Operational Transformation

DfOperational Transformation (OT)

OT is a technique for consistency in collaborative editing. Each user's edit is represented as an operation. When concurrent operations arrive, they are transformed against each other so they produce the same document state regardless of execution order.

OT Transformation

transform(op1,op2)=(op1β€²,op2β€²)transform(op1, op2) = (op1', op2')

Here,

  • op1op1=First operation (e.g., insert 'a' at position 5)
  • op2op2=Second operation (e.g., insert 'b' at position 3)
  • op1β€²op1'=Transformed op1 (position adjusted for op2)
  • op2β€²op2'=Transformed op2 (position adjusted for op1)

OT Transformation Example

User A: Insert 'a' at position 5 User B: Insert 'b' at position 3

Without transformation:

  • A executes first: ...b...a... (A at 5, B at 3)
  • B executes first: ...b...a... (B at 3, A at 5) Both produce the same result βœ“

With transformation:

  • A receives B's op: Insert 'a' at position 6 (shifted right by 1)
  • B receives A's op: Insert 'b' at position 3 (no shift) Both produce: ...b...a... βœ“

OT Properties

DfTP1 (Transformation Property 1)

For any two concurrent operations o1 and o2: transform(o1, o2) applied to state S produces the same result as transform(o2, o1) applied to state S.

DfTP2 (Transformation Property 2)

For any operation o and transformed operations o1', o2': transform(compose(o1', o2'), o) = compose(transform(o1', o), transform(o2', o'))

TP1 ensures convergence (all users reach the same state). TP2 ensures that composing transformed operations is equivalent to transforming composed operations.

Insert Operation Transformation

Insert-Insert Transformation

transform(insert(p1,c1),insert(p2,c2))={insert(p1+1,c1)ifΒ p1β‰₯p2insert(p1,c1)ifΒ p1<p2transform(insert(p1, c1), insert(p2, c2)) = \begin{cases} insert(p1 + 1, c1) & \text{if } p1 \geq p2 \\ insert(p1, c1) & \text{if } p1 < p2 \end{cases}

Here,

  • p1p1=Position of first insert
  • p2p2=Position of second insert
  • c1,c2c1, c2=Characters being inserted

Insert-Insert Transformation

User A: Insert 'X' at position 5 User B: Insert 'Y' at position 3

Transform(A, B):

  • A's position 5 >= B's position 3, so A shifts right: Insert 'X' at position 6

Transform(B, A):

  • B's position 3 < A's position 5, so B stays: Insert 'Y' at position 3

Both produce: ...Y...X...

Delete Operation Transformation

Delete-Insert Transformation

transform(delete(p1),insert(p2,c))={delete(p1+1)ifΒ p1β‰₯p2delete(p1)ifΒ p1<p2transform(delete(p1), insert(p2, c)) = \begin{cases} delete(p1 + 1) & \text{if } p1 \geq p2 \\ delete(p1) & \text{if } p1 < p2 \end{cases}

Here,

  • p1p1=Position of delete
  • p2p2=Position of insert

High-Level Architecture

User AUser BUser CWebSocketGatewayOT ServerOperation BufferTransformerVersion TrackerBroadcast ManagerCursor TrackerPersistenceSnapshot ManagerOperation Log(Append-only)Document Store(Snapshots)Google Docs OT Architecture

Detailed Design

Operation Model

DfOperation

An operation represents a single atomic edit. Operations are: insert(character, position), delete(position), or retain(count). Each operation has a version number indicating the base document version it was created from.

Architecture Diagram
Operation = {
  type: "insert" | "delete" | "retain",
  position: int,
  character: char (for insert),
  version: int,       // Base version this op was created from
  client_id: string,  // Who created this op
  timestamp: long
}

Server-side OT Flow

  1. Client sends operation with base version V
  2. Server receives operation
  3. Server transforms operation against all operations after version V
  4. Server applies transformed operation to document
  5. Server increments version to V+1
  6. Server broadcasts transformed operation to all other clients
  7. Server persists operation to operation log

The server is the single source of truth. Clients send operations; the server transforms and applies them. This central authority prevents divergence.

Client-side OT Flow

  1. User makes local edit β†’ create operation with current local version
  2. Apply operation locally immediately (optimistic update)
  3. Send operation to server
  4. Receive transformed operation from server
  5. Adjust local state based on server's transformed operation
Client AServerClient Bop_A (v5)op_A' (transformed, v6)op_B (v5)op_B' (transformed, v6)OT ensures both clients reach the same state

Version Vector

DfVersion Vector

A version vector tracks the operation history per client. It enables the server to determine which operations an client has seen and which transformations are needed.

Architecture Diagram
Version Vector: {
  client_a: 15,  // Client A has applied 15 ops
  client_b: 12,  // Client B has applied 12 ops
  client_c: 18   // Client C has applied 18 ops
}

When Client A (version 15) sends an operation, the server checks which operations after version 15 need to be transformed against. If other clients have applied operations 16-18, A's operation is transformed against those.

Persistence Strategy

ComponentStorageReason
Operation LogAppend-only logDurable, enables replay
Document SnapshotsPeriodic snapshotsFast document loading
Operation BufferIn-memory (Redis)Recent ops for transformation

Take snapshots every 100 operations. To load a document, load the latest snapshot and replay subsequent operations. This balances load time with storage cost.

Scaling OT Servers

DfDocument Affinity

Document affinity routes all operations for a document to the same OT server. This ensures all operations for a document are processed in order without distributed coordination.

Server Assignment

server=hash(document_id)mod  Nserversserver = hash(document\_id) \mod N_{servers}

Here,

  • documentiddocument_id=Unique document identifier
  • NserversN_{servers}=Number of OT servers

Document affinity means one document's operations are processed by one server. If a document has > 100 concurrent editors, the single server may become a bottleneck. Consider splitting very active documents across multiple servers.

Practice Exercises

  1. Algorithm: Implement the OT transform function for insert-insert and delete-insert cases. Prove that TP1 and TP2 hold for your implementation.

  2. Scale: If a document has 100 concurrent editors, each making 10 operations/second, the server processes 1000 ops/sec for one document. Estimate the server resources needed.

  3. Offline: Design a system for offline editing that syncs changes when the user reconnects. How do you handle conflicts with the online version?

  4. Comparison: Compare OT and CRDTs for collaborative editing. What are the trade-offs in terms of complexity, server requirements, and undo/redo support?

Key Takeaways:

  • OT ensures convergence by transforming concurrent operations against each other
  • The server is the single source of truth; clients send operations, server transforms and broadcasts
  • Document affinity routes all operations for a document to the same server
  • Snapshots every 100 operations balance load time with storage cost
  • Version vectors track per-client operation history for transformation

What to Learn Next

-> Design Google Drive File storage, sync, and conflict resolution.

-> Design Chat System Real-time messaging with WebSocket and presence.

-> Distributed Consensus Raft, Paxos, and consistency in distributed systems.

-> Data Replication Replicating document state across servers.

-> Message Queues Broadcasting operations to collaborators.

-> Design News Feed Real-time updates with WebSocket.

⭐

Premium Content

Design Google Docs

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert System Design Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement