System Design Problems
Design Google Drive
Google Drive enables users to store files in the cloud, sync across devices, and share with collaborators. The system must handle file uploads (up to 5 TB), real-time sync, conflict resolution when multiple users edit simultaneously, and efficient delta sync for large files.
- File Sync: Changes propagate across devices within seconds
- Conflict Resolution: Handle concurrent edits gracefully
- Sharing: Granular permissions (view, comment, edit) per file/folder
The fundamental challenge is synchronization: when two users edit the same file simultaneously, the system must resolve conflicts without losing data.
Requirements
Functional Requirements
- Upload and download files (up to 5 TB)
- Real-time sync across multiple devices
- File versioning and history
- Conflict resolution for concurrent edits
- Sharing with permissions (view, comment, edit)
- Folder hierarchy and organization
- Offline editing with sync on reconnect
Non-Functional Requirements
- Latency: Sync changes within 5 seconds
- Durability: 99.999999999% (11 nines)
- Availability: 99.99%
- Scale: 1 billion files, 500M active users
- Bandwidth: Efficient delta sync for large files
This design uses delta sync (rsync-like): it transfers only the changed parts of a file rather than the whole file. This cuts bandwidth for large files that change a little at a time (e.g., video editing projects).
Back-of-the-Envelope Estimation
Google Drive Capacity
- 1 billion files × 10 MB average = 10 PB
- 500M users × 10 GB free = 5 EB (with deduplication: ~500 PB)
- 100M sync operations/day = 1,160 QPS
- 10M file uploads/day = 116 QPS
- Storage growth: ~100 TB/day (10M uploads × 10 MB average, before deduplication)
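The capacity and throughput figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 PB = 10^15 bytes) and average rates spread evenly over the day; the helper name `per_second` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Decimal storage units (an assumption; the text does not say which it uses).
MB, GB, PB, EB = 10**6, 10**9, 10**15, 10**18


def per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


file_storage = 1_000_000_000 * 10 * MB   # 1 billion files x 10 MB average
quota_storage = 500_000_000 * 10 * GB    # 500M users x 10 GB free quota
sync_qps = per_second(100_000_000)       # 100M sync operations per day
upload_qps = per_second(10_000_000)      # 10M uploads per day

print(f"file storage : {file_storage / PB:.0f} PB")
print(f"quota storage: {quota_storage / EB:.0f} EB")
print(f"sync rate    : {sync_qps:.0f} QPS")
print(f"upload rate  : {upload_qps:.0f} QPS")
```

The exact averages come out near 1,157 and 116 requests per second; the estimate rounds the first to 1,160. Peak traffic is usually several times the average, so real capacity planning would add headroom on top of these numbers.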
High-Level Architecture
Detailed Design
File Chunking and Block Storage
Definition: Block Storage
Large files are split into fixed-size blocks (e.g., 256 KB). Each block is stored independently and identified by a hash. Deduplication occurs at the block level: identical blocks across files are stored once.
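Block Deduplication
$S_{\text{stored}} = N \times B = S_{\text{logical}} / D$
Here,
- $N$ = Total number of unique blocks
- $B$ = Size per block (typically 256 KB)
- $D$ = Deduplication ratio (typically 2-5x), i.e. logical bytes divided by stored bytes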
Block Deduplication Savings
User A uploads "report.pdf" (100 pages = 50 MB)
User B uploads "report.pdf" (same file) → 0 additional storage (dedup)
User C uploads "report_v2.pdf" (changed 2 pages) → only 2 MB of new blocks
With 256 KB blocks: 50 MB / 256 KB = 200 blocks
Changed 2 pages: ~8 new blocks = 2 MB additional storage
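The savings example can be reproduced with a toy content-addressed block store. This is a sketch under stated assumptions, not how any production system is built: `BlockStore`, `put_file`, and `get_file` are hypothetical names, blocks live in an in-memory dictionary, and SHA-256 stands in for whatever hash a real service would use.

```python
import hashlib

BLOCK_SIZE = 256 * 1024  # 256 KB, the block size used in the text


class BlockStore:
    """Toy content-addressed store: each unique block is kept once, keyed by its hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put_file(self, data: bytes) -> tuple[list[str], int]:
        """Store a file; return its block manifest and the number of bytes newly written."""
        manifest, new_bytes = [], 0
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:      # only unseen blocks cost storage
                self.blocks[digest] = block
                new_bytes += len(block)
            manifest.append(digest)
        return manifest, new_bytes

    def get_file(self, manifest: list[str]) -> bytes:
        """Rebuild a file by concatenating the blocks named in its manifest."""
        return b"".join(self.blocks[d] for d in manifest)


def make_block(tag: int) -> bytes:
    """Build one synthetic block whose content depends on `tag`."""
    return tag.to_bytes(4, "big") * (BLOCK_SIZE // 4)


# A synthetic 200-block "report.pdf" in which every block is distinct.
report = b"".join(make_block(i) for i in range(200))

# "report_v2.pdf": the same file with 8 blocks overwritten in place.
v2 = bytearray(report)
for i in range(8):
    start = (10 + i) * BLOCK_SIZE
    v2[start:start + BLOCK_SIZE] = make_block(1000 + i)

store = BlockStore()
manifest_a, new_a = store.put_file(report)     # User A: all 200 blocks are new
manifest_b, new_b = store.put_file(report)     # User B: nothing new to store
manifest_c, new_c = store.put_file(bytes(v2))  # User C: only the 8 changed blocks

print(new_a // BLOCK_SIZE, new_b // BLOCK_SIZE, new_c // BLOCK_SIZE)  # 200 0 8
```

Because a block's key is derived from its content, deduplication needs no file-level comparison: two users who upload the same bytes end up pointing at the same stored blocks.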
Delta Sync
Definition: Delta Sync
Delta sync transfers only the changed blocks of a file, not the entire file. The client computes a block-level diff and uploads only new/modified blocks. This reduces bandwidth for large files.
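A block-level diff can be sketched as a comparison between the hashes the server already holds and the hashes of the new local file. The function names `block_hashes` and `plan_upload` are hypothetical, and the tiny 4-byte block size is chosen only to make the example readable.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """Hash each fixed-size block of a file."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def plan_upload(
    server_hashes: list[str], new_data: bytes, block_size: int
) -> tuple[list[str], dict[str, bytes]]:
    """Return the new manifest and only the blocks the server does not have yet."""
    known = set(server_hashes)
    manifest: list[str] = []
    to_upload: dict[str, bytes] = {}
    for i in range(0, len(new_data), block_size):
        block = new_data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        manifest.append(digest)
        if digest not in known:
            to_upload[digest] = block
    return manifest, to_upload


old_file = b"AAAABBBBCCCC"
new_file = b"AAAAXXXXCCCC"   # only the middle block changed

server_side = block_hashes(old_file, block_size=4)
manifest, to_upload = plan_upload(server_side, new_file, block_size=4)

print(len(manifest), "blocks in the new version,", len(to_upload), "to upload")
```

The client sends the full manifest (a short list of hashes) plus the changed blocks; the server assembles the new version from blocks it already stores and the ones just uploaded.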
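Use a rolling hash (e.g., a Rabin fingerprint) to choose block boundaries from the content itself (content-defined chunking). Because each boundary depends on the bytes near it rather than on a fixed offset, inserting data near the start of a file changes only the nearby blocks; with fixed offsets, every later block would shift and look new.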
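The boundary-detection idea can be illustrated with a small content-defined chunker. This sketch uses a plain polynomial rolling hash modulo a prime, which is simpler than a true Rabin fingerprint; the constants `WINDOW`, `BASE`, `MOD`, and `DIVISOR` are arbitrary illustrative choices. Production chunkers usually also enforce minimum and maximum chunk sizes, which are left out here.

```python
import random

WINDOW = 16                        # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 61) - 1                # a large prime modulus
DIVISOR = 64                       # a cut roughly every 64 bytes on average
TOP = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window


def chunk(data: bytes) -> list[bytes]:
    """Split data wherever the hash of the last WINDOW bytes hits a target value."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # add the newest byte
        if i >= WINDOW - 1 and h % DIVISOR == DIVISOR - 1:
            chunks.append(data[start:i + 1])         # boundary after this byte
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                  # trailing partial chunk
    return chunks


rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(20_000))
edited = b"inserted at the front" + original

old_chunks = chunk(original)
new_chunks = set(chunk(edited))
reused = sum(1 for c in old_chunks if c in new_chunks)
print(f"{reused} of {len(old_chunks)} chunks reused after the insertion")
```

Each cut decision depends only on the last `WINDOW` bytes, so every chunk after the first boundary of the original file reappears unchanged in the edited file. Only the chunk touching the insertion has to be uploaded again.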
Conflict Resolution
Definition: Operational Transformation
Operational Transformation (OT) transforms concurrent operations to maintain consistency. When two users edit the same document, operations are transformed so they produce the same result regardless of execution order.
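The transformation step is easiest to see with insert-only operations on a string. The sketch below is a deliberately minimal illustration, not a full OT system: it has no deletes, no server ordering, and no operation history, and the `site` field is a hypothetical client id used only to break ties between inserts at the same position.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where the text is inserted
    text: str
    site: int   # hypothetical client id, used only to break ties


def apply(doc: str, op: Insert) -> str:
    """Apply an insert operation to a document."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    shifts = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    return replace(op, pos=op.pos + len(against.text)) if shifts else op


doc = "Hello world"
a = Insert(pos=5, text=",", site=1)    # user A adds a comma
b = Insert(pos=11, text="!", site=2)   # user B adds an exclamation mark

via_a = apply(apply(doc, a), transform(b, a))   # A's edit lands first
via_b = apply(apply(doc, b), transform(a, b))   # B's edit lands first
print(via_a, "|", via_b)  # Hello, world! | Hello, world!
```

Both orders converge on the same text because B's insert position is shifted past A's earlier insert. The tie-break on `site` keeps two inserts at the same position in a consistent order on every replica.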
Google Drive uses a combination of strategies:
- Files: Create conflict copies (both versions preserved)
- Google Docs: Operational Transformation (real-time collaboration)
- Binary files: Last-write-wins with conflict copies
For binary files (PDF, images):
- User A edits file offline
- User B edits file online
- User A reconnects and syncs
- System detects conflict (version mismatch)
- Creates "conflict copy" for User A
- User B's version becomes the current version
- User manually merges if needed
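The steps above can be sketched as a version check on upload. Everything here is hypothetical scaffolding for illustration: the `SyncServer` class, the `base_version` argument (the version the client last synced), and the conflict-copy naming scheme are assumptions, not details from the text.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    version: int
    content: bytes


class SyncServer:
    """Toy server that accepts an upload only if it was based on the current version."""

    def __init__(self) -> None:
        self.files: dict[str, FileRecord] = {}

    def upload(self, name: str, base_version: int, content: bytes, user: str) -> str:
        """Store the upload and return the name it was saved under."""
        current = self.files.get(name)
        current_version = current.version if current else 0
        if base_version == current_version:
            # The client edited the latest version: accept as the next version.
            self.files[name] = FileRecord(current_version + 1, content)
            return name
        # Version mismatch: keep the current file and save a conflict copy.
        copy_name = f"{name} (conflict copy from {user})"
        self.files[copy_name] = FileRecord(1, content)
        return copy_name


server = SyncServer()
server.upload("design.pdf", 0, b"original", "user_a")             # creates v1

# User A goes offline holding v1; User B edits online first.
saved_b = server.upload("design.pdf", 1, b"edit by B", "user_b")  # becomes v2
saved_a = server.upload("design.pdf", 1, b"edit by A", "user_a")  # stale base version

print(saved_b)
print(saved_a)
```

User B's upload stays the current version, and User A's edit is preserved under a separate name, so neither edit is lost and the merge is left to the users.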
File Versioning
Definition: File Versioning
Every file change creates a new version. Previous versions are retained and accessible. Version history enables rollback and audit trails.
File versions:
v1: block_1, block_2, block_3 (created 2026-06-18)
v2: block_1, block_4, block_3 (created 2026-06-19, block_2 changed to block_4)
v3: block_5, block_4, block_3 (created 2026-06-20, block_1 changed to block_5)
Store only unique blocks across versions. Version metadata records which blocks compose each version. Rollback reconstructs the file from historical block references.
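The version table above can be modelled as a shared block pool plus one ordered block list per version. This is a minimal sketch: `VersionedFile` and its methods are hypothetical names, and implementing rollback as a new version that reuses an old block list is one possible design choice, assumed here because it keeps the full history intact.

```python
class VersionedFile:
    """Keeps one copy of each block plus an ordered block list per version."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.versions: list[list[str]] = []

    def commit(self, layout: list[str], new_blocks: dict[str, bytes]) -> int:
        """Record a version; only blocks not stored yet are added. Returns its number."""
        for block_id, data in new_blocks.items():
            self.blocks.setdefault(block_id, data)
        missing = [b for b in layout if b not in self.blocks]
        if missing:
            raise KeyError(f"unknown blocks: {missing}")
        self.versions.append(list(layout))
        return len(self.versions)

    def read(self, version: int) -> bytes:
        """Rebuild the file content for a 1-based version number."""
        return b"".join(self.blocks[b] for b in self.versions[version - 1])

    def rollback(self, version: int) -> int:
        """Restore an old version by committing its block list as a new version."""
        return self.commit(self.versions[version - 1], {})


f = VersionedFile()
f.commit(["block_1", "block_2", "block_3"],
         {"block_1": b"A", "block_2": b"B", "block_3": b"C"})   # v1
f.commit(["block_1", "block_4", "block_3"], {"block_4": b"D"})  # v2
f.commit(["block_5", "block_4", "block_3"], {"block_5": b"E"})  # v3

restored = f.rollback(1)   # v4 reuses v1's blocks; nothing is copied
print(f.read(3), f.read(restored), len(f.blocks))  # b'EDC' b'ABC' 5
```

Three versions of a three-block file cost five stored blocks rather than nine, and rolling back adds only a new block list.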
Sharing and Permissions
Definition: ACL (Access Control List)
Each file/folder has an ACL listing users and their permissions. Permissions are hierarchical: folder permissions inherit to children unless explicitly overridden.
File ACL:
{
"file_id": "file_123",
"owner": "user_a@example.com",
"permissions": [
{ "user": "user_b@example.com", "role": "editor" },
{ "user": "user_c@example.com", "role": "viewer" },
{ "group": "team@example.com", "role": "commenter" }
]
}
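Hierarchical permissions can be resolved by walking from a file up through its parent folders until an explicit ACL entry is found. The sketch below assumes that the nearest explicit entry wins and that roles are ordered viewer < commenter < editor; both are assumptions, as are the names `Node`, `effective_role`, and `can`. Group entries are omitted to keep the example short.

```python
from typing import Optional

# Assumed ordering of roles, from least to most privileged.
ROLE_RANK = {"viewer": 1, "commenter": 2, "editor": 3}


class Node:
    """A file or folder with an optional parent and its own explicit ACL entries."""

    def __init__(self, name: str, parent: Optional["Node"] = None,
                 acl: Optional[dict[str, str]] = None) -> None:
        self.name = name
        self.parent = parent
        self.acl = acl or {}


def effective_role(node: Optional[Node], user: str) -> Optional[str]:
    """Walk from the node towards the root; the nearest explicit entry wins."""
    while node is not None:
        if user in node.acl:
            return node.acl[user]
        node = node.parent
    return None


def can(node: Node, user: str, needed: str) -> bool:
    """Check whether the user's effective role is at least the needed role."""
    role = effective_role(node, user)
    return role is not None and ROLE_RANK[role] >= ROLE_RANK[needed]


folder = Node("Projects", acl={
    "user_b@example.com": "editor",
    "user_c@example.com": "viewer",
})
# The file overrides user_b's inherited role and inherits everything else.
file_123 = Node("file_123", parent=folder, acl={"user_b@example.com": "viewer"})

print(can(folder, "user_b@example.com", "editor"))    # True: explicit on the folder
print(can(file_123, "user_b@example.com", "editor"))  # False: overridden on the file
print(can(file_123, "user_c@example.com", "viewer"))  # True: inherited from the folder
```

Walking up the tree on every check is simple but costs one lookup per folder level; deep hierarchies often cache or precompute the effective permissions instead.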
Practice Exercises
- Design: How would you implement real-time collaborative editing for Google Docs (not just binary files)? Design the Operational Transformation system.
- Scale: If 500M users each have 10 GB of files with a deduplication ratio of 3x, estimate the total storage and the block storage architecture.
- Sync: Design a delta sync algorithm that efficiently computes the diff between two versions of a 10 GB video file.
- Offline: How would you handle offline editing where the user edits a file on two devices without internet? Design the conflict detection and resolution strategy.
Key Takeaways:
- Block-level storage (256 KB blocks) enables deduplication and delta sync
- Delta sync transfers only changed blocks, which can cut bandwidth by 90% or more when a small part of a large file changes
- Conflict resolution: binary files → conflict copies; Docs → Operational Transformation
- File versioning stores only unique blocks across versions for space efficiency
- ACL-based sharing with hierarchical permissions (folder → file inheritance)
What to Learn Next
- Design Google Docs: Real-time collaborative editing with OT/CRDT.
- Design Object Storage: Block storage and deduplication at scale.
- Data Replication: Replicating file data across data centers.
- Design Chat System: Real-time sync with WebSocket and conflict resolution.
- Caching Strategies: Caching frequently accessed files at the edge.
- Security Patterns: File encryption and access control.