Design Dropbox
Dropbox serves 500M+ users with 2B+ files synced daily. This design covers file chunking, real-time synchronization, conflict resolution, and collaborative editing.
- Scale — 500M+ users, 2B+ files, 10M+ concurrent
- Sync — Real-time with < 1 second propagation
- Storage — Exabytes of user files with deduplication
Dropbox's core challenge is maintaining file consistency across millions of devices while handling conflicts gracefully.
Requirements Clarification
Functional Requirements
- Upload and download files
- Automatic file synchronization across devices
- File versioning and history
- Share files and folders with others
- Offline access with sync on reconnect
- Conflict detection and resolution
Non-Functional Requirements
- Availability: 99.99% uptime
- Durability: 99.999999999% (11 nines)
- Consistency: Strong for file metadata, eventual for content
- Scale: 500M users, 10M concurrent
Dropbox's key insight is block-level sync: instead of syncing entire files, Dropbox splits each file into 4MB blocks and syncs only the blocks that changed. For small edits to large files, this reduces bandwidth by 10-100x.
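A minimal Python sketch of block-level change detection, assuming fixed-size 4MB blocks and a position-by-position comparison of SHA-256 block hashes. The function names (`split_blocks`, `block_hashes`, `changed_blocks`) are hypothetical and are not Dropbox's client code.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB blocks, as described above


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split file bytes into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 hex digest of every block, in file order."""
    return [hashlib.sha256(b).hexdigest() for b in split_blocks(data, block_size)]


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Indices of blocks in the new version that must be uploaded.

    A block counts as changed if its position did not exist before or its
    hash differs from the hash at the same position in the old version.
    """
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]


if __name__ == "__main__":
    old = bytes(3 * BLOCK_SIZE)      # 12MB of zeros: three 4MB blocks
    new = bytearray(old)
    new[BLOCK_SIZE + 10] = 0xFF      # change a single byte inside block 1
    print(changed_blocks(block_hashes(old), block_hashes(bytes(new))))  # [1]
```

With fixed-size blocks, an edit that overwrites bytes in place only invalidates the blocks it touches. An insertion or deletion, however, shifts every later block boundary, so every subsequent block hash changes under this simple positional scheme.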
Back-of-the-Envelope Estimation
Storage per User
$$S_{\text{user}} = N_{\text{files}} \times \bar{s}$$
Here,
- $N_{\text{files}}$ = Average files per user
- $\bar{s}$ = Average file size
Total Storage Estimation
- 500M users x 2 GB average storage per user = 1 EB (exabyte)
- With 3x replication: 3 EB
- With deduplication (40% savings): 1.8 EB
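The arithmetic above can be checked with a few lines of Python. This sketch assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes) and takes the 2 GB-per-user average as given; `estimate_storage` is a hypothetical helper.

```python
GB = 10**9   # decimal gigabyte
EB = 10**18  # decimal exabyte


def estimate_storage(users: int, per_user_bytes: int,
                     replication: int, dedup_savings: float) -> tuple[int, int, float]:
    """Return (raw, replicated, after-dedup) storage in bytes."""
    raw = users * per_user_bytes
    replicated = raw * replication
    after_dedup = replicated * (1 - dedup_savings)  # dedup_savings = fraction removed
    return raw, replicated, after_dedup


if __name__ == "__main__":
    raw, rep, dedup = estimate_storage(500_000_000, 2 * GB, replication=3, dedup_savings=0.40)
    print(f"{raw / EB:.1f} EB raw, {rep / EB:.1f} EB replicated, {dedup / EB:.1f} EB after dedup")
    # 1.0 EB raw, 3.0 EB replicated, 1.8 EB after dedup
```

Because replication and deduplication are both multiplicative factors in this model, applying them in either order gives the same 1.8 EB.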
High-Level Architecture
File Chunking and Deduplication
Content-Addressable Storage
Dropbox splits files into 4MB blocks and hashes each block using SHA-256. The hash becomes the block's identifier. If two users have the same block, it's stored once (deduplication). This reduces storage by 40-60%.
Block Hash
$$h_i = \text{SHA-256}(b_i)$$
Here,
- $b_i$ = 4MB chunk of file data
- $h_i$ = Unique content hash
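To make content addressing concrete, here is a minimal in-memory sketch in which the SHA-256 hex digest of a block is its key, so storing an identical block a second time costs nothing extra. `BlockStore` is a hypothetical class; a real block service would persist and replicate blocks rather than keep them in a Python dict.

```python
import hashlib


class BlockStore:
    """Hypothetical in-memory content-addressable block store.

    Each block is keyed by the SHA-256 hex digest of its bytes, so identical
    blocks collapse into a single stored copy (deduplication).
    """

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        """Store a block if it is new and return its content hash (its identifier)."""
        key = hashlib.sha256(block).hexdigest()
        self._blocks.setdefault(key, block)  # a duplicate block is not stored again
        return key

    def has(self, key: str) -> bool:
        """Lets a client skip uploading a block the server already holds."""
        return key in self._blocks

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def stored_bytes(self) -> int:
        """Physical bytes held after deduplication."""
        return sum(len(b) for b in self._blocks.values())


if __name__ == "__main__":
    store = BlockStore()
    k1 = store.put(b"quarterly report, block 0")  # user 1 uploads a block
    k2 = store.put(b"quarterly report, block 0")  # user 2 uploads identical bytes
    print(k1 == k2, store.stored_bytes())
```

Because the key is derived from the content, a client can hash a block locally and ask whether the server already has it before uploading; a block that already exists never has to be sent again.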
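Dropbox uses a Merkle tree to efficiently detect which blocks changed. The root hash of a file's tree changes whenever any block changes, so a single root comparison reveals whether anything changed, and descending only into subtrees whose hashes differ locates a changed block in O(log n) comparisons for a file of n blocks.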
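The document does not say how the tree is laid out, so this Python sketch makes several assumptions: a binary tree built over the block hashes, the last node paired with itself on odd-length levels (a common convention, not necessarily Dropbox's), and two versions with the same number of blocks. `build_levels` and `changed_leaves` are hypothetical names.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaf_hashes: list[bytes]) -> list[list[bytes]]:
    """Build a binary Merkle tree bottom-up over a file's block hashes.

    levels[0] holds the leaf (block) hashes and levels[-1] == [root].
    On a level with an odd number of nodes, the last node is paired with itself.
    """
    if not leaf_hashes:
        raise ValueError("need at least one block")
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        level = levels[-1]
        padded = level + [level[-1]] if len(level) % 2 else level
        levels.append([_hash(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)])
    return levels


def changed_leaves(old: list[list[bytes]], new: list[list[bytes]]) -> list[int]:
    """Indices of blocks whose hashes differ between two versions.

    Starts at the root and descends only into subtrees whose hashes differ,
    so k changed blocks cost roughly O(k log n) comparisons.
    """
    if len(old[0]) != len(new[0]):
        raise ValueError("sketch assumes equal block counts")
    changed = []
    stack = [(len(new) - 1, 0)]  # (level, index), starting at the root
    while stack:
        lvl, i = stack.pop()
        if old[lvl][i] == new[lvl][i]:
            continue  # identical subtree: skip every block beneath it
        if lvl == 0:
            changed.append(i)
            continue
        for child in (2 * i, 2 * i + 1):
            if child < len(new[lvl - 1]):  # an index past the end is the padded duplicate
                stack.append((lvl - 1, child))
    return sorted(changed)
```

For an unchanged file the loop stops after one root comparison; when one block out of many changed, only the nodes on that block's root-to-leaf path and their siblings are compared.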
Synchronization Protocol
Conflict Resolution
Last-Writer-Wins with Manual Override
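Dropbox uses last-writer-wins (LWW) for non-conflicting changes, where only one device modified a file since the last sync. When two devices edit the same file concurrently, starting from the same base version, Dropbox does not silently overwrite either edit; it saves one of them as a conflict copy (filename.conflict), and the user merges the two versions manually.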
Conflict Detection
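$$\text{conflict} \iff (V_A \neq V_{\text{base}}) \land (V_B \neq V_{\text{base}}) \land (V_A \neq V_B)$$
Here,
- $V_A$ = Device A's version
- $V_B$ = Device B's version
- $V_{\text{base}}$ = Common ancestor version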
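The detection rule can be sketched as a three-way comparison against the common ancestor. This Python sketch assumes each version is identified by a content hash of the file and that the sync service remembers the last version both devices agreed on; `SyncAction` and `resolve` are hypothetical names, and treating two identical edits as a no-op is a design choice of the sketch.

```python
from enum import Enum


class SyncAction(Enum):
    NO_OP = "no-op"                  # nobody changed the file, or both made the same change
    TAKE_A = "take A"                # only device A changed it: A's version wins
    TAKE_B = "take B"                # only device B changed it: B's version wins
    CONFLICT_COPY = "conflict copy"  # both changed it differently: keep both versions


def resolve(v_a: str, v_b: str, v_base: str) -> SyncAction:
    """Three-way comparison of version hashes against the common ancestor."""
    a_changed = v_a != v_base
    b_changed = v_b != v_base
    if a_changed and b_changed and v_a != v_b:
        return SyncAction.CONFLICT_COPY
    if a_changed and not b_changed:
        return SyncAction.TAKE_A
    if b_changed and not a_changed:
        return SyncAction.TAKE_B
    return SyncAction.NO_OP


if __name__ == "__main__":
    print(resolve("h1", "h2", "h0"))  # SyncAction.CONFLICT_COPY
```

Only the `CONFLICT_COPY` branch needs user involvement; the other outcomes can be applied automatically.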
Data Model
File Metadata Schema
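$$\text{File} = \left(\text{file\_id},\ [h_1, h_2, \ldots, h_n],\ \text{version}\right)$$
Here,
- $\text{file\_id}$ = Unique file identifier
- $[h_1, h_2, \ldots, h_n]$ = Ordered list of block hashes
- $\text{version}$ = Monotonically increasing version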
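A minimal sketch of the metadata record as a Python dataclass. The field names follow the definitions above but are otherwise hypothetical, and the compare-and-set `commit` method, which accepts an update only if the client edited the current version, is one assumed way to keep metadata strongly consistent while advancing the monotonically increasing version.

```python
from dataclasses import dataclass, field


class StaleVersionError(Exception):
    """Raised when a client commits an edit based on an outdated version."""


@dataclass
class FileMetadata:
    file_id: str
    block_hashes: list[str] = field(default_factory=list)  # block i of the file, in order
    version: int = 0  # only ever increases

    def commit(self, base_version: int, new_hashes: list[str]) -> int:
        """Compare-and-set: apply the new block list only if base_version is current."""
        if base_version != self.version:
            raise StaleVersionError(
                f"{self.file_id}: edit based on v{base_version}, server is at v{self.version}"
            )
        self.block_hashes = list(new_hashes)
        self.version += 1
        return self.version


if __name__ == "__main__":
    meta = FileMetadata("file-123", ["h0"])
    print(meta.commit(0, ["h0", "h1"]))  # 1
```

A rejected commit is the point at which a client would fall back to conflict handling, for example by uploading its edit as a conflict copy.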
Practice Exercises
- Sync Design: How would you handle syncing a 10GB file where only 1 byte changed?
- Conflict: Design a conflict resolution system for collaborative document editing.
- Offline: How would you handle a user editing files offline for a week, then reconnecting?
- Storage: Design a deduplication system for 1 EB of user files.
Key Takeaways:
- Block-level sync with content-addressable storage enables deduplication
- Merkle trees enable efficient change detection
- LWW with conflict copies for concurrent edits
- WebSocket push for real-time sync notifications
- 4MB block size balances granularity vs overhead