
Design Dropbox

Dropbox serves 500M+ users with 2B+ files synced daily. This design covers file chunking, real-time synchronization, conflict resolution, and collaborative editing.

  • Scale: 500M+ users, 2B+ files synced daily, 10M+ concurrent
  • Sync: Real-time with < 1 second propagation
  • Storage: Exabytes of user files with deduplication

Dropbox's core challenge is maintaining file consistency across millions of devices while handling conflicts gracefully.

Requirements Clarification

Functional Requirements

  1. Upload and download files
  2. Automatic file synchronization across devices
  3. File versioning and history
  4. Share files and folders with others
  5. Offline access with sync on reconnect
  6. Conflict detection and resolution

Non-Functional Requirements

  1. Availability: 99.99% uptime
  2. Durability: 99.999999999% (11 nines)
  3. Consistency: Strong for file metadata, eventual for content
  4. Scale: 500M users, 10M concurrent

Dropbox's key insight is block-level sync: instead of syncing entire files, it splits each file into 4 MB blocks and syncs only the blocks that changed. For small edits to large files, this can reduce bandwidth by 10-100x.

Back-of-the-Envelope Estimation

Storage per User

$$\text{Storage} = 2000 \text{ files} \times 1 \text{ MB avg} = 2 \text{ GB/user}$$

Here,

  • 2000 = Average files per user
  • 1 MB = Average file size

Total Storage Estimation

  • 500M users x 2 GB = 1 EB (exabyte)
  • With 3x replication: 3 EB
  • With deduplication (40% savings): 1.8 EB
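
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate as stated: it assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes) and applies the 40% deduplication saving after replication, in the same order as the list. The function and variable names are illustrative.

```python
GB = 10**9   # decimal gigabyte (assumption: the estimate uses decimal units)
EB = 10**18  # decimal exabyte


def estimate_storage(users: int = 500_000_000,
                     files_per_user: int = 2000,
                     avg_file_bytes: int = 10**6,
                     replication: int = 3,
                     dedup_savings_pct: int = 40) -> dict[str, int]:
    """Return the storage estimate in bytes at each step of the calculation."""
    per_user = files_per_user * avg_file_bytes
    raw_total = users * per_user
    replicated = raw_total * replication
    # Integer arithmetic keeps the result exact.
    after_dedup = replicated * (100 - dedup_savings_pct) // 100
    return {
        "per_user": per_user,
        "raw_total": raw_total,
        "replicated": replicated,
        "after_dedup": after_dedup,
    }


estimate = estimate_storage()
for step, size in estimate.items():
    print(f"{step}: {size / EB:.9f} EB")
```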

High-Level Architecture

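Architecture diagram components:

  • Client: Dropbox Client
  • Entry point: API Gateway / Load Balancer
  • Services: Sync, Metadata, Chunk, Storage, Share, Notification
  • Messaging: Message Queue (Kafka)
  • Data stores: MySQL (Metadata), S3 (Chunks), Redis (Sync), MongoDB (Shares)
  • Push channel: WebSocket (Push)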

File Chunking and Deduplication

Definition: Content-Addressable Storage

Dropbox splits files into 4MB blocks and hashes each block using SHA-256. The hash becomes the block's identifier. If two users have the same block, it's stored once (deduplication). This reduces storage by 40-60%.

Block Hash

$$\text{block\_id} = \text{SHA-256}(\text{block\_data})$$

Here,

  • block_data = 4MB chunk of file data
  • block_id = Unique content hash
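
A minimal sketch of content-addressable storage with deduplication, under stated assumptions: `BlockStore`, `store_file`, and `read_file` are hypothetical names, and the in-memory dictionary stands in for a real object store. Each block is keyed by its SHA-256 hash, so an identical block uploaded twice occupies one entry.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as described in the text


class BlockStore:
    """In-memory stand-in for a content-addressable block store (hypothetical)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        """Store a block under its SHA-256 hash and return that hash."""
        block_id = hashlib.sha256(block).hexdigest()
        # setdefault keeps the existing copy if the block is already stored.
        self._blocks.setdefault(block_id, block)
        return block_id

    def get(self, block_id: str) -> bytes:
        return self._blocks[block_id]

    def __len__(self) -> int:
        return len(self._blocks)


def store_file(store: BlockStore, data: bytes,
               block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the ordered block ids."""
    return [store.put(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def read_file(store: BlockStore, block_ids: list[str]) -> bytes:
    """Reassemble a file from its ordered list of block ids."""
    return b"".join(store.get(block_id) for block_id in block_ids)


# Usage with a tiny block size so the example stays small.
store = BlockStore()
file_a = store_file(store, b"x" * 8 + b"y" * 8, block_size=8)
file_b = store_file(store, b"x" * 8 + b"z" * 8, block_size=8)
print(len(store))  # the shared first block is stored only once
```

Fixed-size chunking is the simplest choice and matches the 4 MB figure in the text. An insertion near the start of a file shifts every later block boundary, which is a known limitation of this approach.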

Dropbox uses a Merkle tree to efficiently detect which blocks changed. The file's root hash changes whenever any block changes, and comparing subtree hashes from the root down locates a single changed block in O(log n) comparisons.

Synchronization Protocol

Sequence diagram participants: Device A, Sync Service, Metadata DB, Object Store, Device B.

Conflict Resolution

Definition: Last-Writer-Wins with Manual Override

Dropbox uses last-writer-wins (LWW) for non-conflicting changes. When two users edit the same block simultaneously, Dropbox creates a conflict copy (filename.conflict). Users manually merge conflicts.

Conflict Detection

$$\text{Conflict} = (\text{version}_A \neq \text{parent}) \wedge (\text{version}_B \neq \text{parent})$$

Here,

  • version_A = Device A's version
  • version_B = Device B's version
  • parent = Common ancestor version
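
The conflict rule can be written as a small three-way comparison. This is a sketch under assumptions: versions are represented as plain integers, and `SyncOutcome`, `is_conflict`, `resolve`, and `conflict_copy_name` are hypothetical names. The conflict-copy suffix follows the `filename.conflict` form mentioned above.

```python
from enum import Enum


class SyncOutcome(Enum):
    UP_TO_DATE = "up_to_date"  # neither device changed the file
    TAKE_A = "take_a"          # only device A changed it
    TAKE_B = "take_b"          # only device B changed it
    CONFLICT = "conflict"      # both changed it since the common ancestor


def is_conflict(version_a: int, version_b: int, parent: int) -> bool:
    """Direct translation of the rule: both versions differ from the parent."""
    return version_a != parent and version_b != parent


def resolve(version_a: int, version_b: int, parent: int) -> SyncOutcome:
    """Classify a pair of device versions against their common ancestor."""
    if is_conflict(version_a, version_b, parent):
        return SyncOutcome.CONFLICT
    if version_a != parent:
        return SyncOutcome.TAKE_A
    if version_b != parent:
        return SyncOutcome.TAKE_B
    return SyncOutcome.UP_TO_DATE


def conflict_copy_name(filename: str) -> str:
    """Name for the extra copy kept when a conflict is detected."""
    return f"{filename}.conflict"


print(resolve(version_a=8, version_b=7, parent=7))
print(resolve(version_a=8, version_b=9, parent=7))
print(conflict_copy_name("report.docx"))
```

The rule as stated also flags the case where both devices made the identical change. A refinement could compare content hashes first and treat equal results as already converged.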

Data Model

File Metadata Schema

$$\text{File} = (file\_id, name, path, size, blocks[], version, modified\_at)$$

Here,

  • file_id = Unique file identifier
  • blocks[] = Ordered list of block hashes
  • version = Monotonically increasing version
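
One way to express this schema in code is an immutable record whose version increases by one on every update. The field names come from the schema above; the field types, the `FileMetadata` class name, and the `next_version` helper are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileMetadata:
    file_id: str             # unique file identifier
    name: str
    path: str
    size: int                # size in bytes (assumed unit)
    blocks: tuple[str, ...]  # ordered block hashes
    version: int             # monotonically increasing
    modified_at: datetime


def next_version(meta: FileMetadata, blocks: list[str], size: int,
                 now: datetime | None = None) -> FileMetadata:
    """Return a new record with updated blocks and the version bumped by one."""
    return replace(
        meta,
        blocks=tuple(blocks),
        size=size,
        version=meta.version + 1,
        modified_at=now or datetime.now(timezone.utc),
    )


v1 = FileMetadata(
    file_id="f1",
    name="notes.txt",
    path="/docs/notes.txt",
    size=16,
    blocks=("hash-1", "hash-2"),
    version=1,
    modified_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
v2 = next_version(v1, ["hash-1", "hash-3"], size=16)
print(v2.version, v2.blocks)
```

Keeping each record immutable means earlier versions remain available unchanged, which fits the file versioning and history requirement.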

Practice Exercises

  1. Sync Design: How would you handle syncing a 10GB file where only 1 byte changed?
  2. Conflict: Design a conflict resolution system for collaborative document editing.
  3. Offline: How would you handle a user editing files offline for a week, then reconnecting?
  4. Storage: Design a deduplication system for 1 EB of user files.

Key Takeaways:

  • Block-level sync with content-addressable storage enables deduplication
  • Merkle trees enable efficient change detection
  • LWW with conflict copies for concurrent edits
  • WebSocket push for real-time sync notifications
  • 4MB block size balances granularity vs overhead
