πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Design Google Drive

System Design ProblemsFile Sync & Storage🟒 Free Lesson

Advertisement

System Design Problems

Design Google Drive

Google Drive enables users to store files in the cloud, sync across devices, and share with collaborators. The system must handle file uploads (up to 5 TB), real-time sync, conflict resolution when multiple users edit simultaneously, and efficient delta sync for large files.

  • File Sync β€” Changes propagate across devices within seconds
  • Conflict Resolution β€” Handle concurrent edits gracefully
  • Sharing β€” Granular permissions (view, comment, edit) per file/folder

The fundamental challenge is synchronization: when two users edit the same file simultaneously, the system must resolve conflicts without losing data.

Requirements

Functional Requirements

  • Upload and download files (up to 5 TB)
  • Real-time sync across multiple devices
  • File versioning and history
  • Conflict resolution for concurrent edits
  • Sharing with permissions (view, comment, edit)
  • Folder hierarchy and organization
  • Offline editing with sync on reconnect

Non-Functional Requirements

  • Latency: Sync changes within 5 seconds
  • Durability: 99.999999999% (11 nines)
  • Availability: 99.99%
  • Scale: 1 billion files, 500M active users
  • Bandwidth: Efficient delta sync for large files

Google Drive uses delta sync (rsync-like): only transfer the changed parts of a file, not the entire file. This reduces bandwidth for large files (e.g., video editing projects).

Back-of-the-Envelope Estimation

Google Drive Capacity

  • 1 billion files Γ— 10 MB average = 10 PB
  • 500M users Γ— 10 GB free = 5 EB (with deduplication: ~500 PB)
  • 100M sync operations/day = 1,160 QPS
  • 10M file uploads/day = 116 QPS
  • Storage growth: 50 PB/day

High-Level Architecture

DesktopClientMobileClientWebClientAPI GatewayDrive ServiceFile ManagerSync EngineDelta CalculatorConflict ResolverVersion ManagerSharing ServiceNotificationMetadata DB(PostgreSQL)Block Storage(S3/GCS)Notification(WebSocket)Google Drive Architecture

Detailed Design

File Chunking and Block Storage

DfBlock Storage

Large files are split into fixed-size blocks (e.g., 256 KB). Each block is stored independently and identified by a hash. Deduplication occurs at the block level: identical blocks across files are stored once.

Block Deduplication

storageactual=total_blocksΓ—block_sizededup_ratiostorage_{actual} = \frac{total\_blocks \times block\_size}{dedup\_ratio}

Here,

  • totalblockstotal_blocks=Total number of unique blocks
  • blocksizeblock_size=Size per block (typically 256 KB)
  • dedupratiodedup_ratio=Deduplication ratio (typically 2-5x)

Block Deduplication Savings

User A uploads "report.pdf" (100 pages = 50 MB) User B uploads "report.pdf" (same file) β†’ 0 additional storage (dedup) User C uploads "report_v2.pdf" (changed 2 pages) β†’ only 2 MB new blocks

With 256 KB blocks: 50 MB / 256 KB = 200 blocks Changed 2 pages: ~8 blocks new = 2 MB additional storage

Delta Sync

DfDelta Sync

Delta sync transfers only the changed blocks of a file, not the entire file. The client computes a block-level diff and uploads only new/modified blocks. This reduces bandwidth for large files.

ClientBlock 1 βœ“Block 2 βœ— NEWBlock 3 βœ“Upload OnlyChanged Blocks(256 KB)ServerDelta sync: upload only changed blocks

Use rolling hash (e.g., Rabin fingerprint) to detect block boundaries in a stream. This allows efficient diff computation without knowing block boundaries in advance.

Conflict Resolution

DfOperational Transformation

Operational Transformation (OT) transforms concurrent operations to maintain consistency. When two users edit the same document, operations are transformed so they produce the same result regardless of execution order.

Google Drive uses a combination of strategies:

  • Files: Create conflict copies (both versions preserved)
  • Google Docs: Operational Transformation (real-time collaboration)
  • Binary files: Last-write-wins with conflict copies

For binary files (PDF, images):

  1. User A edits file offline
  2. User B edits file online
  3. User A reconnects and syncs
  4. System detects conflict (version mismatch)
  5. Creates "conflict copy" for User A
  6. User B's version becomes the current version
  7. User manually merges if needed

File Versioning

DfFile Versioning

Every file change creates a new version. Previous versions are retained and accessible. Version history enables rollback and audit trails.

Architecture Diagram
File versions:
v1: block_1, block_2, block_3 (created 2026-06-18)
v2: block_1, block_4, block_3 (created 2026-06-19, block_2 changed to block_4)
v3: block_5, block_4, block_3 (created 2026-06-20, block_1 changed to block_5)

Store only unique blocks across versions. Version metadata records which blocks compose each version. Rollback reconstructs the file from historical block references.

Sharing and Permissions

DfACL (Access Control List)

Each file/folder has an ACL listing users and their permissions. Permissions are hierarchical: folder permissions inherit to children unless explicitly overridden.

Architecture Diagram
File ACL:
{
  "file_id": "file_123",
  "owner": "user_a@example.com",
  "permissions": [
    { "user": "user_b@example.com", "role": "editor" },
    { "user": "user_c@example.com", "role": "viewer" },
    { "group": "team@example.com", "role": "commenter" }
  ]
}

Practice Exercises

  1. Design: How would you implement real-time collaborative editing for Google Docs (not just binary files)? Design the Operational Transformation system.

  2. Scale: If 500M users each have 10 GB of files with a deduplication ratio of 3x, estimate the total storage and the block storage architecture.

  3. Sync: Design a delta sync algorithm that efficiently computes the diff between two versions of a 10 GB video file.

  4. Offline: How would you handle offline editing where the user edits a file on two devices without internet? Design the conflict detection and resolution strategy.

Key Takeaways:

  • Block-level storage (256 KB blocks) enables deduplication and delta sync
  • Delta sync transfers only changed blocks, reducing bandwidth by 90%+ for large files
  • Conflict resolution: binary files β†’ conflict copies; Docs β†’ Operational Transformation
  • File versioning stores only unique blocks across versions for space efficiency
  • ACL-based sharing with hierarchical permissions (folder β†’ file inheritance)

What to Learn Next

-> Design Google Docs Real-time collaborative editing with OT/CRDT.

-> Design Object Storage Block storage and deduplication at scale.

-> Data Replication Replicating file data across data centers.

-> Design Chat System Real-time sync with WebSocket and conflict resolution.

-> Caching Strategies Caching frequently accessed files at the edge.

-> Security Patterns File encryption and access control.

⭐

Premium Content

Design Google Drive

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert System Design Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement