
Design an Object Storage System

Object storage systems like Amazon S3 store trillions of objects with extreme durability (11 nines) and availability. Unlike file systems, object storage provides a flat namespace with a simple REST API for storing and retrieving arbitrary data.

  • Flat Namespace — Objects identified by unique keys in a bucket
  • Durability — 99.999999999% (11 nines) durability through replication
  • Massive Scale — Trillions of objects, exabytes of data

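Object storage trades the hierarchical structure of file systems for massive scalability and durability. Objects are immutable: there are no in-place partial updates, so you change an object by writing a new object (or, with versioning enabled, a new version) under the same key.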

Requirements

Functional Requirements

  • Store and retrieve objects (up to 5 TB per object)
  • Support bucket-based organization
  • RESTful API: PUT, GET, DELETE objects
  • Versioning and lifecycle management
  • Access control lists (ACLs) and bucket policies
  • Multipart upload for large objects
  • Event notifications on object changes

Non-Functional Requirements

  • Durability: 99.999999999% (11 nines)
  • Availability: 99.99% for standard tier
  • Scale: 10 trillion objects, 100 PB per region
  • Latency: First byte < 100ms for GET
  • Throughput: 5,500 GET requests/second per prefix

Durability of 11 nines corresponds to an expected annual loss rate of 0.000000001% per object: if you store 10 million objects, you can expect to lose one object about once every 10,000 years. Reaching this target requires aggressive replication and continuous integrity checking.
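The arithmetic behind that figure is short enough to check directly. This sketch assumes "11 nines" means each object independently has a 10^-11 chance of being lost in any given year; the function names are hypothetical helpers, not part of any storage API.

```python
# Assumption: "11 nines" = each object independently has a 1e-11 chance
# of being lost in any given year.
ANNUAL_LOSS_PROBABILITY = 1e-11


def expected_losses_per_year(num_objects: int, p_loss: float = ANNUAL_LOSS_PROBABILITY) -> float:
    """Expected number of objects lost per year across all stored objects."""
    return num_objects * p_loss


def years_per_lost_object(num_objects: int, p_loss: float = ANNUAL_LOSS_PROBABILITY) -> float:
    """Average time, in years, between single-object losses."""
    return 1.0 / expected_losses_per_year(num_objects, p_loss)


print(f"10 million objects: one loss every {years_per_lost_object(10_000_000):,.0f} years")
print(f"10 trillion objects: about {expected_losses_per_year(10 * 10**12):.0f} losses per year")
```

Under the same assumption, the 10-trillion-object scale from the requirements implies roughly 100 lost objects per year, which is why durability has to be engineered per object rather than per fleet.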

Back-of-the-Envelope Estimation

Object Storage Capacity

  • 10 trillion objects × 1 KB average = 10 PB minimum
  • With 3x replication: 30 PB raw storage
  • 1000 storage nodes × 100 TB each = 100 PB capacity
  • 5,500 GET/sec/prefix × 1000 prefixes = 5.5M GET/sec
  • Metadata: 10 trillion × 500 bytes = 5 PB (distributed)
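The estimates above can be reproduced as a small script. It uses decimal units (1 KB = 10^3 bytes, 1 PB = 10^15 bytes) and the same inputs as the bullets; the utilization line is derived here and is not stated in the text.

```python
# Back-of-the-envelope inputs, copied from the estimates above (decimal units).
KB, TB, PB = 10**3, 10**12, 10**15

objects = 10 * 10**12            # 10 trillion objects
avg_object_size = 1 * KB         # assumed average object size
replication_factor = 3
nodes, node_capacity = 1000, 100 * TB
gets_per_prefix, prefixes = 5_500, 1000
metadata_per_object = 500        # bytes of metadata per object

logical_bytes = objects * avg_object_size
raw_bytes = logical_bytes * replication_factor
cluster_capacity = nodes * node_capacity
total_gets = gets_per_prefix * prefixes
metadata_bytes = objects * metadata_per_object

print(f"logical data:     {logical_bytes / PB:.0f} PB")
print(f"raw (3x):         {raw_bytes / PB:.0f} PB")
print(f"cluster capacity: {cluster_capacity / PB:.0f} PB")
print(f"utilization:      {raw_bytes / cluster_capacity:.0%}")
print(f"GET throughput:   {total_gets:,} req/s")
print(f"metadata:         {metadata_bytes / PB:.0f} PB")
```

With these inputs the 30 PB of replicated data fills about 30% of the 100 PB cluster, leaving headroom for growth.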

API Design

// Upload an object
PUT /{bucket}/{key}
Content-Type: image/jpeg
Body: <binary data>
Response: { "etag": "d41d8cd98f00b204e9800998ecf8427e" }

// Download an object
GET /{bucket}/{key}
Response: <binary data>
Headers: Content-Length, ETag, Last-Modified

// Delete an object
DELETE /{bucket}/{key}
Response: 204 No Content

// Multipart upload, step 1: initiate
POST /{bucket}/{key}?uploads
Response: { "upload_id": "upload_123" }

// Step 2: upload each part (parts may be sent in parallel)
PUT /{bucket}/{key}?uploadId=upload_123&partNumber=1
Response: { "etag": "part_etag" }

// Step 3: complete the upload with the list of uploaded parts
POST /{bucket}/{key}?uploadId=upload_123
Body: { "parts": [...] }
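To make the semantics of the three basic calls concrete, here is a toy in-memory model. `InMemoryBucketStore` and `NoSuchKey` are hypothetical names; the sketch assumes the ETag of a single-part upload is the hex MD5 of the body, that a PUT replaces the whole object, and that DELETE is idempotent. There is no persistence, replication, or HTTP layer.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from email.utils import format_datetime


class NoSuchKey(Exception):
    """Raised when GET targets a missing key (HTTP 404 in the REST API)."""


@dataclass
class StoredObject:
    body: bytes
    content_type: str
    etag: str
    last_modified: datetime


@dataclass
class InMemoryBucketStore:
    """Hypothetical single-process model of PUT/GET/DELETE; nothing is persisted."""
    buckets: dict = field(default_factory=dict)  # bucket -> {key -> StoredObject}

    def put(self, bucket: str, key: str, body: bytes,
            content_type: str = "application/octet-stream") -> dict:
        # Single-part upload: the ETag is the hex MD5 digest of the body.
        etag = hashlib.md5(body).hexdigest()
        # A PUT to an existing key replaces the whole object (no partial updates).
        self.buckets.setdefault(bucket, {})[key] = StoredObject(
            body, content_type, etag, datetime.now(timezone.utc))
        return {"etag": etag}

    def get(self, bucket: str, key: str) -> tuple:
        obj = self.buckets.get(bucket, {}).get(key)
        if obj is None:
            raise NoSuchKey(f"{bucket}/{key}")
        headers = {
            "Content-Length": str(len(obj.body)),
            "ETag": obj.etag,
            "Last-Modified": format_datetime(obj.last_modified, usegmt=True),
        }
        return obj.body, headers

    def delete(self, bucket: str, key: str) -> int:
        # Modeled as idempotent: deleting a missing key still returns 204 No Content.
        self.buckets.get(bucket, {}).pop(key, None)
        return 204
```

Incidentally, `d41d8cd98f00b204e9800998ecf8427e`, the ETag in the example PUT response, is the MD5 of an empty body.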

High-Level Architecture

Figure: Object Storage Architecture. Components: Client, Load Balancer, Metadata Service (Object Index, Bucket Policies), Data Service (Chunk Manager, Replication), Metadata DB (distributed), and Storage Nodes (Node 1 through Node N, 100 TB each).

Detailed Design

Data Model

Object Storage Data Model

An object consists of data (the blob) and metadata (key, size, checksum, custom headers). Objects are organized into buckets with a flat key namespace within each bucket.

// Object Metadata
{
  bucket: "images",
  key: "photos/2026/06/photo1.jpg",
  size: 10485760,          // 10 MiB, stored as the two 5 MiB parts below
  content_type: "image/jpeg",
  etag: "d41d8cd98f00b204",
  created_at: "2026-06-20T10:00:00Z",
  storage_class: "STANDARD",
  version_id: "v1",
  checksum: "sha256:abc123...",
  parts: [                 // For multipart uploads
    { part_num: 1, offset: 0, size: 5242880, etag: "..." },
    { part_num: 2, offset: 5242880, size: 5242880, etag: "..." }
  ]
}
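For a multipart object, the `size` field and the `parts` list must agree. The sketch below checks that invariant; `Part` and `validate_parts` are hypothetical names whose fields mirror `part_num`, `offset`, and `size` in the record above, and the rule that parts must be contiguous with no gaps is an assumption about how such a record would be laid out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    part_num: int
    offset: int
    size: int


def validate_parts(object_size: int, parts: list) -> None:
    """Raise ValueError unless parts are unique, contiguous, and cover exactly object_size bytes."""
    ordered = sorted(parts, key=lambda p: p.part_num)
    if len({p.part_num for p in ordered}) != len(ordered):
        raise ValueError("duplicate part numbers")
    expected_offset = 0
    for part in ordered:
        if part.offset != expected_offset:
            raise ValueError(f"part {part.part_num} starts at {part.offset}, expected {expected_offset}")
        expected_offset += part.size
    if expected_offset != object_size:
        raise ValueError(f"parts cover {expected_offset} bytes but size is {object_size}")


MiB = 1024 * 1024
parts = [Part(1, 0, 5 * MiB), Part(2, 5 * MiB, 5 * MiB)]
validate_parts(10 * MiB, parts)          # consistent: passes silently
try:
    validate_parts(2_048_576, parts)     # size disagrees with the parts
except ValueError as err:
    print("rejected:", err)
```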

Data Chunking and Placement

Large objects are split into chunks for efficient storage and replication:

Object Chunking

Large objects are split into fixed-size chunks (e.g., 64 MB). Each chunk is independently replicated across storage nodes. This enables parallel uploads, efficient replication, and partial reads.

Chunk Count

$$chunks = \left\lceil \frac{object\_size}{chunk\_size} \right\rceil$$

Here,

  • object_size = total object size in bytes
  • chunk_size = chunk size (typically 64 MB)

Chunk Calculation

For a 1 GB video file with 64 MB chunks: chunks = ⌈1024 MB / 64 MB⌉ = 16 chunks

Each 64 MB chunk is stored 3 times (192 MB per chunk), so with 3x replication the whole file occupies 16 × 192 MB = 3,072 MB (3 GB) of raw storage.
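Chunking and placement can be sketched together. The split follows the chunk-count formula above. For placement, the text only says each chunk is replicated across storage nodes, so this sketch swaps in rendezvous (highest-random-weight) hashing as one plausible way to choose 3 distinct nodes; `split_into_chunks`, `place_chunk`, and the node names are hypothetical.

```python
import hashlib

MiB = 1024 * 1024
CHUNK_SIZE = 64 * MiB


def split_into_chunks(object_size: int, chunk_size: int = CHUNK_SIZE) -> list:
    """Return (index, offset, length) per chunk; only the last chunk may be shorter."""
    return [
        (index, offset, min(chunk_size, object_size - offset))
        for index, offset in enumerate(range(0, object_size, chunk_size))
    ]


def place_chunk(object_key: str, chunk_index: int, nodes: list, replicas: int = 3) -> list:
    """Choose `replicas` distinct nodes with rendezvous (highest-random-weight) hashing."""
    chunk_id = f"{object_key}#{chunk_index}"

    def score(node: str) -> int:
        # Each (node, chunk) pair gets a pseudo-random but deterministic score.
        digest = hashlib.sha256(f"{node}|{chunk_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The highest-scoring nodes hold the replicas.
    return sorted(nodes, key=score, reverse=True)[:replicas]


nodes = [f"node-{i}" for i in range(1, 11)]
for index, offset, length in split_into_chunks(1024 * MiB)[:2]:
    print(index, offset, length // MiB, "MiB ->", place_chunk("videos/movie.mp4", index, nodes))
```

A 1024 MiB object yields 16 chunks of 64 MiB, matching the calculation above. Rendezvous hashing has a useful property here: removing a node that does not hold a chunk leaves that chunk's placement unchanged, so only chunks on a failed node need to move.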

Replication Strategy

Erasure Coding

Erasure coding splits data into k data chunks and generates m parity chunks, so the data can be recovered after losing up to m of the k + m chunks. It is more storage-efficient than full replication.

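| Strategy            | Storage Overhead (raw / logical) | Durability | Read Performance   |
|---------------------|----------------------------------|------------|--------------------|
| 3x Replication      | 300%                             | High       | Fast (any replica) |
| Reed-Solomon (10+4) | 140%                             | Very High  | Moderate (decode)  |
| Reed-Solomon (10+2) | 120%                             | High       | Moderate           |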
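Reed-Solomon relies on Galois-field arithmetic, which is too long for a short example. As a simplified stand-in, the sketch below uses a single XOR parity chunk (k data chunks, m = 1). It is explicitly not Reed-Solomon, but it shows the core erasure-coding property: any one lost chunk can be rebuilt from the surviving k. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal data chunks (zero-padded) and append one XOR parity chunk."""
    chunk_len = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def decode(chunks: list, k: int, original_len: int) -> bytes:
    """Rebuild the data when at most one of the k + 1 chunks is missing (None)."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can only repair one lost chunk")
    chunks = list(chunks)
    if missing:
        # XOR of all k + 1 chunks is zero, so the lost chunk is the XOR of the survivors.
        chunks[missing[0]] = reduce(xor_bytes, [c for c in chunks if c is not None])
    return b"".join(chunks[:k])[:original_len]


data = b"object storage keeps data safe"
encoded = encode(data, k=4)           # 4 data chunks + 1 parity chunk
encoded[2] = None                     # simulate losing the node holding chunk 2
print(decode(encoded, k=4, original_len=len(data)))
```

With k = 4 and m = 1 the overhead is 5/4 = 125%. Reed-Solomon generalizes this to m > 1 parity chunks, which is what lets RS(10,4) survive four losses.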

Erasure Coding Overhead

$$overhead = \frac{k + m}{k}$$

Here,

  • k = number of data chunks
  • m = number of parity chunks

Erasure Coding vs Replication

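For a 100 MB object:

  • 3x replication: 300 MB of storage; tolerates 2 node failures
  • RS(10,4): 140 MB of storage; tolerates 4 node failures (assuming all 14 chunks sit on different nodes)

RS(10,4) uses less than half the storage of 3x replication while tolerating twice as many node failures; the cost is decoding work and reads that touch more nodes.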
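The table and the 100 MB example follow from the overhead formula. This sketch treats n-way replication as the special case k = 1, m = n - 1 (a framing assumed here, not stated in the text) and assumes every chunk or replica lives on a distinct node; `Scheme` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    k: int  # data chunks (1 for plain replication: the object itself)
    m: int  # parity chunks, or extra full replicas for replication

    def overhead(self) -> float:
        """Raw bytes stored per logical byte, (k + m) / k."""
        return (self.k + self.m) / self.k

    def tolerated_failures(self) -> int:
        """Node losses survivable, assuming each of the k + m chunks is on a different node."""
        return self.m


SCHEMES = [Scheme("3x replication", 1, 2), Scheme("RS(10,4)", 10, 4), Scheme("RS(10,2)", 10, 2)]
OBJECT_MB = 100

for s in SCHEMES:
    print(f"{s.name:<15} overhead {s.overhead():.0%}  "
          f"stores {OBJECT_MB * s.overhead():.0f} MB  tolerates {s.tolerated_failures()} failures")
```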

Metadata Architecture

The metadata service is the control plane:

Metadata operations (PUT/GET object metadata) are separate from data operations (PUT/GET object data). This separation allows metadata to scale independently and be cached aggressively.
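The split between the two planes can be sketched in a few classes. One common ordering, assumed here rather than taken from the text, is to write all data chunks first and commit the metadata record last, so a reader never sees metadata pointing at missing data. `DataService`, `MetadataService`, `put_object`, and `get_object` are hypothetical; dicts stand in for storage nodes and the metadata DB, an LRU cache illustrates aggressive metadata caching, and the tiny default `chunk_size` is only for demonstration.

```python
import hashlib
import uuid
from collections import OrderedDict


class DataService:
    """Data plane: stores opaque chunks by ID and knows nothing about buckets or keys."""

    def __init__(self):
        self.chunks = {}  # chunk_id -> bytes; stands in for the storage nodes

    def write_chunk(self, data: bytes) -> str:
        chunk_id = uuid.uuid4().hex
        self.chunks[chunk_id] = data
        return chunk_id

    def read_chunk(self, chunk_id: str) -> bytes:
        return self.chunks[chunk_id]


class MetadataService:
    """Control plane: maps (bucket, key) to chunk IDs, with an LRU cache over the store."""

    def __init__(self, cache_size: int = 1024):
        self.store = {}             # stands in for the distributed metadata DB
        self.cache = OrderedDict()  # (bucket, key) -> record, least recently used first
        self.cache_size = cache_size

    def commit(self, bucket: str, key: str, record: dict) -> None:
        self.store[(bucket, key)] = record
        self.cache.pop((bucket, key), None)  # drop any stale cached copy

    def lookup(self, bucket: str, key: str) -> dict:
        name = (bucket, key)
        if name in self.cache:
            self.cache.move_to_end(name)     # mark as most recently used
            return self.cache[name]
        record = self.store[name]            # KeyError here means "no such object"
        self.cache[name] = record
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict the least recently used entry
        return record


def put_object(meta, data, bucket, key, body: bytes, chunk_size: int = 4) -> None:
    # 1. Data plane first: write every chunk.
    chunk_ids = [data.write_chunk(body[i:i + chunk_size]) for i in range(0, len(body), chunk_size)]
    # 2. Control plane last: the object becomes visible only once its metadata is committed.
    meta.commit(bucket, key, {"size": len(body), "etag": hashlib.md5(body).hexdigest(),
                              "chunks": chunk_ids})


def get_object(meta, data, bucket, key) -> bytes:
    record = meta.lookup(bucket, key)        # served from the cache when possible
    return b"".join(data.read_chunk(c) for c in record["chunks"])
```

With this ordering, a crash between the two steps leaves orphaned chunks rather than a dangling pointer; a background garbage collector (not shown) would reclaim them, along with chunks left behind by overwrites.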

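| Metadata Tier | Technology      | Use Case                           |
|---------------|-----------------|------------------------------------|
| Hot           | Redis           | Frequently accessed metadata       |
| Warm          | Cassandra       | Recent objects, time-series access |
| Cold          | S3/Object Store | Archived metadata, audit logs      |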
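A read path over these tiers might check hot, then warm, then cold, and promote whatever it finds into the hot tier. This is a sketch under that assumption: plain dicts stand in for Redis, Cassandra, and the object store, and `TieredMetadataStore` is a hypothetical name.

```python
from typing import Optional, Tuple


class TieredMetadataStore:
    """Hot -> warm -> cold lookup with promotion of hits into the hot tier.

    Plain dicts stand in for Redis (hot), Cassandra (warm), and an object store (cold).
    """

    TIERS = ("hot", "warm", "cold")

    def __init__(self):
        self.hot = {}
        self.warm = {}
        self.cold = {}

    def get(self, key: str) -> Tuple[Optional[dict], Optional[str]]:
        """Return (record, tier it was found in), or (None, None) if absent everywhere."""
        for tier_name in self.TIERS:
            tier = getattr(self, tier_name)
            if key in tier:
                record = tier[key]
                if tier_name != "hot":
                    # Promote so the next read of this key is served from the hot tier.
                    self.hot[key] = record
                return record, tier_name
        return None, None
```

The hot tier here grows without bound; a real deployment would bound it, for example with an LRU eviction policy or TTLs.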

Multipart Upload

For large objects, multipart upload enables:

  • Parallel uploads of chunks
  • Resumable uploads on failure
  • Upload of objects larger than 5 GB

Figure: Multipart upload flow (Init Upload → Part 1, Part 2, …, Part N → Complete → Final Object)
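The initiate / upload part / complete flow can be modeled end to end in memory, with the client uploading parts concurrently. `MultipartUploads` and `upload_in_parts` are hypothetical names. Two details are assumptions rather than statements from this lesson: the composite ETag (MD5 over the concatenated binary part MD5s, plus "-<part count>") mirrors a convention commonly observed in S3, and the tiny part size is for demonstration only, since S3 enforces a 5 MiB minimum for every part except the last.

```python
import hashlib
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class MultipartUploads:
    """Hypothetical in-memory server for initiate / upload part / complete."""

    def __init__(self):
        self._lock = threading.Lock()
        self.pending = {}   # upload_id -> {"bucket", "key", "parts": {n: (data, etag)}}
        self.objects = {}   # (bucket, key) -> (body, etag)

    def initiate(self, bucket: str, key: str) -> str:
        upload_id = uuid.uuid4().hex
        with self._lock:
            self.pending[upload_id] = {"bucket": bucket, "key": key, "parts": {}}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> str:
        etag = hashlib.md5(data).hexdigest()
        with self._lock:
            # Re-sending a part number overwrites it, so a failed part can simply be retried.
            self.pending[upload_id]["parts"][part_number] = (data, etag)
        return etag

    def complete(self, upload_id: str, parts: list) -> str:
        with self._lock:
            upload = self.pending[upload_id]
            numbers = [n for n, _ in parts]
            if not parts or numbers != sorted(set(numbers)):
                raise ValueError("parts must be non-empty, unique, and in ascending order")
            for n, etag in parts:
                if n not in upload["parts"] or upload["parts"][n][1] != etag:
                    raise ValueError(f"part {n} is missing or its ETag does not match")
            body = b"".join(upload["parts"][n][0] for n in numbers)
            # Assumed composite ETag: MD5 of the concatenated binary part MD5s, plus "-<count>".
            digest = hashlib.md5(b"".join(bytes.fromhex(e) for _, e in parts)).hexdigest()
            etag = f"{digest}-{len(parts)}"
            self.objects[(upload["bucket"], upload["key"])] = (body, etag)
            del self.pending[upload_id]
        return etag


def upload_in_parts(svc: MultipartUploads, bucket: str, key: str, body: bytes,
                    part_size: int, max_workers: int = 4) -> str:
    """Client side: split body into parts, upload them concurrently, then complete."""
    upload_id = svc.initiate(bucket, key)
    pieces = [(n, body[off:off + part_size])
              for n, off in enumerate(range(0, len(body), part_size), start=1)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so ETags line up with part numbers.
        etags = list(pool.map(lambda p: svc.upload_part(upload_id, *p), pieces))
    return svc.complete(upload_id, [(n, e) for (n, _), e in zip(pieces, etags)])
```

Because each part is stored independently until `complete`, a failed part can be retried on its own, which is what makes the upload resumable.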

Practice Exercises

  1. Design: How would you implement object versioning? What are the storage implications of keeping all versions vs. lifecycle policies?

  2. Durability: If you use 3x replication across 3 data centers, calculate the probability of data loss given a 0.1% annual disk failure rate per node.

  3. Scale: Design a system to handle 5,500 GET requests per second per prefix. How would you distribute load across storage nodes?

  4. Optimization: How would you implement a CDN cache invalidation system for objects stored in S3? Design for both instant and eventual consistency.

Key Takeaways:

  • Object storage uses a flat namespace with bucket organization for massive scalability
  • Erasure coding (RS 10+4) is more storage-efficient than 3x replication with higher durability
  • Separating metadata from data services allows independent scaling
  • Multipart upload enables parallel chunk uploads for large objects
  • Object immutability simplifies consistency but requires versioning for updates

What to Learn Next

-> Design Pastebin: storing large text objects with S3-style storage.

-> Databases: distributed metadata storage with Cassandra and PostgreSQL.

-> CDNs: caching objects at the edge for low-latency access.

-> Data Replication: replication strategies for durability and availability.

-> Design Google Drive: file sync and storage with conflict resolution.

-> Consistent Hashing: distributing objects across storage nodes.
