Design an Object Storage System

Object storage systems like Amazon S3 store trillions of objects with extreme durability (11 nines) and availability. Unlike file systems, object storage provides a flat namespace with a simple REST API for storing and retrieving arbitrary data.

Flat Namespace — Objects identified by unique keys in a bucket
Durability — 99.999999999% (11 nines) durability through replication
Massive Scale — Trillions of objects, exabytes of data

Object storage trades the hierarchical structure of file systems for massive scalability and durability. Every object is immutable—you update by writing a new version.

Requirements

Functional Requirements

Store and retrieve objects (up to 5 TB per object)
Support bucket-based organization
RESTful API: PUT, GET, DELETE objects
Versioning and lifecycle management
Access control lists (ACLs) and bucket policies
Multipart upload for large objects
Event notifications on object changes

Non-Functional Requirements

Durability: 99.999999999% (11 nines)
Availability: 99.99% for standard tier
Scale: 10 trillion objects, 100 PB per region
Latency: First byte < 100ms for GET
Throughput: 5,500 GET requests/second per prefix

Durability of 11 nines means an annual loss probability of about 10^-11 per object: if you store 10 million objects, you expect to lose roughly one object every 10,000 years. Reaching this requires aggressive replication and integrity checking.
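The durability figure can be checked with a few lines of arithmetic. The sketch below assumes that "11 nines" is read as an independent annual loss probability of 10^-11 per object; the function names are illustrative and not part of any real API.

```python
def expected_annual_losses(num_objects: int, nines: int = 11) -> float:
    """Expected number of objects lost per year, assuming independent losses."""
    annual_loss_probability = 10.0 ** -nines
    return num_objects * annual_loss_probability


def years_per_lost_object(num_objects: int, nines: int = 11) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(num_objects, nines)


if __name__ == "__main__":
    # 10 million objects: roughly one loss every 10,000 years.
    print(years_per_lost_object(10_000_000))
    # 10 trillion objects (the scale target): roughly 100 losses per year.
    print(expected_annual_losses(10 * 10**12))
```

At the 10-trillion-object scale from the requirements, even 11 nines implies a nonzero expected loss each year, which is why continuous integrity checking and repair matter as much as the initial replication.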

Back-of-the-Envelope Estimation

Object Storage Capacity

10 trillion objects × 1 KB average = 10 PB minimum
With 3x replication: 30 PB raw storage
1000 storage nodes × 100 TB each = 100 PB capacity
5,500 GET/sec/prefix × 1000 prefixes = 5.5M GET/sec
Metadata: 10 trillion × 500 bytes = 5 PB (distributed)
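The estimates above can be reproduced with a short script, which makes it easy to vary the inputs. This is a minimal sketch using decimal units (1 KB = 1,000 bytes, 1 PB = 10^15 bytes); the function and variable names are illustrative.

```python
KB = 10**3
TB = 10**12
PB = 10**15


def capacity_estimate(num_objects: int, avg_object_bytes: int,
                      replication_factor: int, metadata_bytes_per_object: int) -> dict:
    """Return logical, raw (replicated), and metadata sizes in petabytes."""
    logical_bytes = num_objects * avg_object_bytes
    return {
        "logical_pb": logical_bytes / PB,
        "raw_pb": logical_bytes * replication_factor / PB,
        "metadata_pb": num_objects * metadata_bytes_per_object / PB,
    }


# Inputs taken from the estimation above.
estimate = capacity_estimate(
    num_objects=10 * 10**12,       # 10 trillion objects
    avg_object_bytes=1 * KB,       # 1 KB average object
    replication_factor=3,
    metadata_bytes_per_object=500,
)
cluster_capacity_pb = 1000 * 100 * TB / PB   # 1000 nodes x 100 TB each
aggregate_gets_per_sec = 5_500 * 1000        # per-prefix limit x 1000 prefixes

if __name__ == "__main__":
    print(estimate, cluster_capacity_pb, aggregate_gets_per_sec)
```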

API Design

PUT /{bucket}/{key}
Content-Type: image/jpeg
Body: <binary data>
Response: { "etag": "d41d8cd98f00b204e9800998ecf8427e" }

GET /{bucket}/{key}
Response: <binary data>
Headers: Content-Length, ETag, Last-Modified

DELETE /{bucket}/{key}
Response: 204 No Content

POST /{bucket}/{key}?uploads
Response: { "upload_id": "upload_123" }

PUT /{bucket}/{key}?uploadId=upload_123&partNumber=1
Response: { "etag": "part_etag" }

POST /{bucket}/{key}?uploadId=upload_123
Body: { "parts": [...] }

High-Level Architecture

Detailed Design

Data Model

Definition: Object Storage Data Model

An object consists of data (the blob) and metadata (key, size, checksum, custom headers). Objects are organized into buckets with a flat key namespace within each bucket.

// Object Metadata
{
  bucket: "images",
  key: "photos/2026/06/photo1.jpg",
  size: 10485760,          // 10 MiB, the sum of the two 5 MiB parts
  content_type: "image/jpeg",
  etag: "d41d8cd98f00b204",
  created_at: "2026-06-20T10:00:00Z",
  storage_class: "STANDARD",
  version_id: "v1",
  checksum: "sha256:abc123...",
  parts: [                 // For multipart uploads
    { part_num: 1, offset: 0, size: 5242880, etag: "..." },
    { part_num: 2, offset: 5242880, size: 5242880, etag: "..." }
  ]
}
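One way to carry this record in application code is as a typed, immutable structure with a consistency check on the part list. The sketch below is an assumption about how such a record might be modeled, not a description of any real service's schema; `ObjectMetadata`, `PartInfo`, and `parts_are_consistent` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PartInfo:
    part_num: int
    offset: int
    size: int
    etag: str


@dataclass(frozen=True)
class ObjectMetadata:
    bucket: str
    key: str
    size: int
    content_type: str
    etag: str
    version_id: str
    storage_class: str = "STANDARD"
    parts: Tuple[PartInfo, ...] = ()

    def parts_are_consistent(self) -> bool:
        """True if parts are numbered 1..n, contiguous from offset 0, and sum to `size`."""
        if not self.parts:
            return True
        expected_offset = 0
        ordered = sorted(self.parts, key=lambda p: p.part_num)
        for expected_num, part in enumerate(ordered, start=1):
            if part.part_num != expected_num or part.offset != expected_offset:
                return False
            expected_offset += part.size
        return expected_offset == self.size


example = ObjectMetadata(
    bucket="images",
    key="photos/2026/06/photo1.jpg",
    size=10_485_760,
    content_type="image/jpeg",
    etag="d41d8cd98f00b204",
    version_id="v1",
    parts=(
        PartInfo(part_num=1, offset=0, size=5_242_880, etag="part-1"),
        PartInfo(part_num=2, offset=5_242_880, size=5_242_880, etag="part-2"),
    ),
)
```

Freezing the dataclass mirrors the immutability of objects: a change produces a new metadata record (a new version) instead of mutating the existing one.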

Data Chunking and Placement

Large objects are split into chunks for efficient storage and replication:

Definition: Object Chunking

Large objects are split into fixed-size chunks (e.g., 64 MB). Each chunk is independently replicated across storage nodes. This enables parallel uploads, efficient replication, and partial reads.
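The text does not say how replicas are assigned to nodes. As one possible approach, the sketch below uses rendezvous (highest-random-weight) hashing to pick three distinct nodes per chunk; this technique, the node names, and the `place_chunk` function are assumptions for illustration, and a production system would also account for racks, zones, and free capacity.

```python
import hashlib
from typing import List


def place_chunk(chunk_id: str, nodes: List[str], replicas: int = 3) -> List[str]:
    """Pick `replicas` distinct nodes for a chunk using rendezvous hashing."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")

    def score(node: str) -> bytes:
        # Each (node, chunk) pair gets a pseudo-random score; highest scores win.
        return hashlib.sha256(f"{node}:{chunk_id}".encode("utf-8")).digest()

    return sorted(nodes, key=score, reverse=True)[:replicas]


if __name__ == "__main__":
    storage_nodes = [f"node-{i}" for i in range(10)]
    print(place_chunk("images/photos/2026/06/photo1.jpg#chunk-0", storage_nodes))
```

Placement is deterministic, so any front-end server can compute where a chunk lives without a lookup, and removing a node that holds no replica of a chunk leaves that chunk's placement unchanged.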

Chunk Count

$$chunks = \left\lceil \frac{object\_size}{chunk\_size} \right\rceil$$

Here,

$object\_size$ = Total object size in bytes
$chunk\_size$ = Chunk size (typically 64 MB)

Chunk Calculation

For a 1 GB video file with 64 MB chunks: chunks = ⌈1024 MB / 64 MB⌉ = 16 chunks

Each 64 MB chunk, replicated 3×, occupies 192 MB, so the whole object occupies 16 × 192 MB = 3,072 MB (3 GB) of raw storage.
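The chunk calculation can be sketched in code, including the byte range each chunk covers (which is what enables partial reads). The function names are illustrative, and sizes here use binary units (1 MB = 2^20 bytes) to match the 1024 MB figure above.

```python
from typing import List, Tuple

MB = 1024 * 1024
CHUNK_SIZE = 64 * MB


def chunk_count(object_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Ceiling division without floating point, safe for very large objects."""
    return (object_size + chunk_size - 1) // chunk_size


def chunk_ranges(object_size: int, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) for each chunk; the last chunk may be shorter."""
    return [
        (offset, min(chunk_size, object_size - offset))
        for offset in range(0, object_size, chunk_size)
    ]


if __name__ == "__main__":
    video_size = 1024 * MB
    print(chunk_count(video_size))            # 16
    print(chunk_count(video_size) * 3 * 64)   # 3072 (MB of raw storage at 3x)
```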

Replication Strategy

Definition: Erasure Coding

Erasure coding splits data into k data chunks and computes m parity chunks. The original data can be rebuilt from any k of the k + m chunks, so up to m chunks can be lost. This is more storage-efficient than full replication.

Strategy	Storage Overhead	Durability	Read Performance
3x Replication	300%	High	Fast (any replica)
Reed-Solomon (10+4)	140%	Very High	Moderate (decode)
Reed-Solomon (10+2)	120%	High	Moderate

Erasure Coding Overhead

$$overhead = \frac{k + m}{k}$$

Here,

$k$ = Data chunks
$m$ = Parity chunks

Erasure Coding vs Replication

For a 100 MB object:

3x Replication: 300 MB storage, can tolerate 2 node failures
RS(10+4): 140 MB storage, can tolerate 4 node failures

RS is more storage-efficient with higher fault tolerance.
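The comparison follows directly from the overhead formula. In the sketch below, n-way replication is treated as the special case k = 1, m = n - 1 (one data chunk plus n - 1 full copies); the function names are illustrative.

```python
def storage_overhead(k: int, m: int) -> float:
    """Stored bytes divided by logical bytes for k data and m parity chunks."""
    return (k + m) / k


def stored_megabytes(object_mb: float, k: int, m: int) -> float:
    """Raw storage consumed by an object of `object_mb` megabytes."""
    return object_mb * storage_overhead(k, m)


# (label, k, m); each scheme tolerates the loss of up to m chunks.
schemes = [
    ("3x replication", 1, 2),
    ("RS(10+4)", 10, 4),
    ("RS(10+2)", 10, 2),
]

if __name__ == "__main__":
    for label, k, m in schemes:
        print(f"{label}: {stored_megabytes(100, k, m):.0f} MB stored, "
              f"tolerates {m} failures")
```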

Metadata Architecture

The metadata service is the control plane. Metadata operations (PUT/GET object metadata) are handled separately from data operations (PUT/GET object data), which lets the metadata tier scale independently and be cached aggressively.

Metadata Tier	Technology	Use Case
Hot	Redis	Frequently accessed metadata
Warm	Cassandra	Recent objects, time-series access
Cold	S3/Object Store	Archived metadata, audit logs
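A tiered lookup of this kind is often implemented as a read-through cache. The sketch below uses plain dictionaries as stand-ins for the hot, warm, and cold stores in the table; the class and its promotion policy are assumptions for illustration and do not use any real Redis or Cassandra client API.

```python
from typing import Dict, Optional, Tuple

ObjectId = Tuple[str, str]  # (bucket, key)


class TieredMetadataStore:
    """Read-through metadata lookup: hot, then warm, then cold."""

    def __init__(self) -> None:
        self.hot: Dict[ObjectId, dict] = {}    # stand-in for an in-memory cache
        self.warm: Dict[ObjectId, dict] = {}   # stand-in for the primary database
        self.cold: Dict[ObjectId, dict] = {}   # stand-in for archival storage

    def put(self, bucket: str, key: str, metadata: dict) -> None:
        # Write to the durable warm tier first, then populate the cache.
        self.warm[(bucket, key)] = metadata
        self.hot[(bucket, key)] = metadata

    def get(self, bucket: str, key: str) -> Optional[dict]:
        object_id = (bucket, key)
        for tier in (self.hot, self.warm, self.cold):
            if object_id in tier:
                metadata = tier[object_id]
                self.hot[object_id] = metadata  # promote on access
                return metadata
        return None
```

Because objects are immutable, a cached metadata record for a given version never goes stale, which is what makes aggressive caching safe; eviction and expiry are omitted here for brevity.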

Multipart Upload

For large objects, multipart upload enables:

Parallel uploads of chunks
Resumable uploads on failure
Upload of objects larger than 5 GB
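The initiate, upload-part, and complete steps from the API section can be sketched as an in-memory state machine. This is a simplified illustration, not how any real service implements it: the class name is hypothetical, parts are held in memory, and part ETags are computed here as SHA-256 digests purely for the example.

```python
import hashlib
import uuid
from typing import Dict, List, Tuple


class MultipartUploads:
    """Tracks in-progress multipart uploads and assembles them on completion."""

    def __init__(self) -> None:
        # upload_id -> {part_number: (etag, data)}
        self._uploads: Dict[str, Dict[int, Tuple[str, bytes]]] = {}

    def initiate(self) -> str:
        upload_id = uuid.uuid4().hex
        self._uploads[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> str:
        # Re-uploading a part number overwrites it, which makes retries safe.
        etag = hashlib.sha256(data).hexdigest()
        self._uploads[upload_id][part_number] = (etag, data)
        return etag

    def complete(self, upload_id: str, parts: List[Tuple[int, str]]) -> bytes:
        """Assemble parts in part-number order after verifying each ETag."""
        stored = self._uploads[upload_id]
        assembled = bytearray()
        for part_number, expected_etag in sorted(parts):
            if part_number not in stored or stored[part_number][0] != expected_etag:
                raise ValueError(f"part {part_number} is missing or has a different ETag")
            assembled += stored[part_number][1]
        del self._uploads[upload_id]
        return bytes(assembled)
```

Parts can arrive in any order and from parallel workers; only the final `complete` call fixes the ordering, which is why a failed part can be retried without restarting the whole upload.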

Practice Exercises

Design: How would you implement object versioning? What are the storage implications of keeping all versions vs. lifecycle policies?
Durability: If you use 3x replication across 3 data centers, calculate the probability of data loss given a 0.1% annual disk failure rate per node.
Scale: Design a system to handle 5,500 GET requests per second per prefix. How would you distribute load across storage nodes?
Optimization: How would you implement a CDN cache invalidation system for objects stored in S3? Design for both instant and eventual consistency.

Key Takeaways:

Object storage uses a flat namespace with bucket organization for massive scalability
Erasure coding (RS 10+4) is more storage-efficient than 3x replication with higher durability
Separating metadata from data services allows independent scaling
Multipart upload enables parallel chunk uploads for large objects
Object immutability simplifies consistency but requires versioning for updates

What to Learn Next

-> Design Pastebin: Storing large text objects with S3-style storage.

-> Databases: Distributed metadata storage with Cassandra and PostgreSQL.

-> CDNs: Caching objects at the edge for low-latency access.

-> Data Replication: Replication strategies for durability and availability.

-> Design Google Drive: File sync and storage with conflict resolution.

-> Consistent Hashing: Distributing objects across storage nodes.
