Design Pastebin
Pastebin allows users to paste text or code and share it via a unique URL. Services like GitHub Gist, hastebin, and dpaste handle millions of pastes daily with simple creation and reading workflows.
- Paste Creation — Accept text input, generate unique URL, store with metadata
- Paste Reading — Retrieve paste content via short URL with minimal latency
- Expiration — Pastes auto-delete after configurable TTL periods
Pastebin is simpler than most system design problems—it's essentially a write-once-read-many (WORM) system with time-based expiration.
Requirements
Functional Requirements
- Users can create a paste with text content (up to 10MB)
- Each paste gets a unique, shareable URL
- Pastes can be public or private (unlisted)
- Pastes expire after a configurable duration (10 min, 1 hour, 1 day, 1 week, never)
- Support syntax highlighting for code pastes
- Users can set a custom name for the paste (optional)
Non-Functional Requirements
- Latency: Read paste in < 100ms
- Availability: 99.9% uptime
- Durability: Pastes must not be lost before expiration
- Scalability: 10M new pastes/day, 100M reads/day
Pastebin is read-heavy: the targets above imply a read-to-write ratio of about 10:1 (100M reads to 10M writes per day), and it may be higher in practice. The write path can tolerate slightly higher latency since paste creation is not time-critical, but reads must be fast.
Back-of-the-Envelope Estimation
Storage and Traffic Estimation
Write path:
- 10M pastes/day = ~116 QPS
- Average paste size: 10 KB
- Daily write storage: 10M × 10 KB = 100 GB/day
- Annual storage: ~36.5 TB
Read path:
- 100M reads/day = ~1,160 QPS
- Average read size: 10 KB
- Peak read QPS (3x): ~3,500 QPS
Cache strategy:
- Hot set sized at 20% of daily read volume: ~20M reads × 10 KB = 200 GB cache (an upper bound, since repeat reads of the same paste share one cache entry)
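The arithmetic above can be reproduced with a short script. This is a sketch under the stated targets; the variable names are illustrative, decimal units are assumed (1 GB = 1,000,000 KB), and the cache figure is read here as 20% of the 100M daily reads, which is one interpretation of the 200 GB estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60
KB_PER_GB = 1_000_000          # decimal units, assumed

pastes_per_day = 10_000_000
reads_per_day = 100_000_000
avg_paste_kb = 10
peak_factor = 3                # peak-to-average multiplier from the estimate
cache_fraction = 0.20          # share of daily reads kept hot (assumed reading)

write_qps = pastes_per_day / SECONDS_PER_DAY
read_qps = reads_per_day / SECONDS_PER_DAY
peak_read_qps = read_qps * peak_factor

daily_storage_gb = pastes_per_day * avg_paste_kb / KB_PER_GB
annual_storage_tb = daily_storage_gb * 365 / 1_000
cache_gb = reads_per_day * cache_fraction * avg_paste_kb / KB_PER_GB

print(f"write QPS:       {write_qps:.0f}")
print(f"read QPS:        {read_qps:.0f}")
print(f"peak read QPS:   {peak_read_qps:.0f}")
print(f"daily storage:   {daily_storage_gb:.0f} GB")
print(f"annual storage:  {annual_storage_tb:.1f} TB")
print(f"cache size:      {cache_gb:.0f} GB")
```

Changing any input (for example, a 50 KB average paste) shows immediately how storage and cache size scale.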
API Design
POST /api/v1/pastes
Request: { "content": "...", "syntax": "python", "visibility": "public", "expires_in": "1h" }
Response: { "paste_id": "abc123", "url": "https://paste.example.com/abc123" }
GET /{paste_id}
Response: { "content": "...", "syntax": "python", "created_at": "...", "expires_at": "..." }
GET /api/v1/pastes/{paste_id}/raw
Response: Plain text content (no JSON wrapping)
DELETE /api/v1/pastes/{paste_id}
Response: { "status": "deleted" }
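On the server side, the expires_in field has to be turned into an absolute expires_at timestamp. The following is a minimal sketch of that step. The source only shows the token "1h"; the other spellings ("10m", "1d", "1w", "never") and the function name compute_expires_at are assumptions chosen to match the durations listed in the functional requirements.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed token spellings; only "1h" appears in the API example.
EXPIRY_OPTIONS = {
    "10m": timedelta(minutes=10),
    "1h": timedelta(hours=1),
    "1d": timedelta(days=1),
    "1w": timedelta(weeks=1),
    "never": None,
}

def compute_expires_at(expires_in: str, now: datetime) -> Optional[datetime]:
    """Return the absolute expiry time, or None for pastes that never expire."""
    if expires_in not in EXPIRY_OPTIONS:
        raise ValueError(f"unsupported expires_in: {expires_in!r}")
    ttl = EXPIRY_OPTIONS[expires_in]
    return None if ttl is None else now + ttl

created = datetime(2026, 6, 20, 10, 0, tzinfo=timezone.utc)
print(compute_expires_at("1h", created).isoformat())  # 2026-06-20T11:00:00+00:00
```

Rejecting unknown tokens up front keeps arbitrary TTLs out of the expiration machinery.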
High-Level Architecture
Detailed Design
Storage Layer
Pastebin has two distinct storage needs:
Hot vs Cold Storage
Hot storage holds frequently accessed data in memory (Redis). Cold storage holds all data durably on disk or object storage. Pastebin uses a write-through pattern: writes go to both hot and cold storage, but reads try hot storage first.
| Data Type | Storage | Reason |
|---|---|---|
| Paste content (large) | Object Storage (S3) | Cost-effective, durable, high throughput |
| Paste metadata | Relational DB (PostgreSQL) | Structured queries, relationships |
| Hot pastes | Redis | Sub-millisecond reads, TTL support |
| Expiration tracking | Redis sorted set | Efficient expiration scanning |
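The write-through pattern described above can be sketched with two in-memory dictionaries standing in for the tiers. This is an illustration only: TieredPasteStore is a hypothetical class, and a real system would use Redis for the hot tier and S3 plus PostgreSQL for the cold tier.

```python
from typing import Dict, Optional

class TieredPasteStore:
    """Write-through store: writes land in both tiers; reads try hot first."""

    def __init__(self) -> None:
        self.hot: Dict[str, str] = {}    # stand-in for Redis
        self.cold: Dict[str, str] = {}   # stand-in for durable storage

    def put(self, paste_id: str, content: str) -> None:
        self.cold[paste_id] = content    # durable copy first
        self.hot[paste_id] = content     # then the cache copy

    def get(self, paste_id: str) -> Optional[str]:
        content = self.hot.get(paste_id)
        if content is not None:
            return content               # hot hit
        content = self.cold.get(paste_id)
        if content is not None:
            self.hot[paste_id] = content # repopulate after eviction
        return content

store = TieredPasteStore()
store.put("abc123", "print('hello')")
store.hot.clear()                        # simulate a cache eviction
print(store.get("abc123"))               # served from cold, then re-cached
```

Writing the durable copy before the cache copy means a failure between the two steps leaves a cache miss and not a phantom paste.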
Expiration Strategy
Pastebin requires automatic deletion of expired pastes:
Option A: Lazy Expiration (Recommended)
- Check expires_at on every read
- Delete if expired; return 404
- Simple, no background processing
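- May serve expired pastes from cache unless the cache TTL is capped at expires_at; pastes that are never read again are never deleted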
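A minimal sketch of lazy expiration follows, assuming a plain dictionary as the paste table. The read_paste function and the (status, content) return shape are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

# paste_id -> (content, expires_at); expires_at is None for "never"
PasteTable = Dict[str, Tuple[str, Optional[datetime]]]

def read_paste(pastes: PasteTable, paste_id: str,
               now: datetime) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, content), deleting the paste if it has expired."""
    record = pastes.get(paste_id)
    if record is None:
        return 404, None
    content, expires_at = record
    if expires_at is not None and now >= expires_at:
        del pastes[paste_id]             # lazy delete on first read after expiry
        return 404, None
    return 200, content

table: PasteTable = {
    "abc123": ("hello", datetime(2026, 6, 20, 11, 0, tzinfo=timezone.utc)),
}
print(read_paste(table, "abc123", datetime(2026, 6, 20, 10, 30, tzinfo=timezone.utc)))
print(read_paste(table, "abc123", datetime(2026, 6, 20, 11, 30, tzinfo=timezone.utc)))
```

The second call returns 404 and removes the record, which is the only point at which this strategy reclaims storage.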
Option B: Active Expiration
- Background worker scans for expired pastes
- Deletes from database and cache periodically
- More consistent, but adds complexity
- Use a sorted set in Redis with each paste's expiry timestamp as the score: EXPIRE_AT score=<timestamp>
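Expiration Scan Rate
$$R_{\text{scan}} = \frac{N_{\text{expired}}}{T_{\text{scan}}}$$
Here,
- $N_{\text{expired}}$ = Number of pastes expiring in the window
- $T_{\text{scan}}$ = Scan interval in seconds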
Expiration Processing Load
If 10M pastes/day expire on average:
Scan rate = 10M / 86400 ≈ 116 expired pastes/second
This is manageable with a single background worker using a Redis sorted set.
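The background worker can be sketched without a Redis server by using a min-heap as a stand-in for the sorted set. In Redis, the equivalent operations would be ZADD to schedule a paste, ZRANGEBYSCORE from -inf to the current time to find expired members, and ZREM to remove them. ExpirationIndex and sweep are hypothetical names, and the batch limit of 1000 is an arbitrary assumption.

```python
import heapq
from typing import Dict, List, Tuple

class ExpirationIndex:
    """In-memory stand-in for a Redis sorted set keyed by expiry timestamp."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []

    def schedule(self, paste_id: str, expires_at: float) -> None:
        heapq.heappush(self._heap, (expires_at, paste_id))

    def pop_expired(self, now: float, limit: int = 1000) -> List[str]:
        """Remove and return up to `limit` paste IDs with expires_at <= now."""
        expired: List[str] = []
        while self._heap and self._heap[0][0] <= now and len(expired) < limit:
            _, paste_id = heapq.heappop(self._heap)
            expired.append(paste_id)
        return expired

def sweep(index: ExpirationIndex, pastes: Dict[str, str], now: float) -> int:
    """One worker pass: delete every paste whose expiry time has passed."""
    expired = index.pop_expired(now)
    for paste_id in expired:
        pastes.pop(paste_id, None)       # tolerate pastes already deleted
    return len(expired)

index = ExpirationIndex()
pastes = {"a": "one", "b": "two", "c": "three"}
for paste_id, ts in [("a", 100.0), ("b", 200.0), ("c", 300.0)]:
    index.schedule(paste_id, ts)

print(sweep(index, pastes, now=250.0), sorted(pastes))
```

Capping each pass with a batch limit keeps a single sweep bounded even after a backlog builds up.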
Content Storage Pattern
Store large paste content in object storage, metadata in the database:
// Metadata record
{
  paste_id: "abc123",
  user_id: "user_456",
  content_path: "s3://pastes/ab/c1/abc123.txt",
  syntax: "python",
  visibility: "public",
  created_at: "2026-06-20T10:00:00Z",
  expires_at: "2026-06-20T11:00:00Z",
  size_bytes: 10240
}
Separating metadata from content allows you to cache and query metadata efficiently while keeping large content objects in cost-effective object storage.
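The content_path in the record above appears to nest the object under two prefixes taken from the paste ID ("ab" and "c1" for "abc123"). The sketch below reproduces that layout. The scheme is inferred from the single example path, and the content_path function and its default bucket name are assumptions.

```python
def content_path(paste_id: str, bucket: str = "pastes") -> str:
    """Build an object-storage path with two 2-character prefix levels."""
    if len(paste_id) < 4:
        raise ValueError("paste_id must have at least 4 characters")
    return f"s3://{bucket}/{paste_id[:2]}/{paste_id[2:4]}/{paste_id}.txt"

print(content_path("abc123"))  # s3://pastes/ab/c1/abc123.txt
```

Because the path is derived from the ID, it could also be computed on demand instead of stored, at the cost of making the layout harder to change later.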
Syntax Highlighting
Support syntax highlighting for code pastes:
- Client submits paste with syntax parameter (or auto-detect)
- Server stores raw content in object storage
- On read, apply syntax highlighting at the CDN edge or application layer
- Cache highlighted HTML alongside raw content
Use Prism.js or highlight.js for client-side rendering to avoid server-side highlighting overhead. Store the syntax hint with metadata so the client can load the appropriate language pack.
Scaling Considerations
Database Partitioning
Partition paste metadata by paste ID hash:
shard = hash(paste_id) % NUM_SHARDS
This distributes pastes evenly across shards. For time-based queries (e.g., "recent pastes"), maintain a separate time-indexed table or use a time-series database.
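The shard formula needs a hash that gives the same result on every server. Python's built-in hash() is salted per process for strings, so this sketch uses SHA-256 from hashlib instead. The shard count of 16 and the shard_for name are illustrative assumptions.

```python
import hashlib

NUM_SHARDS = 16  # illustrative value, not from the source

def shard_for(paste_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a paste ID to a shard using a process-independent hash."""
    digest = hashlib.sha256(paste_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shards.
    return int.from_bytes(digest[:8], "big") % num_shards

for paste_id in ("abc123", "xyz789", "q1w2e3"):
    print(paste_id, "->", shard_for(paste_id))
```

Plain modulo sharding remaps most keys when NUM_SHARDS changes, so resharding needs a migration plan or a scheme such as consistent hashing.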
Read Path Optimization
- Check Redis cache for paste metadata
- Cache hit: return content from object storage (cached at CDN)
- Cache miss: query database, populate cache, return content
- CDN caches public paste content at edge locations
Write Path Optimization
- Generate unique paste ID (Snowflake or random)
- Write metadata to database (async if possible)
- Write content to object storage (S3 multipart upload for large pastes)
- Populate Redis cache proactively
- Return paste URL to user
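The write steps above can be sketched end to end with dictionaries standing in for the database, object storage, and Redis. This sketch uses the random-ID option with a 6-character base62 ID (matching the length of "abc123") and retries on collision. The names new_paste_id and create_paste, the ID length, and the object key format are all assumptions.

```python
import secrets
import string
from typing import Dict

ALPHABET = string.ascii_letters + string.digits   # 62 characters

def new_paste_id(length: int = 6) -> str:
    """Generate a random base62 ID using a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def create_paste(content: str,
                 metadata_db: Dict[str, dict],
                 object_store: Dict[str, str],
                 cache: Dict[str, dict],
                 base_url: str = "https://paste.example.com") -> str:
    paste_id = new_paste_id()
    while paste_id in metadata_db:                # retry on the rare collision
        paste_id = new_paste_id()
    key = f"{paste_id}.txt"
    metadata = {"content_path": key,
                "size_bytes": len(content.encode("utf-8"))}
    metadata_db[paste_id] = metadata              # step 2: metadata
    object_store[key] = content                   # step 3: content
    cache[paste_id] = metadata                    # step 4: warm the cache
    return f"{base_url}/{paste_id}"               # step 5: shareable URL

db: Dict[str, dict] = {}
blobs: Dict[str, str] = {}
hot: Dict[str, dict] = {}
print(create_paste("print('hello')", db, blobs, hot))
```

With 62^6 (about 57 billion) possible IDs, collisions are rare at first but grow as the table fills, so a production system would enforce uniqueness in the database itself.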
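For pastes larger than 1MB, upload the content to object storage asynchronously and return the URL immediately without waiting for upload completion. Mark the paste as pending until the upload finishes, then notify the user when it is ready. Multipart upload only pays off near the 10MB limit, since S3 requires every part except the last to be at least 5 MiB.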
What to Learn Next
-> Design URL Shortener: Similar WORM pattern with ID generation and caching strategies.
-> Caching Strategies: Cache-aside, write-through, and TTL-based expiration patterns.
-> CDNs: Caching static content at the edge for global low-latency access.
-> Databases: Choosing between SQL and NoSQL for metadata storage.
-> Design Object Storage: Building scalable blob storage for large content objects.
-> Design Unique ID Generator: Snowflake IDs, UUIDs, and distributed ID generation.