
Design Pastebin


Pastebin allows users to paste text or code and share it via a unique URL. Services like GitHub Gist, hastebin, and dpaste handle millions of pastes daily with simple creation and reading workflows.

  • Paste Creation — Accept text input, generate unique URL, store with metadata
  • Paste Reading — Retrieve paste content via short URL with minimal latency
  • Expiration — Pastes auto-delete after configurable TTL periods

Pastebin is simpler than most system design problems—it's essentially a write-once-read-many (WORM) system with time-based expiration.

Requirements

Functional Requirements

  • Users can create a paste with text content (up to 10MB)
  • Each paste gets a unique, shareable URL
  • Pastes can be public or private (unlisted)
  • Pastes expire after a configurable duration (10 min, 1 hour, 1 day, 1 week, never)
  • Support syntax highlighting for code pastes
  • Users can set a custom name for the paste (optional)

Non-Functional Requirements

  • Latency: Read paste in < 100ms
  • Availability: 99.9% uptime
  • Durability: Pastes must not be lost before expiration
  • Scalability: 10M new pastes/day, 100M reads/day

Pastebin is read-heavy: the targets above imply a read-to-write ratio of about 10:1, and popular pastes can push it higher. The write path can tolerate slightly higher latency since paste creation is not time-critical, but reads must be fast.

Back-of-the-Envelope Estimation

Storage and Traffic Estimation

Write path:

  • 10M pastes/day = ~116 QPS
  • Average paste size: 10 KB
  • Daily write storage: 10M × 10 KB = 100 GB/day
  • Annual storage: ~36.5 TB

Read path:

  • 100M reads/day = ~1,160 QPS
  • Average read size: 10 KB
  • Peak read QPS (3x): ~3,500 QPS

Cache strategy:

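  • Hot set (20% of daily reads): 0.2 × 100M = ~20M pastes × 10 KB = 200 GB cache. This is an upper bound, since repeated reads of the same paste share one cache entry.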
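The arithmetic behind these estimates can be reproduced with a short script. This is a sketch using only the figures stated above; decimal units (1 GB = 10^6 KB) are assumed, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the requirements and estimates above.
pastes_per_day = 10_000_000
reads_per_day = 100_000_000
avg_paste_kb = 10
peak_factor = 3
hot_pastes = 20_000_000  # hot-set size as given in the cache estimate

write_qps = pastes_per_day / SECONDS_PER_DAY
read_qps = reads_per_day / SECONDS_PER_DAY
peak_read_qps = read_qps * peak_factor

# Decimal units: 1 GB = 1e6 KB, 1 TB = 1e3 GB.
daily_storage_gb = pastes_per_day * avg_paste_kb / 1e6
annual_storage_tb = daily_storage_gb * 365 / 1e3
cache_gb = hot_pastes * avg_paste_kb / 1e6

print(f"write QPS:      {write_qps:.0f}")
print(f"read QPS:       {read_qps:.0f}")
print(f"peak read QPS:  {peak_read_qps:.0f}")
print(f"daily storage:  {daily_storage_gb:.0f} GB")
print(f"annual storage: {annual_storage_tb:.1f} TB")
print(f"cache size:     {cache_gb:.0f} GB")
```

Unrounded, the rates come out to roughly 116 writes/second, 1,157 reads/second, and 3,472 reads/second at peak.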

API Design

POST /api/v1/pastes
Request:  { "content": "...", "syntax": "python", "visibility": "public", "expires_in": "1h" }
Response: { "paste_id": "abc123", "url": "https://paste.example.com/abc123" }

GET /{paste_id}
Response: { "content": "...", "syntax": "python", "created_at": "...", "expires_at": "..." }

GET /api/v1/pastes/{paste_id}/raw
Response: Plain text content (no JSON wrapping)

DELETE /api/v1/pastes/{paste_id}
Response: { "status": "deleted" }
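As an illustration of how the create endpoint might turn `expires_in` into the `expires_at` value returned on reads, here is a minimal sketch. Only the label "1h" appears in the API example; the other labels ("10m", "1d", "1w", "never") and the function name `compute_expires_at` are assumptions chosen to match the expiration options in the functional requirements.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Only "1h" is shown in the API example; the other labels are assumed.
EXPIRY_OPTIONS = {
    "10m": timedelta(minutes=10),
    "1h": timedelta(hours=1),
    "1d": timedelta(days=1),
    "1w": timedelta(weeks=1),
    "never": None,
}


def compute_expires_at(expires_in: str, now: datetime) -> Optional[datetime]:
    """Return the expiry timestamp, or None for pastes that never expire."""
    if expires_in not in EXPIRY_OPTIONS:
        raise ValueError(f"unsupported expires_in: {expires_in!r}")
    ttl = EXPIRY_OPTIONS[expires_in]
    return None if ttl is None else now + ttl


created_at = datetime(2026, 6, 20, 10, 0, tzinfo=timezone.utc)
print(compute_expires_at("1h", created_at))
```

Rejecting unknown labels up front keeps arbitrary TTLs out of the metadata store; whether to allow free-form durations instead is a product decision.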

High-Level Architecture

[Figure: Pastebin Architecture. Components: Client, Load Balancer, Paste Service, Expiration Service, ID Generator, Redis Cache, Object Storage (S3/GCS), Metadata DB]

Detailed Design

Storage Layer

Pastebin has two distinct storage needs:

Hot vs Cold Storage

Hot storage holds frequently accessed data in memory (Redis). Cold storage holds all data durably on disk or object storage. Pastebin uses a write-through pattern: writes go to both hot and cold storage, but reads try hot storage first.

Data Type | Storage | Reason
Paste content (large) | Object Storage (S3) | Cost-effective, durable, high throughput
Paste metadata | Relational DB (PostgreSQL) | Structured queries, relationships
Hot pastes | Redis | Sub-millisecond reads, TTL support
Expiration tracking | Redis sorted set | Efficient expiration scanning

Expiration Strategy

Pastebin requires automatic deletion of expired pastes:

Option A: Lazy Expiration (Recommended)

  • Check expires_at on every read
  • Delete if expired; return 404
  • Simple, no background processing
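  • Cached copies may be served past expiry unless the cache TTL is aligned with expires_at
  • Expired pastes that are never read again are never deleted, so storage keeps growing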
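A minimal sketch of the lazy check, assuming an in-memory dict as a hypothetical stand-in for the metadata store (the names `pastes` and `read_paste` are illustrative):

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical stand-in for the metadata store: paste_id -> record.
pastes: dict[str, dict] = {}


def read_paste(paste_id: str, now: datetime) -> Optional[dict]:
    """Return the record, or None (mapped to 404 by the caller) if missing or expired."""
    record = pastes.get(paste_id)
    if record is None:
        return None
    expires_at = record["expires_at"]  # None means the paste never expires
    if expires_at is not None and expires_at <= now:
        del pastes[paste_id]  # delete on first read after expiry
        return None
    return record


pastes["abc123"] = {
    "content": "print('hi')",
    "expires_at": datetime(2026, 6, 20, 11, 0, tzinfo=timezone.utc),
}
print(read_paste("abc123", datetime(2026, 6, 20, 10, 30, tzinfo=timezone.utc)) is not None)
print(read_paste("abc123", datetime(2026, 6, 20, 11, 30, tzinfo=timezone.utc)) is None)
```

Passing `now` in explicitly, rather than reading the clock inside the function, makes the expiry rule easy to test.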

Option B: Active Expiration

  • Background worker scans for expired pastes
  • Deletes from database and cache periodically
  • More consistent, but adds complexity
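  • Use a Redis sorted set with the paste ID as the member and the expiry timestamp as the score (ZADD on create, ZRANGEBYSCORE to find pastes that are due)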

Expiration Scan Rate

$$scan\_rate = \frac{N_{expired}}{window\_size}$$

Here,

  • $N_{expired}$ = Number of pastes expiring in the window
  • $window\_size$ = Scan interval in seconds

Expiration Processing Load

If 10M pastes/day expire on average:

Scan rate = 10M / 86400 ≈ 116 expired pastes/second

This is manageable with a single background worker using a Redis sorted set.
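The worker's core loop can be sketched without a Redis server by using a heap as an in-memory stand-in for the sorted set. The class `ExpirationIndex` and its methods are hypothetical; the comments name the Redis commands each step would roughly correspond to. Unlike a real sorted set, this stand-in does not deduplicate members.

```python
import heapq


class ExpirationIndex:
    """In-memory stand-in for a Redis sorted set scored by expiry timestamp."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, str]] = []

    def add(self, paste_id: str, expires_at: float) -> None:
        # Redis counterpart: ZADD <key> <expires_at> <paste_id>
        heapq.heappush(self._heap, (expires_at, paste_id))

    def pop_due(self, now: float, limit: int = 1000) -> list[str]:
        # Redis counterpart: ZRANGEBYSCORE <key> -inf <now> LIMIT 0 <limit>,
        # followed by ZREM for each returned member.
        due: list[str] = []
        while self._heap and self._heap[0][0] <= now and len(due) < limit:
            _, paste_id = heapq.heappop(self._heap)
            due.append(paste_id)
        return due


index = ExpirationIndex()
index.add("a", 100.0)
index.add("b", 200.0)
index.add("c", 50.0)
print(index.pop_due(now=150.0))  # IDs due at t=150, earliest expiry first
```

The `limit` parameter bounds the work per scan; at roughly 116 expirations per second, a batch of 1,000 every few seconds keeps up with the steady-state rate.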

Content Storage Pattern

Store large paste content in object storage, metadata in the database:

// Metadata record
{
  paste_id: "abc123",
  user_id: "user_456",
  content_path: "s3://pastes/ab/c1/abc123.txt",
  syntax: "python",
  visibility: "public",
  created_at: "2026-06-20T10:00:00Z",
  expires_at: "2026-06-20T11:00:00Z",
  size_bytes: 10240
}

Separating metadata from content allows you to cache and query metadata efficiently while keeping large content objects in cost-effective object storage.
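The `content_path` in the record above appears to nest the object under two 2-character prefixes of the paste ID. Assuming that is the rule (the source shows only the one example), a hypothetical helper could derive the path like this:

```python
def content_path(paste_id: str, bucket: str = "pastes") -> str:
    """Build an object-storage path with two 2-character prefix levels.

    The prefix scheme is inferred from the single example path and is an
    assumption, as is the default bucket name.
    """
    if len(paste_id) < 4:
        raise ValueError("paste_id must have at least 4 characters")
    return f"s3://{bucket}/{paste_id[:2]}/{paste_id[2:4]}/{paste_id}.txt"


print(content_path("abc123"))  # s3://pastes/ab/c1/abc123.txt
```

Deriving the path from the ID means the path never needs to be looked up separately, although storing it in metadata, as the record does, leaves room to change the layout later.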

Syntax Highlighting

Support syntax highlighting for code pastes:

  1. Client submits paste with syntax parameter (or auto-detect)
  2. Server stores raw content in object storage
  3. On read, apply syntax highlighting at the CDN edge or application layer
  4. Cache highlighted HTML alongside raw content

Alternatively, use Prism.js or highlight.js for client-side rendering to avoid server-side highlighting overhead entirely. In that case, store the syntax hint with metadata so the client can load the appropriate language pack.

Scaling Considerations

Database Partitioning

Partition paste metadata by paste ID hash:

shard = hash(paste_id) % NUM_SHARDS

This distributes pastes evenly across shards. For time-based queries (e.g., "recent pastes"), maintain a separate time-indexed table or use a time-series database.
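When implementing the shard function, the hash must be stable across processes and machines. Python's built-in `hash()` is salted per process for strings, so the sketch below uses SHA-256 from `hashlib` instead. The shard count of 16 and the name `shard_for` are illustrative assumptions.

```python
import hashlib

NUM_SHARDS = 16  # illustrative value, not from the source


def shard_for(paste_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a paste ID to a shard index using a process-independent hash."""
    digest = hashlib.sha256(paste_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shards.
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for("abc123"))
```

Plain modulo sharding remaps most keys when `NUM_SHARDS` changes; consistent hashing is a common alternative if the shard count is expected to grow.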

Read Path Optimization

  1. Check Redis cache for paste metadata
  2. Cache hit: use the cached metadata to serve the content from the CDN, falling back to object storage
  3. Cache miss: query the database, populate the cache, then serve the content the same way
  4. CDN caches public paste content at edge locations
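The metadata lookup in these steps follows a cache-aside pattern. Below is a minimal sketch with plain dicts as hypothetical stand-ins for Redis, the metadata database, and object storage; the CDN layer is omitted, and the function name `get_paste` is illustrative.

```python
from typing import Optional

cache: dict[str, dict] = {}        # stand-in for Redis
metadata_db: dict[str, dict] = {}  # stand-in for the metadata database
object_store: dict[str, str] = {}  # stand-in for S3/GCS, keyed by content_path


def get_paste(paste_id: str) -> Optional[dict]:
    """Look up metadata cache-first, then fetch the content body."""
    meta = cache.get(paste_id)
    if meta is None:
        meta = metadata_db.get(paste_id)  # cache miss: fall back to the database
        if meta is None:
            return None                   # unknown paste: caller returns 404
        cache[paste_id] = meta            # populate the cache for later reads
    content = object_store[meta["content_path"]]
    return {**meta, "content": content}


metadata_db["abc123"] = {"syntax": "python", "content_path": "s3://pastes/ab/c1/abc123.txt"}
object_store["s3://pastes/ab/c1/abc123.txt"] = "print('hello')"
print(get_paste("abc123")["content"])
print("abc123" in cache)
```

In a real deployment the cache entry would also carry a TTL aligned with the paste's `expires_at`, so expired metadata does not linger in Redis.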

Write Path Optimization

  1. Generate unique paste ID (Snowflake or random)
  2. Write metadata to database (async if possible)
  3. Write content to object storage (S3 multipart upload for large pastes)
  4. Populate Redis cache proactively
  5. Return paste URL to user

For pastes larger than 1MB, use multipart upload to object storage and return the URL immediately without waiting for upload completion. Notify the user when the paste is ready.
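Step 1 of the write path, using the random-ID option, might look like the sketch below. The base62 alphabet, the 8-character length, and the retry count are assumptions; `existing` is a hypothetical stand-in for a uniqueness check against the metadata database.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols (base62)


def generate_paste_id(length: int = 8) -> str:
    """Generate a random base62 ID using a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_unique_id(existing: set[str], length: int = 8, max_attempts: int = 5) -> str:
    """Retry on collision; `existing` stands in for a database uniqueness check."""
    for _ in range(max_attempts):
        candidate = generate_paste_id(length)
        if candidate not in existing:
            existing.add(candidate)
            return candidate
    raise RuntimeError("could not generate a unique paste ID")


taken: set[str] = set()
print(create_unique_id(taken))
```

Eight base62 characters give 62^8, or about 2.2 × 10^14 possible IDs, so at 10M pastes/day collisions stay rare and the retry loop almost never runs more than once. Random IDs are also harder to enumerate than sequential ones, which matters for unlisted pastes.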


What to Learn Next

-> Design URL Shortener: Similar WORM pattern with ID generation and caching strategies.

-> Caching Strategies: Cache-aside, write-through, and TTL-based expiration patterns.

-> CDNs: Caching static content at the edge for global low-latency access.

-> Databases: Choosing between SQL and NoSQL for metadata storage.

-> Design Object Storage: Building scalable blob storage for large content objects.

-> Design Unique ID Generator: Snowflake IDs, UUIDs, and distributed ID generation.
