Design YouTube
YouTube serves 2B+ monthly active users with 500+ hours of video uploaded every minute. This design covers video upload pipelines, adaptive streaming, CDN delivery, and recommendation systems.
- Scale: 2B MAU, 500+ hours uploaded/minute, 1B hours watched/day
- Storage: Exabytes of video content
- Streaming: Adaptive bitrate with startup in under 2 seconds
YouTube's core challenge is asymmetry: uploads demand massive storage and processing, while streaming demands massive bandwidth.
Requirements Clarification
Functional Requirements
- Upload videos with metadata
- Stream videos with adaptive quality
- Search videos by title/tags
- Like, comment, subscribe
- View personalized recommendations
- Live streaming
- Short-form content (Shorts)
Non-Functional Requirements
- Availability: 99.99% uptime
- Latency: Video starts in < 2 seconds
- Durability: Videos must never be lost
- Consistency: Eventual consistency for counts
- Scale: 1B hours of video watched daily
YouTube's key insight: Video transcoding is the bottleneck. A 1-hour 4K video can take 4+ hours to transcode on a single machine, so the system must split each video into chunks and process millions of videos in parallel.
Back-of-the-Envelope Estimation
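Upload Rate
$$V = \frac{H \times 60\ \text{min/hour}}{60\ \text{s/min} \times 1\ \text{min/video}} = \frac{500 \times 60}{60} = 500\ \text{videos/second}$$
Here,
- $H$ = Hours uploaded per minute (500)
- $V$ = Videos per second (avg 1 min each)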
Storage Estimation
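Per video storage (worst case, a 1-hour 4K upload):
- Original: 10GB (1-hour 4K)
- Transcoded variants: 10GB × 6 resolutions = 60GB (an upper bound, since lower resolutions are smaller than the original)
- With 3 replicas: 60GB × 3 = 180GB per video
Daily storage, assuming one such video per second (86,400 videos/day, roughly 60 hours uploaded per minute, so below the 500 hours/minute headline figure): 180GB × 86,400 ≈ 15.5 PB/day
Annual storage: 15.5 PB × 365 ≈ 5.7 EB/year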
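The storage arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that follows the same simplifications as the text (every variant as large as the original, decimal units, one 1-hour 4K video per second); the function names are illustrative assumptions, not part of any real tool.

```python
# Back-of-the-envelope storage estimate using decimal units.
GB_PER_PB = 1_000_000
PB_PER_EB = 1_000


def storage_per_video_gb(original_gb=10, variants=6, replicas=3):
    """Replicated size of the transcoded variants for one video.

    Mirrors the text: each variant is assumed to be as large as the
    original, and the original upload itself is not counted.
    """
    return original_gb * variants * replicas


def daily_storage_pb(videos_per_day=86_400, per_video_gb=None):
    """Storage added per day, in petabytes."""
    if per_video_gb is None:
        per_video_gb = storage_per_video_gb()
    return videos_per_day * per_video_gb / GB_PER_PB


per_video = storage_per_video_gb()
daily_pb = daily_storage_pb()
annual_eb = daily_pb * 365 / PB_PER_EB

print(f"{per_video} GB per video, {daily_pb:.2f} PB/day, {annual_eb:.2f} EB/year")
```

Changing `videos_per_day` or `original_gb` shows how sensitive the exabyte-scale result is to the assumed upload mix.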
High-Level Architecture
Video Upload Pipeline
Transcoding Pipeline
Definition: Adaptive Bitrate Streaming
Videos are transcoded into multiple resolutions (240p to 8K) and bitrates. The player dynamically switches quality based on network conditions. This ensures smooth playback across varying bandwidth.
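Transcoding Cost
$$T_{\text{wall}} = \frac{D \times C}{N}$$
Here,
- $T_{\text{wall}}$ = Wall-clock time to transcode one video
- $D$ = Duration of the source video
- $C$ = Encoding complexity factor (1x-10x)
- $N$ = Number of transcoding workers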
Transcoding Parallelism
A 1-hour 4K video at 10x complexity = 10 hours of CPU time.
With 100 parallel workers: 10 hours / 100 = 6 minutes
To process 86,400 such videos/day (1 per second average): 86,400 × 10 CPU-hours / 24 hours = 36,000 CPU cores running continuously
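The parallelism arithmetic can be checked with a short sketch. It assumes that work divides evenly across workers and that one worker occupies one core; both are simplifications, and the function names are hypothetical. Under these assumptions, steady-state demand is the CPU-hours arriving per day divided by 24, which gives 36,000 cores for one 1-hour video per second at 10x complexity.

```python
def wall_clock_minutes(duration_hours, complexity, workers):
    """Wall-clock time for one video if the work splits evenly across workers."""
    cpu_hours = duration_hours * complexity
    return cpu_hours * 60 / workers


def cores_needed(videos_per_day, duration_hours, complexity):
    """Cores that must run continuously to keep up with the daily upload load."""
    cpu_hours_per_day = videos_per_day * duration_hours * complexity
    return cpu_hours_per_day / 24


print(wall_clock_minutes(duration_hours=1, complexity=10, workers=100))  # 6.0
print(cores_needed(videos_per_day=86_400, duration_hours=1, complexity=10))  # 36000.0
```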
Adaptive Streaming
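ABR Decision
$$q^* = \arg\max_{q \in Q,\ \text{bitrate}(q) \le B} \text{bitrate}(q)$$
Here,
- $Q$ = Available quality levels
- $B$ = Estimated available bandwidth
- $q^*$ = Selected quality level (the highest bitrate that fits within $B$)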
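A throughput-based ABR rule picks the highest quality level whose bitrate fits within the estimated bandwidth. The sketch below illustrates that rule only; the bitrate ladder and the `safety` margin are hypothetical values chosen for illustration, and production players typically also take buffer occupancy into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical ladder; real ladders depend on codec and content.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1_000),
    Rendition("720p", 2_500),
    Rendition("1080p", 5_000),
    Rendition("2160p", 16_000),
]


def choose_rendition(ladder, est_bandwidth_kbps, safety=0.8):
    """Return the highest-bitrate rendition that fits the bandwidth budget.

    `safety` leaves headroom for estimation error. If nothing fits,
    fall back to the lowest rendition so playback can still start.
    """
    budget = est_bandwidth_kbps * safety
    affordable = [r for r in ladder if r.bitrate_kbps <= budget]
    if not affordable:
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(affordable, key=lambda r: r.bitrate_kbps)


print(choose_rendition(LADDER, est_bandwidth_kbps=4_000).name)  # 720p
```

The player would re-run this decision for each segment as its bandwidth estimate changes.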
HLS vs DASH
| Feature | HLS | DASH |
|---|---|---|
| Protocol | HTTP Live Streaming (Apple) | Dynamic Adaptive Streaming over HTTP (MPEG-DASH) |
| Segment Format | MPEG-TS or fMP4/CMAF | fMP4/CMAF |
| Manifest | m3u8 | MPD |
| DRM | FairPlay | Widevine/PlayReady |
| Typical latency (without low-latency extensions) | 6-30 seconds | 2-10 seconds |
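To make the manifest row concrete, the sketch below builds a minimal HLS master playlist (m3u8) that lists one variant stream per quality level. The `#EXTM3U` header and the `#EXT-X-STREAM-INF` tag with its `BANDWIDTH` and `RESOLUTION` attributes come from the HLS specification; the variant values and URIs are hypothetical.

```python
def master_playlist(variants):
    """Build a minimal HLS master playlist, lowest bandwidth first."""
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda v: v["bandwidth_bps"]):
        # BANDWIDTH is the peak bitrate in bits per second.
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={v["bandwidth_bps"]},'
            f'RESOLUTION={v["width"]}x{v["height"]}'
        )
        # The line after the tag is the URI of that variant's media playlist.
        lines.append(v["uri"])
    return "\n".join(lines) + "\n"


# Hypothetical variants for illustration.
VARIANTS = [
    {"bandwidth_bps": 2_500_000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
    {"bandwidth_bps": 400_000, "width": 426, "height": 240, "uri": "240p/index.m3u8"},
    {"bandwidth_bps": 5_000_000, "width": 1920, "height": 1080, "uri": "1080p/index.m3u8"},
]

print(master_playlist(VARIANTS))
```

A DASH MPD carries the same information (available representations and their bitrates) in XML rather than in a line-based playlist.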
CDN Architecture
Definition: Multi-CDN Strategy
YouTube uses a multi-CDN approach built around Google's own CDN, Google Global Cache (GGC), whose nodes are deployed inside ISP networks. Popular content is cached at edge nodes close to users. Long-tail content is served from regional data centers.
CDN Cache Hit Ratio
$$H = \frac{\text{requests served from edge}}{\text{total requests}}$$
Here,
- $H$ = Fraction of requests served from edge
- Target: $H > 95\%$ for popular content
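The split between popular and long-tail content can be illustrated with a tiny edge-cache simulation. This is a sketch under stated assumptions: an LRU eviction policy, a three-item cache, and a synthetic request trace with two hot videos and many one-off tail videos. None of these reflect how any real CDN is configured.

```python
from collections import OrderedDict


class LRUCache:
    """Edge cache that evicts the least recently used video when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def request(self, video_id):
        """Return True on a cache hit; on a miss, admit the video."""
        if video_id in self.items:
            self.items.move_to_end(video_id)  # mark as most recently used
            return True
        self.items[video_id] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        return False


cache = LRUCache(capacity=3)
hot_hits = hot_requests = tail_hits = tail_requests = 0

for i in range(100):
    # Two popular videos are requested every round...
    for video_id in ("hot-0", "hot-1"):
        hot_requests += 1
        hot_hits += cache.request(video_id)
    # ...followed by a long-tail video that is never requested again.
    tail_requests += 1
    tail_hits += cache.request(f"tail-{i}")

print(f"popular hit ratio: {hot_hits / hot_requests:.2f}")
print(f"long-tail hit ratio: {tail_hits / tail_requests:.2f}")
```

In this trace the popular videos miss only on their first request, while every long-tail request falls through to the origin, which is the pattern that motivates serving long-tail content from regional data centers.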
Recommendation System
Definition: Collaborative Filtering + Deep Learning
YouTube's recommendation system uses a two-stage approach:
- Candidate Generation: Retrieve hundreds of candidate videos using collaborative filtering
- Ranking: Score candidates using a deep neural network
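Recommendation Score
$$\text{score}(u, v) = f(x_{\text{user}},\ x_{\text{video}},\ x_{\text{context}})$$
Here,
- $x_{\text{user}}$ = Watch history, likes, search queries
- $x_{\text{video}}$ = Title, tags, thumbnail, duration
- $x_{\text{context}}$ = Time of day, device, location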
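The two-stage idea can be sketched on toy data. Stage 1 below uses simple co-watch counting as a stand-in for collaborative filtering, and stage 2 uses a hand-weighted linear score as a stand-in for the deep ranking network. All data, feature names, and weights are hypothetical and chosen only to show the control flow.

```python
from collections import Counter

# Toy watch histories (hypothetical data).
HISTORY = {
    "alice": ["v1", "v2", "v3"],
    "bob": ["v2", "v3", "v4"],
    "carol": ["v3", "v4", "v5"],
    "dave": ["v1", "v5"],
}

# Toy video features (hypothetical).
VIDEO_FEATURES = {
    "v4": {"freshness": 0.2, "avg_watch_fraction": 0.5, "duration_s": 1800},
    "v5": {"freshness": 0.9, "avg_watch_fraction": 0.6, "duration_s": 300},
}


def generate_candidates(user, history, limit=100):
    """Stage 1: unseen videos watched by users with overlapping history."""
    seen = set(history[user])
    counts = Counter()
    for other, videos in history.items():
        if other == user:
            continue
        overlap = len(seen & set(videos))
        for video in videos:
            if overlap and video not in seen:
                counts[video] += overlap  # weight by similarity to this user
    return [video for video, _ in counts.most_common(limit)]


def rank(candidates, video_features, context):
    """Stage 2: score each candidate from video and context features."""

    def score(video):
        f = video_features[video]
        s = 1.0 * f["freshness"] + 2.0 * f["avg_watch_fraction"]
        if context["device"] == "mobile" and f["duration_s"] > 1200:
            s -= 0.5  # assumed penalty for long videos on mobile
        return s

    return sorted(candidates, key=score, reverse=True)


candidates = generate_candidates("alice", HISTORY)
ranked = rank(candidates, VIDEO_FEATURES, context={"device": "mobile"})
print(candidates, ranked)
```

Candidate generation is cheap and broad, so it can run over the whole catalog; ranking is more expensive per item, so it only sees the short candidate list.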
Data Model
Video Schema
Key fields:
- Video ID = Unique video identifier
- Status = processing/ready/failed
- Duration = Video duration in seconds
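A video record with these three fields could be modeled as follows. The class and field names (`Video`, `video_id`, `status`, `duration_seconds`) are assumptions made for this sketch, since the original schema listing is not available; the status values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class VideoStatus(Enum):
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"


@dataclass
class Video:
    video_id: str            # unique video identifier
    duration_seconds: int    # video duration in seconds
    status: VideoStatus = VideoStatus.PROCESSING  # new uploads start here

    def finish(self, ok: bool) -> None:
        """Move from processing to ready or failed, exactly once."""
        if self.status is not VideoStatus.PROCESSING:
            raise ValueError(f"video {self.video_id} is already {self.status.value}")
        self.status = VideoStatus.READY if ok else VideoStatus.FAILED


video = Video(video_id="abc123", duration_seconds=3600)
video.finish(ok=True)
print(video.status.value)  # ready
```

Keeping the status as an explicit enum makes it easy for the player and search index to ignore videos that are still processing or have failed.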
Practice Exercises
- Upload Design: Design a resumable upload system that handles 10GB files. How do you handle network failures mid-upload?
- Transcoding: How would you prioritize transcoding? Should a viral video be transcoded before an old personal video?
- Live Streaming: Design a live streaming system with < 5 second latency. How does this differ from VOD?
- Storage Optimization: Propose a storage tiering strategy for videos with decreasing popularity over time.
Key Takeaways:
- Video upload uses chunked upload with resumable support
- Transcoding is the bottleneck: requires massive parallelism
- Adaptive bitrate streaming ensures smooth playback across networks
- Multi-CDN with edge caching for low-latency delivery
- Recommendation system uses two-stage ML pipeline
What to Learn Next
-> Design Netflix: Content delivery and adaptive streaming.
-> Design Instagram: Media delivery and feed generation.
-> Design Amazon: E-commerce at scale.
-> Design Google Search: Web-scale indexing and ranking.
-> Back Pressure: Managing load in streaming systems.
-> Caching Strategies: CDN and edge caching patterns.