Estimation and Back-of-Envelope
Back-of-the-envelope estimation grounds your design in reality. Learn to quickly calculate storage, bandwidth, and compute requirements that drive architectural decisions.
- Quantification: Turn abstract problems into concrete numbers
- Approximation: Use reasonable assumptions to bound the problem
- Validation: Check if your numbers make sense against real-world systems
Numbers don't lie, but wrong assumptions do.
The Fermi Estimation Method
Named after physicist Enrico Fermi, this method breaks complex problems into simpler, estimable components.
Definition: Fermi Estimation
Fermi estimation is a technique for making good approximate calculations with little or no data. It involves breaking a complex problem into smaller, more manageable parts, estimating each part independently, then combining the estimates. The key insight is that reasonable estimates are often sufficient for design decisions, while exact precision is unnecessary.
The Estimation Framework
Follow these steps for any estimation problem:
- Clarify the goal: What exactly are we estimating?
- Break it down: Identify the component parts
- Estimate each part: Use known references and benchmarks
- Combine: Multiply or add the estimates
- Sanity check: Does the result make sense?
Essential Reference Numbers
Memorize these benchmarks for quick estimation:
| Category | Reference Values |
|---|---|
| Users | World population ~8B, internet users ~5B, US population ~330M |
| Data | 1 character = 1 byte, 1 page text ≈ 2KB, 1 photo ≈ 2-5MB, 1 video ≈ 100MB-1GB |
| Time | 1 day = 86,400 sec, 1 month ≈ 2.6M sec, 1 year ≈ 31.5M sec |
| Numbers | 1K = 10^3 (thousand), 1M = 10^6 (million), 1B = 10^9 (billion), 1T = 10^12 (trillion) |
| Storage | 1TB = 1,000GB, 1PB = 1,000TB, 1EB = 1,000PB |
You don't need to memorize exact numbers. Approximate values (like 86,000 seconds in a day instead of 86,400) are perfectly acceptable and often preferred in interviews because they're easier to work with.
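As a quick check on the time values in the table, the sketch below derives them from first principles and measures how much accuracy is lost by rounding a day to 86,000 seconds. The constant names and the `relative_error` helper are illustrative, and the month figure assumes a 30-day month.

```python
# Derive the time reference values instead of memorizing them.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # assumption: 30-day month
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: no leap years


def relative_error(approx: float, exact: float) -> float:
    """Fraction by which an approximation deviates from the exact value."""
    return abs(approx - exact) / exact


print(f"{SECONDS_PER_DAY:,}")    # 86,400
print(f"{SECONDS_PER_MONTH:,}")  # 2,592,000 (about 2.6M)
print(f"{SECONDS_PER_YEAR:,}")   # 31,536,000 (about 31.5M)

# Rounding a day to 86,000 seconds costs less than half a percent.
print(f"{relative_error(86_000, SECONDS_PER_DAY):.2%}")  # 0.46%
```

An error below 1% is far smaller than the uncertainty in typical inputs such as user counts, which is why the rounded figure is safe to use.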
QPS Estimation
Queries per second (QPS) is the most fundamental metric:
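Queries Per Second
$$\text{QPS} = \frac{\text{DAU} \times R}{T_{\text{day}}}$$
Here,
- $\text{QPS}$ = Queries per second
- $\text{DAU}$ = Number of daily active users
- $R$ = Average requests per user per day
- $T_{\text{day}}$ = Seconds in a day (86,400)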
QPS for a Social Media Platform
Given: 500M daily active users, each makes 10 requests/day
Calculation: QPS = (500M × 10) / 86,400 ≈ 57,870 QPS
Peak QPS (typically 2-3x average): Peak ≈ 120,000 - 175,000 QPS
Read vs. write ratio (typically 10:1), applied to the lower peak estimate of ~120,000 QPS: Read QPS ≈ 110,000, Write QPS ≈ 11,000
This tells you that a single database server (handling ~10K QPS) is insufficient. You need read replicas, caching, and possibly sharding.
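The arithmetic in this example can be reproduced with a short script. This is a minimal sketch: the function names are illustrative, and the read/write split is applied to the rounded 120,000 QPS peak figure, as in the example.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users: int, requests_per_user_per_day: float) -> float:
    """Average queries per second: total daily requests spread over one day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def split_read_write(total_qps: float, reads: int = 10, writes: int = 1):
    """Split total QPS according to a reads:writes ratio (10:1 by default)."""
    parts = reads + writes
    return total_qps * reads / parts, total_qps * writes / parts


avg = average_qps(500_000_000, 10)
peak_low, peak_high = 2 * avg, 3 * avg         # peak is typically 2-3x average
read_qps, write_qps = split_read_write(120_000)  # rounded lower peak estimate

print(f"average: {avg:,.0f} QPS")                        # average: 57,870 QPS
print(f"peak: {peak_low:,.0f} - {peak_high:,.0f} QPS")   # peak: 115,741 - 173,611 QPS
print(f"reads: {read_qps:,.0f}, writes: {write_qps:,.0f}")  # reads: 109,091, writes: 10,909
```

The exact results (115,741 to 173,611 at peak; 109,091 reads and 10,909 writes) round to the figures quoted above, which is the level of precision an estimate needs.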
Storage Estimation
Estimate how much data the system will store:
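Storage Estimation
$$S = \text{QPS}_{\text{write}} \times B_{\text{rec}} \times T_{\text{ret}}$$
Here,
- $S$ = Total storage required
- $\text{QPS}_{\text{write}}$ = Write queries per second
- $B_{\text{rec}}$ = Average record size in bytes
- $T_{\text{ret}}$ = Data retention period in seconds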
Storage for a Photo Sharing App
Given: 10M photos/day, 2MB each, 5-year retention
Daily storage: 10M × 2MB = 20TB/day
Annual storage: 20TB × 365 = 7.3PB/year
5-year storage: 7.3PB × 5 = 36.5PB
With 3x replication: 36.5PB × 3 = 109.5PB
With compression (2x): 109.5PB / 2 ≈ 55PB
This confirms you need a distributed storage system (an object store like S3 or a distributed file system like HDFS), not a single storage array.
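The same calculation can be sketched in code. The helper below is illustrative, not a standard API: it works in items per day rather than writes per second (the two are equivalent once both rate and retention use the same time unit), and it uses decimal units (1TB = 10^12 bytes) to match the reference table.

```python
MB = 10**6
TB = 10**12
PB = 10**15


def total_storage_bytes(
    items_per_day: int,
    item_size_bytes: int,
    retention_days: int,
    replication: int = 1,
    compression_ratio: float = 1.0,
) -> float:
    """Raw volume over the retention period, scaled up by the replication
    factor and down by the compression ratio."""
    raw = items_per_day * item_size_bytes * retention_days
    return raw * replication / compression_ratio


photos_per_day = 10_000_000
photo_size = 2 * MB
retention_days = 5 * 365

daily = photos_per_day * photo_size
raw_5y = total_storage_bytes(photos_per_day, photo_size, retention_days)
final = total_storage_bytes(
    photos_per_day, photo_size, retention_days,
    replication=3, compression_ratio=2.0,
)

print(f"daily: {daily / TB:.2f} TB")     # daily: 20.00 TB
print(f"5-year raw: {raw_5y / PB:.2f} PB")  # 5-year raw: 36.50 PB
print(f"final: {final / PB:.2f} PB")     # final: 54.75 PB
```

Keeping replication and compression as explicit parameters makes it easy to show an interviewer how sensitive the final number is to each assumption.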
Bandwidth Estimation
Calculate network bandwidth requirements:
Bandwidth Estimation
$$\text{BW} = \text{QPS} \times B_{\text{resp}}$$
Here,
- $\text{BW}$ = Bandwidth in bytes per second
- $\text{QPS}$ = Queries per second
- $B_{\text{resp}}$ = Average response size in bytes
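As a rough illustration of the bandwidth formula, the sketch below reuses the ~57,870 average QPS from the social media example. The 10KB average response size is a hypothetical value chosen only for illustration, and the function names are likewise illustrative.

```python
def bandwidth_bytes_per_sec(qps: float, avg_response_bytes: float) -> float:
    """Bandwidth = queries per second x average response size."""
    return qps * avg_response_bytes


def to_gbit_per_sec(bytes_per_sec: float) -> float:
    """Convert bytes/sec to gigabits/sec (8 bits per byte, decimal giga)."""
    return bytes_per_sec * 8 / 1e9


AVG_QPS = 57_870               # from the social media QPS example
AVG_RESPONSE_BYTES = 10_000    # hypothetical: 10KB per response

bw = bandwidth_bytes_per_sec(AVG_QPS, AVG_RESPONSE_BYTES)

print(f"{bw / 1e6:,.1f} MB/s")              # 578.7 MB/s
print(f"{to_gbit_per_sec(bw):.2f} Gbit/s")  # 4.63 Gbit/s
```

Network links are usually rated in bits per second, so converting the result from bytes to bits makes it directly comparable to link capacity.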