System Design Problems
Design Amazon
Amazon serves 300M+ active users with 350M+ products. This design covers product catalog with search, shopping cart, checkout with inventory management, and order fulfillment.
- Scale: 300M+ users, 350M+ products, 66K orders/hour
- Availability: Must handle Prime Day spikes (10x normal)
- Inventory: Real-time stock tracking across 175+ fulfillment centers
Amazon's core challenge is building a resilient e-commerce platform that handles extreme traffic spikes while maintaining inventory accuracy.
Requirements Clarification
Functional Requirements
- Search products with filters and sorting
- View product details with reviews
- Add to cart and checkout
- Order tracking and management
- Inventory management
- Recommendation engine
- Seller marketplace
Non-Functional Requirements
- Availability: 99.99% uptime
- Latency: Search < 200ms, Checkout < 2s
- Consistency: Strong for inventory and payments
- Scale: 66K orders/hour, 10x on Prime Day
Amazon's key architectural principle: Design for failure. Every service must degrade gracefully. If the recommendation service is down, the catalog still works. If payment is slow, orders queue rather than fail.
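The degradation principle above can be sketched as a fallback wrapper around a non-critical dependency. This is a minimal illustration, not Amazon's implementation: `with_fallback`, `render_product_page`, and `broken_recommendations` are hypothetical names, and the product ID is a made-up placeholder.

```python
from typing import Callable, List


def with_fallback(primary: Callable[[], List[str]], fallback: List[str]) -> List[str]:
    """Return primary() if it succeeds, otherwise the fallback value."""
    try:
        return primary()
    except Exception:
        # A non-critical dependency failed: degrade instead of failing the page.
        return fallback


def render_product_page(
    product_id: str, fetch_recommendations: Callable[[str], List[str]]
) -> dict:
    """Build the page; recommendations are optional, the catalog entry is not."""
    recs = with_fallback(lambda: fetch_recommendations(product_id), fallback=[])
    return {"product_id": product_id, "recommendations": recs}


def broken_recommendations(product_id: str) -> List[str]:
    """Simulates the recommendation service being down (hypothetical)."""
    raise TimeoutError("recommendation service unavailable")


# The page still renders, just without recommendations.
page = render_product_page("B000TEST01", broken_recommendations)
```

In a real system the same idea is usually paired with timeouts and a circuit breaker so that a slow dependency is cut off quickly rather than awaited.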
Back-of-the-Envelope Estimation
Order Volume
$$O_s = \frac{O_h}{3600} = \frac{66{,}000}{3600} \approx 18.3$$
Here,
- $O_h$ = Orders per hour
- $O_s$ = Orders per second
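The conversion from hourly to per-second order rates can be checked with a few lines of arithmetic. The figures come from the requirements above; the variable names are only illustrative.

```python
ORDERS_PER_HOUR = 66_000
SECONDS_PER_HOUR = 3_600
PRIME_DAY_MULTIPLIER = 10  # "10x normal" from the requirements

# Average sustained rate.
orders_per_second = ORDERS_PER_HOUR / SECONDS_PER_HOUR

# Peak rate the system must absorb during a Prime Day spike.
peak_orders_per_second = orders_per_second * PRIME_DAY_MULTIPLIER

print(f"{orders_per_second:.1f} orders/s average")
print(f"{peak_orders_per_second:.1f} orders/s at 10x peak")
```

This gives roughly 18 orders per second on average and roughly 183 per second at the 10x peak.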
Storage Estimation
Product catalog:
- 350M products x 10KB metadata = 3.5 TB
- 350M products x 10 images x 500KB = 1.75 PB (object storage)
Order data:
- 1B orders/year x 5KB = 5 TB/year
User data:
- 300M users x 10KB = 3 TB
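The storage figures above can be reproduced as follows, assuming decimal units (1 KB = 1,000 bytes). The last line is a cross-check, not a figure from the text: it annualizes the 66K orders/hour rate.

```python
KB = 10**3
TB = 10**12
PB = 10**15

# Product catalog
catalog_metadata = 350_000_000 * 10 * KB       # 10 KB of metadata per product
catalog_images = 350_000_000 * 10 * 500 * KB   # 10 images x 500 KB per product

# Orders and users
orders_per_year = 1_000_000_000 * 5 * KB       # 5 KB per order
user_data = 300_000_000 * 10 * KB              # 10 KB per user

print(f"catalog metadata: {catalog_metadata / TB} TB")
print(f"catalog images:   {catalog_images / PB} PB")
print(f"orders per year:  {orders_per_year / TB} TB")
print(f"user data:        {user_data / TB} TB")

# Cross-check: 66K orders/hour sustained for a full year.
implied_orders_per_year = 66_000 * 24 * 365
```

The annualized hourly rate comes to about 578M orders, so the 1B orders/year used for storage sizing appears to be a rounded-up planning figure rather than a direct multiple of the hourly rate.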
High-Level Architecture
Inventory Management
Definition: Distributed Inventory
Amazon tracks inventory across 175+ fulfillment centers. Each product can be stocked in multiple locations. The inventory service uses a reservation system: when a user adds to cart, inventory is temporarily reserved for 15 minutes.
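Inventory Reservation
$$I_{\text{available}} = I_{\text{physical}} - I_{\text{reserved}} - I_{\text{shipped}}$$
Here,
- $I_{\text{available}}$ = Stock available to sell
- $I_{\text{physical}}$ = Physical stock in fulfillment center
- $I_{\text{reserved}}$ = In carts (15-min TTL)
- $I_{\text{shipped}}$ = Already shipped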
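A minimal in-memory sketch of a TTL-based reservation for one product at one fulfillment center. `StockRecord` and its fields are hypothetical names. Treating "already shipped" as a separate count subtracted from the physical count is an assumption about how the terms above combine. A production system would keep this state in a database or cache with atomic updates.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

RESERVATION_TTL_SECONDS = 15 * 60  # 15-minute cart hold, as described above


@dataclass
class StockRecord:
    physical: int     # units counted in the fulfillment center
    shipped: int = 0  # units already shipped (assumed not yet deducted)
    # reservation_id -> (quantity, expiry timestamp)
    reservations: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def _expire(self, now: float) -> None:
        """Drop reservations whose TTL has elapsed."""
        expired = [rid for rid, (_, exp) in self.reservations.items() if exp <= now]
        for rid in expired:
            del self.reservations[rid]

    def available(self, now: Optional[float] = None) -> int:
        """Units that can still be sold: physical - reserved - shipped."""
        now = time.time() if now is None else now
        self._expire(now)
        reserved = sum(qty for qty, _ in self.reservations.values())
        return self.physical - reserved - self.shipped

    def reserve(self, reservation_id: str, quantity: int,
                now: Optional[float] = None) -> bool:
        """Hold stock for a cart; returns False if not enough is available."""
        now = time.time() if now is None else now
        if quantity > self.available(now):
            return False
        self.reservations[reservation_id] = (quantity, now + RESERVATION_TTL_SECONDS)
        return True
```

The `now` parameter exists only so that expiry can be exercised deterministically; callers would normally omit it and let the record use the wall clock.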
Inventory accuracy is critical. Amazon uses a combination of: (1) Database reservations with TTL, (2) Idempotent checkout to prevent overselling, (3) Periodic reconciliation audits.
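Idempotent checkout can be illustrated with an idempotency key: a retried request returns the stored result instead of decrementing stock a second time. This is a single-process sketch under simplifying assumptions. `CheckoutService` and `idempotency_key` are hypothetical names, and a real service would store keys in a durable, atomically updated store.

```python
from typing import Dict


class CheckoutService:
    def __init__(self, stock: int) -> None:
        self.stock = stock
        # idempotency_key -> result of the first attempt with that key
        self._results: Dict[str, dict] = {}

    def checkout(self, idempotency_key: str, quantity: int) -> dict:
        # A retry (same key) replays the original outcome with no side effects.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        if quantity > self.stock:
            result = {"status": "rejected", "reason": "insufficient stock"}
        else:
            self.stock -= quantity
            result = {"status": "confirmed", "quantity": quantity}

        self._results[idempotency_key] = result
        return result


service = CheckoutService(stock=1)
first = service.checkout("order-key-1", 1)
retry = service.checkout("order-key-1", 1)  # e.g. client retried after a timeout
```

Without the key check, the retry would be treated as a second order and either oversell the item or be rejected spuriously.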
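Checkout Flow: The Saga Pattern
Definition: Saga-Based Checkout
Amazon's checkout is a distributed transaction that spans several services, so it runs as a saga rather than a two-phase commit: (1) Reserve inventory, (2) Process payment, (3) Create order, (4) Confirm inventory. Each step commits locally. If any step fails, compensating transactions undo the steps already completed, in reverse order (for example, a failed payment releases the inventory reservation).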
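The checkout steps can be sketched as a small saga runner that executes steps in order and, on failure, runs the compensations of the completed steps in reverse. This is an illustrative, in-process sketch. `run_saga` and the no-op step functions are hypothetical, and a real saga would persist its progress so it can resume after a crash.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            completed.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"undo:{done_name}")
            return False, log
    return True, log


def noop() -> None:
    """Placeholder for a real service call."""


def fail_payment() -> None:
    raise RuntimeError("card declined")


checkout_steps: List[Step] = [
    ("reserve_inventory", noop, noop),  # compensation: release reservation
    ("process_payment", fail_payment, noop),  # compensation: refund
    ("create_order", noop, noop),  # compensation: cancel order
    ("confirm_inventory", noop, noop),
]

ok, log = run_saga(checkout_steps)
# ok is False; log is
# ["done:reserve_inventory", "failed:process_payment", "undo:reserve_inventory"]
```

Only the steps that actually completed are compensated, which is why the failed payment itself has nothing to undo.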
Product Search
Definition: A9 Search Algorithm
Amazon's A9 algorithm ranks products by: (1) Relevance (text match), (2) Popularity (sales velocity), (3) Availability (in-stock), (4) Price competitiveness. The ranking is personalized based on user history.
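Search Ranking Score
$$\text{Score} = w_1 R + w_2 P + w_3 A + w_4 C$$
Here,
- $R$ = TF-IDF text match score
- $P$ = Sales velocity score
- $A$ = In-stock probability
- $C$ = Price competitiveness
- $w_1, \dots, w_4$ = Tunable weights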
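A minimal sketch of combining the four signals as a weighted sum. The weights are illustrative assumptions, not Amazon's values, and `RankingSignals` and `ranking_score` are hypothetical names. Each signal is assumed to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankingSignals:
    relevance: float     # TF-IDF text match score
    popularity: float    # sales velocity score
    availability: float  # in-stock probability
    price: float         # price competitiveness


# Illustrative weights (assumed); they sum to 1 so scores stay in [0, 1].
WEIGHTS = RankingSignals(relevance=0.4, popularity=0.3, availability=0.2, price=0.1)


def ranking_score(signals: RankingSignals, weights: RankingSignals = WEIGHTS) -> float:
    """Weighted sum of the four ranking signals."""
    return (
        weights.relevance * signals.relevance
        + weights.popularity * signals.popularity
        + weights.availability * signals.availability
        + weights.price * signals.price
    )


in_stock = RankingSignals(relevance=0.8, popularity=0.5, availability=1.0, price=0.5)
out_of_stock = RankingSignals(relevance=0.8, popularity=0.5, availability=0.0, price=0.5)

# With everything else equal, the in-stock product ranks higher.
assert ranking_score(in_stock) > ranking_score(out_of_stock)
```

Personalization based on user history could be modeled as per-user weights or an extra signal, but that is beyond what this sketch covers.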
Data Model
Product Schema
Key fields:
- Product ID = ASIN (Amazon Standard Identification Number)
- Attributes = Key-value product attributes
- Category = Hierarchical category path
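The three fields above could be modeled as follows. The field names `asin`, `attributes`, and `category_path` are assumptions chosen for illustration, as are the sample values. A real catalog record would carry many more fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Product:
    asin: str  # product identifier (ASIN)
    attributes: Dict[str, str] = field(default_factory=dict)  # key-value attributes
    category_path: List[str] = field(default_factory=list)    # root -> leaf category

    def category_breadcrumb(self) -> str:
        """Render the hierarchical category path as a single string."""
        return " > ".join(self.category_path)


cable = Product(
    asin="B000TEST01",  # made-up placeholder, not a real ASIN
    attributes={"length": "1m", "connector": "USB-C"},
    category_path=["Electronics", "Cables", "USB"],
)
```

Storing the category as an ordered path makes it straightforward to filter by any ancestor category, at the cost of rewriting paths when the taxonomy changes.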
Practice Exercises
- Inventory: Design a distributed inventory system that prevents overselling across 175 fulfillment centers.
- Cart: How would you handle a user adding an item to cart that goes out of stock before checkout?
- Search: Design a search system for 350M products with < 200ms latency.
- Prime Day: How would you handle 10x traffic on Prime Day without over-provisioning?
Key Takeaways:
- Amazon uses service-oriented architecture with graceful degradation
- Inventory management uses reservation with TTL to prevent overselling
- Checkout is a saga with compensating transactions for failure
- A9 search algorithm ranks by relevance, popularity, and availability
- Design for failure: every service must degrade independently
What to Learn Next
-> Design Dropbox: File sync and storage systems.
-> Design Google Search: Web-scale indexing and ranking.
-> Saga Pattern: Distributed transactions.
-> Idempotency: Handling duplicate requests safely.
-> Outbox Pattern: Reliable event publishing.
-> Circuit Breaker: Preventing cascade failures.