System Design Problems
Design Google Maps
Google Maps provides turn-by-turn navigation, real-time traffic, and points of interest for over 1 billion users. The system must compute optimal routes across a graph with 100M+ nodes in milliseconds using algorithms like Dijkstra's and contraction hierarchies.
- Route Computation — Find shortest/fastest path in a massive road network
- Real-time Traffic — Incorporate live traffic data into routing
- Map Rendering — Serve vector map tiles at multiple zoom levels
The road network is a graph with 100M+ nodes and 200M+ edges. Computing a route from New York to Los Angeles requires exploring billions of possible paths—but smart algorithms make this possible in milliseconds.
Requirements
Functional Requirements
- Get directions between two points (driving, walking, transit, cycling)
- Real-time traffic conditions and ETA
- Turn-by-turn navigation
- Search for points of interest (restaurants, gas stations)
- Multiple route alternatives with estimated time
- Offline maps for areas without connectivity
Non-Functional Requirements
- Latency: Route computed in < 500ms
- Scale: 1 billion map requests/day, 100M navigation sessions
- Accuracy: ETA within 10% of actual travel time
- Availability: 99.99% uptime
- Freshness: Traffic updates every 2 minutes
Route computation is a shortest-path problem on a weighted graph. Edge weights represent distance, travel time, or a combination. Dijkstra's algorithm is O(V²), too slow for 100M+ nodes—contraction hierarchies precompute shortcuts.
Back-of-the-Envelope Estimation
Maps System Capacity
- 1B map requests/day = 11,600 QPS
- 100M navigation sessions/day = 1,160 QPS
- Road network: 100M nodes, 200M edges
- Graph storage: 200M edges × 50 bytes = 10 GB
- Map tiles: 20 zoom levels × 256×256 px = ~1 TB pre-rendered
- Traffic data: 100M GPS points/minute × 100 bytes = 10 GB/min
Graph Algorithms
DfRoad Network Graph
The road network is a weighted directed graph where nodes are intersections and edges are road segments. Edge weights represent travel time (adjusted for speed limits, traffic, and turn penalties).
Dijkstra's Algorithm
Here,
- =Shortest known distance to node v
- =Shortest known distance to node u
- =Weight (travel time) of edge from u to v
Dijkstra's Limitation
Dijkstra's explores all nodes in order of distance. For 100M nodes:
Worst case: O(V²) = (10⁸)² = 10¹⁶ operations At 1 billion operations/second: 10⁷ seconds = 115 days
Too slow for real-time routing. Need preprocessing (contraction hierarchies).
Contraction Hierarchies
DfContraction Hierarchies
Contraction hierarchies precompute "shortcut" edges for important nodes (highways). At query time, the algorithm only explores nodes near the source and destination, then uses shortcuts for the middle of the route. This reduces query time from minutes to milliseconds.
Contraction hierarchies reduce routing time from seconds to < 1ms for continent-scale graphs. The preprocessing step (computing shortcuts) takes hours but only needs to be done once when the graph changes.
Real-time Traffic
Incorporate GPS data from phones to estimate traffic speeds:
Traffic-Adjusted Weight
Here,
- =Length of road segment e
- =Free-flow speed (speed limit)
- =Multiplier from real-time GPS data (0.5 = half speed)
Traffic Impact
Road segment: 2 km, free-flow speed: 60 km/h Free-flow travel time: 2/60 × 60 = 2 minutes
With traffic (factor = 0.5, congestion): Adjusted time: 2 / (60 × 0.5) × 60 = 4 minutes
The route through this segment now takes twice as long.
High-Level Architecture
Map Tile System
DfSlippy Map Tiles
The world is divided into 256×256 pixel tiles at 20+ zoom levels. Each tile is identified by (zoom, x, y) coordinates. Tiles are cached aggressively at CDN edge locations.
Tile Coordinates
Here,
- =Zoom level (0 = world view, 20 = street level)
Tile Calculation
Zoom level 10: 1024 × 1024 = 1M tiles total Zoom level 15: 32768 × 32768 = 1B tiles total
Only render and cache tiles that users actually request (sparse coverage).
Practice Exercises
-
Algorithm: Implement Dijkstra's algorithm for a small graph (10 nodes). What is the time complexity with and without a priority queue?
-
Scale: If the road network has 100M nodes and 200M edges, estimate the memory needed for the contraction hierarchy preprocessing.
-
Real-time: Design a system to incorporate real-time traffic data from 10M GPS sources every 2 minutes. What are the latency and throughput requirements?
-
Offline: How would you implement offline maps that allow routing without internet connectivity? What data would you pre-download?
Key Takeaways:
- Road networks are weighted graphs; Dijkstra's is too slow—use contraction hierarchies
- Contraction hierarchies precompute shortcut edges, reducing query time from seconds to < 1ms
- Real-time traffic adjusts edge weights using GPS data from mobile devices
- Slippy map tiles enable efficient map rendering with aggressive CDN caching
- Preprocessing (contraction) takes hours but only needs to run when the graph changes
What to Learn Next
-> Design Proximity Service Finding nearby businesses and points of interest.
-> Design Search Autocomplete Location-based typeahead suggestions.
-> CDNs Serving map tiles from edge locations.
-> Design Realtime Analytics Real-time traffic data processing.
-> Caching Strategies Caching map tiles and route computations.
-> Databases Geospatial indexes and tile storage.