System Design Problems
Design a Proximity Service
A proximity service finds nearby points of interest (restaurants, gas stations, ATMs) based on a user's geographic location. The system must efficiently query millions of locations within a given radius using geospatial data structures.
- Nearby Search β Find all businesses within a radius
- Geospatial Indexing β Efficient range queries on lat/lng coordinates
- Real-time Updates β Business locations change; index must stay current
The core challenge is efficiently querying a 2D space. A naive approach checks every location (O(N)), but geospatial indexes reduce this to O(log N) or O(1) for bounded regions.
Requirements
Functional Requirements
- Search for businesses near a location (lat, lng, radius)
- Return results sorted by distance
- Business details (name, address, hours, rating)
- Search by category (restaurants, cafes, etc.)
- Add/update/remove business locations
- Filter by open hours, rating, price range
Non-Functional Requirements
- Latency: Search results in < 200ms
- Scale: 100M businesses, 10K QPS
- Accuracy: Results must be within specified radius
- Freshness: Location updates reflected within 1 minute
- Availability: 99.99%
Geospatial queries are fundamentally different from text search. The data is 2D (latitude/longitude), requiring special indexes like geohashing, quad-trees, or R-trees.
Back-of-the-Envelope Estimation
Proximity Service Capacity
- 100M businesses Γ 500 bytes = 50 GB metadata
- Search QPS: 10K (reads), 100 (writes)
- 1 km radius in NYC: ~1000 businesses
- 1 km radius in rural area: ~5 businesses
- Average search radius: 5 km
- Geohash precision (6 chars): ~1.2 km Γ 0.6 km cells
Geospatial Indexing
Geohashing
DfGeohash
A geohash encodes a 2D latitude/longitude coordinate into a 1D string of characters. Longer strings represent smaller areas. Locations that share a geohash prefix are geographically close.
Geohash Encoding
Here,
- =Binary representation of latitude
- =Binary representation of longitude
- =Alternating bits from lat and lng
Geohash Precision
Geohash "9q8yy" (5 chars): ~5 km Γ 5 km area Geohash "9q8yy9" (6 chars): ~1.2 km Γ 0.6 km area Geohash "9q8yy9u" (7 chars): ~153 m Γ 153 m area Geohash "9q8yy9u4" (8 chars): ~38 m Γ 19 m area
For nearby search, use 5-6 character geohashes to find candidate cells.
Query Algorithm
DfGeohash Range Query
To find nearby locations:
- Compute geohash of the search center
- Compute geohashes of 8 neighboring cells
- Query the database for all locations in these cells
- Filter by actual distance (Haversine formula)
- Return results within the radius, sorted by distance
Haversine Distance
Here,
- =Earth's radius (6371 km)
- =Latitude of points 1 and 2 (radians)
- =Difference in latitude
- =Difference in longitude
For very nearby points (< 1 km), use the simpler Pythagorean approximation: distance β β((Ξlat Γ 111)Β² + (Ξlng Γ 111 Γ cos(lat))Β²). This avoids expensive trig functions.
High-Level Architecture
Database Schema
CREATE TABLE businesses (
id UUID PRIMARY KEY,
name VARCHAR(200),
address TEXT,
latitude DECIMAL(10, 8),
longitude DECIMAL(11, 8),
geohash VARCHAR(12),
category VARCHAR(50),
rating DECIMAL(2, 1),
open_hours JSONB,
created_at TIMESTAMP
);
CREATE INDEX idx_geohash ON businesses(geohash);
CREATE INDEX idx_category ON businesses(category);
PostGIS provides native geospatial support in PostgreSQL. Use ST_DWithin for efficient radius queries and CREATE INDEX with GiST for geospatial indexing.
Caching Strategy
Cache popular geohash cells:
- Cache key:
geo:{geohash_6}:{category} - Cache value: List of business IDs sorted by rating
- TTL: 5 minutes (balance freshness vs. performance)
- Cache hit ratio target: > 80% for popular areas
Pre-compute and cache results for popular geohash cells (city centers, tourist areas). This reduces database load for the most frequent queries.
Practice Exercises
-
Algorithm: Implement geohash encoding from latitude/longitude. What is the precision at 6 characters?
-
Scale: If 100M businesses are distributed globally, estimate the Redis memory needed for geohash indexes at 6-character precision.
-
Optimization: How would you handle a search in a sparse area (no businesses within 5 km)? Design a strategy to expand the search radius dynamically.
-
Real-time: How would you update the geohash index when a business moves to a new location? Design an atomic update strategy.
Key Takeaways:
- Geohashing encodes 2D coordinates into 1D strings for efficient range queries
- Nearby search: query center cell + 8 neighbors, then filter by Haversine distance
- Geohash precision (6 chars) provides ~1.2 km cells for efficient neighborhood queries
- Redis geohash indexes enable sub-millisecond nearby lookups
- PostGIS provides native geospatial support for complex radius queries
What to Learn Next
-> Design Google Maps Routing, traffic, and map tile infrastructure.
-> Database Indexing B-tree, GiST, and geospatial index structures.
-> Caching Strategies Caching geohash cells and popular search results.
-> Design Search Autocomplete Location-based typeahead suggestions.
-> Databases PostGIS and geospatial database support.
-> Consistent Hashing Geohash-based data distribution.