System Design Problems

Design Typesense Autocomplete

Typesense provides typo-tolerant, instant search with sub-millisecond latency. Unlike standard autocomplete that requires exact prefix matches, Typesense uses trie-based indexing with edit distance (Levenshtein distance) to find results even when users make typos.

Typo Tolerance — Find results with up to 2-3 character errors
Sub-millisecond Latency — In-memory trie traversal
Relevance Ranking — Results ranked by popularity and freshness

The key insight: instead of matching exact prefixes, traverse the trie allowing character substitutions, insertions, and deletions (edit distance).

Requirements

Functional Requirements

Search as you type (every keystroke triggers search)
Typo tolerance (up to 2 edits: substitutions, insertions, deletions)
Results ranked by relevance (popularity, freshness, exact match)
Support for multiple fields (title, description, tags)
Faceted search (filter by category, price range)
Real-time index updates (new documents indexed within 100ms)

Non-Functional Requirements

Latency: Results in < 50ms (p99)
Throughput: 100K search QPS
Freshness: New documents indexed within 100ms
Accuracy: Typo-tolerant results must include correct answers
Scale: 100M documents, 1 billion search terms

Typo tolerance requires exploring multiple paths in the trie simultaneously. For a prefix of length L with edit distance D, the search explores O(26^D × L) paths—still fast for small D.

Back-of-the-Envelope Estimation

Typesense Capacity Estimation

100M documents × 5 fields × 10 words/field = 5 billion words
Trie nodes: ~2 billion (with prefix sharing)
Memory per node: 50 bytes
Total trie memory: 100 GB
Search QPS: 100K with < 50ms latency

Trie with Edit Distance

DfTrie with Typo Tolerance

A trie with typo tolerance extends the standard trie to support fuzzy matching. Instead of following exact character edges, the search explores paths with character substitutions (cost 1), insertions (cost 1), and deletions (cost 1), bounded by a maximum edit distance D.

Levenshtein Distance

d[i][j] = \min \begin{cases} d[i-1][j] + 1 & \text{(delete)} \\ d[i][j-1] + 1 & \text{(insert)} \\ d[i-1][j-1] + cost & \text{(substitute)} \end{cases}

Here,

$d[i][j]$ =Edit distance between first i chars of pattern and first j chars of text
$cost$ =0 if characters match, 1 if they differ

Levenshtein Distance Calculation

"apple" vs "apply": d[0][0]=0, d[0][1]=1, d[0][2]=2, ... d[5][5] = min(d[4][5]+1, d[5][4]+1, d[4][4]+cost) = min(2, 2, 1+0) = 1

Edit distance = 1 (substitute 'e' → 'y')

Trie Traversal with Edit Budget

DfBounded Trie Search

Bounded trie search traverses the trie while tracking remaining edit budget. At each node, the algorithm decides: follow the exact character edge (no cost), substitute (cost 1), skip character in pattern (cost 1), or skip character in trie (cost 1).

For a prefix of length L with max edit distance D, the worst case explores O(3^D × L) paths. With D=2 and L=10: 3^2 × 10 = 90 paths—still sub-millisecond.

Multi-Field Search

Search across multiple fields with field-level boosting:

Multi-Field Score

score = \sum_{f} boost_f \times tfidf(f, query) \times field\_match(f)

Here,

$boost_f$ =Boost weight for field f (e.g., title=2, description=1)
$tfidf(f, query)$ =TF-IDF score for query in field f
$field_match(f)$ =Exact prefix match bonus for field f

Multi-Field Ranking

Document: { title: "iPhone 15 Pro", description: "Apple smartphone" } Query: "iphone"

Title TF-IDF: 3.0 × boost(2.0) = 6.0 Description TF-IDF: 1.5 × boost(1.0) = 1.5 Title prefix match bonus: +5.0

Total score: 6.0 + 1.5 + 5.0 = 12.5

High-Level Architecture

Index Structure

The trie stores document IDs at leaf nodes. Each trie node contains:

Architecture Diagram

TrieNode {
  children: Map<Char, TrieNode>
  doc_ids: Set<DocId>        // Documents with this prefix
  frequency: int             // For ranking
  is_end_of_word: bool
}

Store document metadata (title, price, etc.) separately from the trie. The trie only stores doc_ids and frequency for ranking. Retrieve full documents from the document store after finding matching doc_ids.

Sharding Strategy

Partition the index across shards by document hash:

Search Shard Routing

shards\_to\_query = \text{all\_shards}(typo\_search) \text{ or } hash(key) \mod N_{shards}(exact\_search)

Here,

$typo_search$ =Query all shards for typo tolerance
$exact_search$ =Route to single shard by key hash

For typo-tolerant search, query all shards in parallel (scatter-gather). For exact prefix search, route to a single shard. Most autocomplete queries are prefix searches, so shard routing reduces query fan-out.

Practice Exercises

Algorithm: Implement a bounded trie search that finds all words within edit distance D of a given prefix. What is the time complexity?
Scale: If the index has 100M documents with 10 fields each, estimate the trie memory and the query time for a typo-tolerant search.
Ranking: Design a ranking function that combines typo tolerance, field boosting, popularity, and freshness. How do you weight these signals?
Optimization: How would you implement search-as-you-type that debounces keystrokes and caches recent query results?

Key Takeaways:

Typo tolerance uses bounded trie traversal with Levenshtein distance (edit budget)
For edit distance D=2 and prefix length 10, the search explores ~90 paths (sub-millisecond)
Multi-field search with boosting ranks title matches above description matches
In-memory trie enables sub-millisecond latency; document store is separate on-disk
Scatter-gather across shards for typo tolerance; single-shard routing for exact prefix

What to Learn Next

-> Design Search Autocomplete Standard prefix-based autocomplete with popularity ranking.

-> Design Search Engine Inverted indices and full-text search infrastructure.

-> Database Indexing B-tree, LSM-tree, and trie index structures.

-> Caching Strategies Caching search results and trie hot paths.

-> Databases In-memory vs disk-based storage trade-offs.

-> Design Recommendation System ML-based ranking and personalization.

Design Typesense Autocomplete

Design Typesense Autocomplete

Requirements

Functional Requirements

Non-Functional Requirements

Back-of-the-Envelope Estimation

Typesense Capacity Estimation

Trie with Edit Distance

DfTrie with Typo Tolerance

Levenshtein Distance

Levenshtein Distance Calculation

Trie Traversal with Edit Budget

DfBounded Trie Search

Multi-Field Search

Multi-Field Score

Multi-Field Ranking

High-Level Architecture

Index Structure

Sharding Strategy

Search Shard Routing

Practice Exercises

What to Learn Next

Premium Content

Need Expert System Design Help?