πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Design Typesense Autocomplete

System Design ProblemsTypo-Tolerant Search🟒 Free Lesson

Advertisement

System Design Problems

Design Typesense Autocomplete

Typesense provides typo-tolerant, instant search with sub-millisecond latency. Unlike standard autocomplete that requires exact prefix matches, Typesense uses trie-based indexing with edit distance (Levenshtein distance) to find results even when users make typos.

  • Typo Tolerance β€” Find results with up to 2-3 character errors
  • Sub-millisecond Latency β€” In-memory trie traversal
  • Relevance Ranking β€” Results ranked by popularity and freshness

The key insight: instead of matching exact prefixes, traverse the trie allowing character substitutions, insertions, and deletions (edit distance).

Requirements

Functional Requirements

  • Search as you type (every keystroke triggers search)
  • Typo tolerance (up to 2 edits: substitutions, insertions, deletions)
  • Results ranked by relevance (popularity, freshness, exact match)
  • Support for multiple fields (title, description, tags)
  • Faceted search (filter by category, price range)
  • Real-time index updates (new documents indexed within 100ms)

Non-Functional Requirements

  • Latency: Results in < 50ms (p99)
  • Throughput: 100K search QPS
  • Freshness: New documents indexed within 100ms
  • Accuracy: Typo-tolerant results must include correct answers
  • Scale: 100M documents, 1 billion search terms

Typo tolerance requires exploring multiple paths in the trie simultaneously. For a prefix of length L with edit distance D, the search explores O(26^D Γ— L) pathsβ€”still fast for small D.

Back-of-the-Envelope Estimation

Typesense Capacity Estimation

  • 100M documents Γ— 5 fields Γ— 10 words/field = 5 billion words
  • Trie nodes: ~2 billion (with prefix sharing)
  • Memory per node: 50 bytes
  • Total trie memory: 100 GB
  • Search QPS: 100K with < 50ms latency

Trie with Edit Distance

DfTrie with Typo Tolerance

A trie with typo tolerance extends the standard trie to support fuzzy matching. Instead of following exact character edges, the search explores paths with character substitutions (cost 1), insertions (cost 1), and deletions (cost 1), bounded by a maximum edit distance D.

rootabpip*z*Edit Distance 1 from "app":"app" β†’ "app" (match) βœ“"app" β†’ "apz" (substitute pβ†’z)Trie with typo-tolerant traversal

Levenshtein Distance

d[i][j]=min⁑{d[iβˆ’1][j]+1(delete)d[i][jβˆ’1]+1(insert)d[iβˆ’1][jβˆ’1]+cost(substitute)d[i][j] = \min \begin{cases} d[i-1][j] + 1 & \text{(delete)} \\ d[i][j-1] + 1 & \text{(insert)} \\ d[i-1][j-1] + cost & \text{(substitute)} \end{cases}

Here,

  • d[i][j]d[i][j]=Edit distance between first i chars of pattern and first j chars of text
  • costcost=0 if characters match, 1 if they differ

Levenshtein Distance Calculation

"apple" vs "apply": d[0][0]=0, d[0][1]=1, d[0][2]=2, ... d[5][5] = min(d[4][5]+1, d[5][4]+1, d[4][4]+cost) = min(2, 2, 1+0) = 1

Edit distance = 1 (substitute 'e' β†’ 'y')

Trie Traversal with Edit Budget

DfBounded Trie Search

Bounded trie search traverses the trie while tracking remaining edit budget. At each node, the algorithm decides: follow the exact character edge (no cost), substitute (cost 1), skip character in pattern (cost 1), or skip character in trie (cost 1).

Exact match (cost 0)Substitute (cost 1)Insert/Delete (cost 1)Budget=2: "apxle" β†’ "apple" (1 substitute) βœ“Budget=2: "aple" β†’ "apple" (1 insert) βœ“Budget=2: "banana" β†’ "apple" (edit=4) βœ—Bounded search explores paths within edit budget

For a prefix of length L with max edit distance D, the worst case explores O(3^D Γ— L) paths. With D=2 and L=10: 3^2 Γ— 10 = 90 pathsβ€”still sub-millisecond.

Multi-Field Search

Search across multiple fields with field-level boosting:

Multi-Field Score

score=βˆ‘fboostfΓ—tfidf(f,query)Γ—field_match(f)score = \sum_{f} boost_f \times tfidf(f, query) \times field\_match(f)

Here,

  • boostfboost_f=Boost weight for field f (e.g., title=2, description=1)
  • tfidf(f,query)tfidf(f, query)=TF-IDF score for query in field f
  • fieldmatch(f)field_match(f)=Exact prefix match bonus for field f

Multi-Field Ranking

Document: { title: "iPhone 15 Pro", description: "Apple smartphone" } Query: "iphone"

Title TF-IDF: 3.0 Γ— boost(2.0) = 6.0 Description TF-IDF: 1.5 Γ— boost(1.0) = 1.5 Title prefix match bonus: +5.0

Total score: 6.0 + 1.5 + 5.0 = 12.5

High-Level Architecture

ClientCDNSearch ServiceQuery ParserTrie RouterFuzzy Search EngineResult RankerSnippet GeneratorIn-Memory Trie(Per-shard)Document Store(On-disk)Typesense Autocomplete Architecture

Index Structure

The trie stores document IDs at leaf nodes. Each trie node contains:

Architecture Diagram
TrieNode {
  children: Map<Char, TrieNode>
  doc_ids: Set<DocId>        // Documents with this prefix
  frequency: int             // For ranking
  is_end_of_word: bool
}

Store document metadata (title, price, etc.) separately from the trie. The trie only stores doc_ids and frequency for ranking. Retrieve full documents from the document store after finding matching doc_ids.

Sharding Strategy

Partition the index across shards by document hash:

Search Shard Routing

shards_to_query=all_shards(typo_search)Β orΒ hash(key)mod  Nshards(exact_search)shards\_to\_query = \text{all\_shards}(typo\_search) \text{ or } hash(key) \mod N_{shards}(exact\_search)

Here,

  • typosearchtypo_search=Query all shards for typo tolerance
  • exactsearchexact_search=Route to single shard by key hash

For typo-tolerant search, query all shards in parallel (scatter-gather). For exact prefix search, route to a single shard. Most autocomplete queries are prefix searches, so shard routing reduces query fan-out.

Practice Exercises

  1. Algorithm: Implement a bounded trie search that finds all words within edit distance D of a given prefix. What is the time complexity?

  2. Scale: If the index has 100M documents with 10 fields each, estimate the trie memory and the query time for a typo-tolerant search.

  3. Ranking: Design a ranking function that combines typo tolerance, field boosting, popularity, and freshness. How do you weight these signals?

  4. Optimization: How would you implement search-as-you-type that debounces keystrokes and caches recent query results?

Key Takeaways:

  • Typo tolerance uses bounded trie traversal with Levenshtein distance (edit budget)
  • For edit distance D=2 and prefix length 10, the search explores ~90 paths (sub-millisecond)
  • Multi-field search with boosting ranks title matches above description matches
  • In-memory trie enables sub-millisecond latency; document store is separate on-disk
  • Scatter-gather across shards for typo tolerance; single-shard routing for exact prefix

What to Learn Next

-> Design Search Autocomplete Standard prefix-based autocomplete with popularity ranking.

-> Design Search Engine Inverted indices and full-text search infrastructure.

-> Database Indexing B-tree, LSM-tree, and trie index structures.

-> Caching Strategies Caching search results and trie hot paths.

-> Databases In-memory vs disk-based storage trade-offs.

-> Design Recommendation System ML-based ranking and personalization.

⭐

Premium Content

Design Typesense Autocomplete

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert System Design Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement