Design a Search System for a Knowledge Base - Frontend System Design Interview Guide
Design a production-ready search system for a knowledge base (like Confluence, Notion, or Zendesk Help Center) with intelligent indexing, relevance ranking, and fast autocomplete.
Backend as Black Box: Assume you have a search API. Focus on the frontend architecture including client-side search capabilities.
Key Challenges
This problem goes beyond basic autocomplete to explore:
- Client-Side Indexing: How to build a search index in the browser
- Relevance Scoring: TF-IDF, BM25, and semantic similarity
- Faceted Search: Filter by category, author, date ranges
- Search-as-you-type: Sub-100ms autocomplete with highlighting
- Offline Search: Search cached content without network
When users search a knowledge base, they expect instant suggestions, relevant results, and fast filtering—even with 100,000+ documents. This solution designs a search system that handles intelligent indexing, relevance ranking, and faceted search while balancing client-side speed with server-side completeness. The key insight: a good search system uses hybrid architecture—server-side relevance with client-side caching and refinement.
HLD interview focus: Requirements, architecture, tradeoffs, data flow, and scaling decisions. Any implementation snippets shown are optional unless explicitly asked.
I'll start by defining what makes a great search experience—what questions does a user need answered as they search? Then I'll design the architecture that handles intelligent indexing, relevance ranking, and faceted search. Finally, I'll define clear boundaries between server-side relevance and client-side caching.
Why this approach?
Most candidates build a "search input with results." Strong candidates build a "search system"—a hybrid architecture that combines server-side relevance with client-side speed, handles faceted filtering, and provides instant suggestions. The difference is thinking about the entire search journey, not just showing results.
Think of this like building a search engine for documentation. You don't just show results—you handle relevance ranking, faceted filtering, fuzzy matching, and offline search. Same principles apply here.
Before designing anything, let's define what success looks like. When users search a knowledge base, they need instant suggestions, relevant results, and fast filtering.
Requirements Exploration Questions
Discovery
What content types should be searchable?
- Articles, documentation pages
- Comments and discussions
- File attachments (PDF, Word)
- Code snippets
What search capabilities?
- Full-text search
- Autocomplete/typeahead
- Fuzzy matching for typos
- Faceted filtering (category, author, date)
- Saved searches
What are the latency requirements?
- Autocomplete: < 100ms (feels instant)
- Full results: < 500ms
- Filter updates: < 200ms
Offline requirements?
- Search recently viewed content offline
- Sync when back online
Functional Requirements
Must Have
MVP (Core Features - What I'd Design First):
- Full-text search across all documents
- Search-as-you-type with instant suggestions (< 100ms)
- Fuzzy matching for typos (1-2 character tolerance)
- Faceted filtering (category, author, date)
- Relevance-ranked results with highlighting
- Clear loading/empty/error states
- Basic accessibility and keyboard support
Advanced Features (Add If Time Permits):
- Boolean operators (AND, OR, NOT) and phrase search
- Client-side indexing for offline search
- Saved searches and search history
- Multi-language support
Non-Functional Requirements
Quality Bar
Performance:
- Autocomplete: < 100ms (P95)
- Full search: < 500ms (P95)
- No UI jank during search
- Handle 100K+ documents
Scalability:
- Support 10K+ concurrent users
- Index updates within 5 minutes
- Handle 1000+ queries/second
Reliability:
- Graceful degradation on API failure
- Offline search for cached content
- No data loss
Accessibility:
- Keyboard navigation
- Screen reader support
- WCAG 2.1 AA compliance
Security & Compliance:
- Strict authn/authz checks for scoped content access
- Query/input sanitization and XSS-safe result rendering
- Rate limiting and abuse protection for search endpoints
Observability:
- Track p95 latency, error rate, and retry rate
- Log critical client/server sync failures
- Alert on sustained degradation and queue backlog growth
Hybrid Search Architecture
Why Hybrid?
- Client-side: Instant results for recent/cached content
- Server-side: Complete results across full corpus
┌─────────────────────────────────────────────────────────────────────────┐
│ Client │
│ ┌─────────────────┐ ┌──────────────────┐ ┌────────────────────┐ │
│ │ Search Input │───►│ Search Manager │◄──►│ Results UI │ │
│ │ (Debounced) │ │ (Orchestrator) │ │ (Ranked List) │ │
│ └─────────────────┘ └────────┬─────────┘ └────────────────────┘ │
│ │ │
│ ┌──────────────────────┼──────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────────┐ │
│ │ Local Index │ │ Server API │ │ Facet Engine │ │
│ │ (Web Worker) │ │ (REST/GraphQL) │ │ (Aggregations) │ │
│ │ │ │ │ │ │ │
│ │ • Recent docs │ │ • Full corpus │ │ • Category counts │ │
│ │ • Viewed docs │ │ • Fuzzy match │ │ • Author counts │ │
│ │ • < 50ms │ │ • < 500ms │ │ • Date histogram │ │
│ └────────────────┘ └────────────────┘ └────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ IndexedDB │ │
│ │ • Persisted search index (offline support) │ │
│ │ • Recently viewed documents │ │
│ │ • Search history │ │
│ └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Server (Black Box) │
│ • Elasticsearch / Algolia / MeiliSearch │
│ • Full document corpus │
│ • Advanced relevance tuning │
└─────────────────────────────────────────────────────────────────────────┘
Progressive Results Strategy
User types query
│
▼
┌─────────────────────────────────┐
│ 1. Search local index │ ← Instant (< 50ms)
│ Show results immediately │
└─────────────────────────────────┘
│
▼ (parallel)
┌─────────────────────────────────┐
│ 2. Query server API │ ← Background (< 500ms)
│ (debounced 200ms) │
└─────────────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ 3. Merge & re-rank results │ ← Seamless update
│ Dedupe, boost local matches │
└─────────────────────────────────┘
Why This Works:
- User sees instant feedback
- Results improve as server responds
- Works offline with local index only
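The merge-and-re-rank step can be sketched as a pure function. The `RankedResult` shape and the 1.1x local boost are illustrative assumptions, not part of any API defined here:

```typescript
interface RankedResult {
  id: string;
  score: number;
  source: "local" | "server";
}

// Merge local and server results: dedupe by id, keep the higher score,
// and apply a small boost to documents cached locally.
function mergeResults(
  local: RankedResult[],
  server: RankedResult[],
  localBoost = 1.1
): RankedResult[] {
  const byId = new Map<string, RankedResult>();
  for (const r of server) byId.set(r.id, r);
  for (const r of local) {
    const boosted = { ...r, score: r.score * localBoost };
    const existing = byId.get(r.id);
    if (!existing || boosted.score > existing.score) byId.set(r.id, boosted);
  }
  // Highest relevance first after the merge.
  return [...byId.values()].sort((a, b) => b.score - a.score);
}
```

Because the function is pure, the UI can call it once with local-only results and again when the server responds, producing the seamless update described above.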
Why Web Workers for Indexing?
Problem: Building search index on main thread = UI freeze
Solution: Offload to Web Worker
Main Thread Web Worker
│ │
│──── Documents to index ─────►│
│ (don't block UI) │
│ │ ← Build index in background
│ │
│◄──── Index ready ────────────│
│ │
│──── Search query ───────────►│
│ │ ← Search in parallel
│◄──── Results ────────────────│
Benefits:
- UI stays responsive during indexing
- Can index thousands of documents without jank
- Parallel search doesn't block typing
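A minimal sketch of the worker side. The `Doc` shape is illustrative, and the message wiring is shown only as a comment; the core idea is that the worker owns the inverted index so the main thread never blocks:

```typescript
// worker.ts (sketch) - builds and queries an inverted index off the main thread.
type Doc = { id: string; title: string; content: string };

const invertedIndex = new Map<string, Set<string>>();

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function indexDocs(docs: Doc[]): void {
  for (const doc of docs) {
    for (const term of tokenize(`${doc.title} ${doc.content}`)) {
      let posting = invertedIndex.get(term);
      if (!posting) invertedIndex.set(term, (posting = new Set()));
      posting.add(doc.id);
    }
  }
}

function search(query: string): string[] {
  // Intersect posting lists of all query terms (AND semantics).
  const postings = tokenize(query).map(
    (t) => invertedIndex.get(t) ?? new Set<string>()
  );
  if (postings.length === 0) return [];
  const [first, ...rest] = postings;
  return [...first].filter((id) => rest.every((p) => p.has(id)));
}

// In a real worker this would be wired to messages, e.g.:
// self.onmessage = (e) => { /* dispatch to indexDocs/search, postMessage results */ };
```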
Entity and interface contract shape with cache/reconciliation model for a frontend system design interview (backend treated as a black box).
1) Component prop interfaces (boundaries)
Define boundaries between query input, facet controls, and result surfaces.
- SearchShellProps: route-level search context + navigation callbacks
- SearchInputProps: controlled query state + keyboard handling
- FacetPanelProps: selected filters and toggle handlers
- ResultListProps: ranked ids + pagination/selection callbacks
2) Hook interfaces (consumption contracts)
Hook contracts keep data-consumption details explicit while hiding fetch internals.
Document Structure
interface Document {
id: string;
title: string;
content: string; // Full text content
snippet: string; // Preview snippet (200 chars)
url: string;
// Metadata for facets
category: string;
author: {
id: string;
name: string;
};
space: string; // Workspace/project
tags: string[];
// Temporal
createdAt: Date;
updatedAt: Date;
viewedAt?: Date; // For recency boost
}
Inverted Index Structure
What is an Inverted Index?
Instead of: Document → Words (forward index)
Use: Word → Documents (inverted index)
Forward Index (slow):
Doc1: ["react", "hooks", "tutorial"]
Doc2: ["react", "state", "management"]
To find "react": Scan ALL documents ❌
Inverted Index (fast):
"react": [Doc1, Doc2]
"hooks": [Doc1]
"tutorial": [Doc1]
"state": [Doc2]
To find "react": O(1) lookup ✓
Index Structure:
interface SearchIndex {
// Term → Document IDs with positions
invertedIndex: Map<string, PostingList>;
// Document metadata for ranking
documents: Map<string, DocumentMeta>;
// Pre-computed stats for scoring
avgDocLength: number;
totalDocs: number;
}
interface PostingList {
docIds: string[];
termFrequency: Map<string, number>; // docId → count
positions: Map<string, number[]>; // docId → positions
}
Relevance Scoring: TF-IDF vs BM25
TF-IDF (Term Frequency - Inverse Document Frequency):
Score = TF × IDF
TF = (times term appears in doc) / (total terms in doc)
IDF = log(total docs / docs containing term)
- Common words (the, a, is) → low IDF
- Rare words (kubernetes, authentication) → high IDF
BM25 (Best Matching 25):
Improved version that handles document length better:
- Short docs with term → higher score
- Long docs with term → lower score (term may be incidental)
When to use which:
- TF-IDF: Simple, good for short documents
- BM25: Better for varied document lengths (articles, docs)
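The per-term BM25 score can be sketched in a few lines. `k1 = 1.2` and `b = 0.75` are the conventional defaults, assumed here rather than taken from this design:

```typescript
// BM25 contribution of a single query term to a document's score (sketch).
function bm25Term(
  termFreqInDoc: number,
  docLength: number,
  avgDocLength: number,
  totalDocs: number,
  docsWithTerm: number,
  k1 = 1.2,
  b = 0.75
): number {
  // Rare terms get a higher inverse document frequency.
  const idf = Math.log(
    1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5)
  );
  // Length normalization: long documents are penalized by factor `b`.
  const norm = 1 - b + b * (docLength / avgDocLength);
  // Saturating term frequency: repeated terms have diminishing returns.
  const tf = (termFreqInDoc * (k1 + 1)) / (termFreqInDoc + k1 * norm);
  return idf * tf;
}
```

Summing `bm25Term` over all query terms gives the document's score; the `norm` factor is exactly what makes short documents with the term outrank long ones.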
Search Query AST
Parse queries into Abstract Syntax Tree:
Query: react hooks category:tutorials -deprecated
{
type: "AND",
children: [
{ type: "TERM", value: "react" },
{ type: "TERM", value: "hooks" },
{ type: "FILTER", field: "category", value: "tutorials" },
{ type: "NOT", child: { type: "TERM", value: "deprecated" } }
]
}
Supported syntax:
- term - full-text search
- "phrase" - exact phrase match
- field:value - field filter
- -term - exclude term
- OR - any match
Client cache shape (recommended)
- entitiesById: Record<ID, Entity>
- orderedIds: ID[] for rendering order
- pageInfo/cursor metadata for pagination or range loading
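A sketch of this cache shape with an append-page reducer; the names (`appendPage`, `SearchResultEntity`) are illustrative, not part of the contracts above:

```typescript
interface SearchResultEntity {
  id: string;
  title: string;
  snippet: string;
}

// Normalized client cache for search results (sketch).
interface SearchCache {
  entitiesById: Record<string, SearchResultEntity>;
  orderedIds: string[];
  pageInfo: { cursor: string | null; hasMore: boolean };
}

// Appending a page only grows orderedIds; entities merge by id, so a
// refreshed document updates every view that renders it.
function appendPage(
  cache: SearchCache,
  results: SearchResultEntity[],
  cursor: string | null
): SearchCache {
  const entitiesById = { ...cache.entitiesById };
  const orderedIds = [...cache.orderedIds];
  for (const r of results) {
    if (!entitiesById[r.id]) orderedIds.push(r.id);
    entitiesById[r.id] = r;
  }
  return { entitiesById, orderedIds, pageInfo: { cursor, hasMore: cursor !== null } };
}
```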
Consistency & reconciliation rules
- Make writes idempotent where retries are possible.
- Apply realtime updates with version/event ordering checks.
- Prefer server-authoritative reconciliation after optimistic mutations.
Component Boundaries
SearchShell
Coordinates search state, URL sync, and panel-level composition.
SearchInput
Owns query editing, keyboard controls, and suggestion open state.
FacetPanel
Manages selected filters and emits filter toggle intents.
ResultList
Renders ranked results and emits pagination/selection actions.
export interface SearchShellProps {
initialQuery?: string;
onOpenDocument: (docId: string) => void;
}
export interface SearchInputProps {
query: string;
setQuery: (query: string) => void;
onSubmit: () => void;
onKeyDown: (event: React.KeyboardEvent<HTMLInputElement>) => void;
}
export interface FacetPanelProps {
selectedFilters: Record<string, string[]>;
onToggleFilter: (facet: string, value: string) => void;
onClearAll: () => void;
}
export interface ResultListProps {
resultIds: string[];
resultsById: Record<string, SearchResult>;
hasMore: boolean;
onLoadMore: () => void;
}
export interface UseSearchQueryResult {
resultIds: string[];
resultsById: Record<string, SearchResult>;
facets: FacetBucket[];
isLoading: boolean;
hasMore: boolean;
loadMore: () => void;
}
export interface UseSearchInputResult {
query: string;
setQuery: (value: string) => void;
activeSuggestionIndex: number;
setActiveSuggestionIndex: (index: number) => void;
}
export function useSearchQuery(_input: SearchQueryInput): UseSearchQueryResult {
throw new Error('Contract-only snippet');
}
export function useSearchInput(): UseSearchInputResult {
throw new Error('Contract-only snippet');
}
React interfaces & integration patterns (props, hooks, callbacks).
This section covers API contracts and React consumption patterns.
API contracts (Backend as black box)
Search API:
GET /api/search?q={query}&page={page}&limit={limit}&facets={facets}
Request Parameters:
- q: Search query string (required)
- page: Page number (default: 1)
- limit: Results per page (default: 20)
- facets: Comma-separated facet filters (e.g., "category:tutorial,author:john")
Response:
{
results: SearchResult[];
total: number;
page: number;
hasMore: boolean;
facets: {
category: FacetValue[];
author: FacetValue[];
dateRange: FacetValue[];
tags: FacetValue[];
};
query: string; // Echo of search query
suggestions?: string[]; // Search suggestions
}
interface SearchResult {
id: string;
title: string;
snippet: string; // Highlighted snippet with <mark> tags
url: string;
category: string;
author: string;
publishedAt: string;
tags: string[];
relevanceScore: number;
}
interface FacetValue {
value: string;
count: number;
selected?: boolean;
}
Autocomplete/Suggestions API:
GET /api/search/suggestions?q={query}&limit={limit}
Response:
{
suggestions: string[];
recentSearches?: string[];
popularSearches?: string[];
}
Document Detail API:
GET /api/documents/:documentId
Response:
{
id: string;
title: string;
content: string; // Full content or HTML
author: string;
category: string;
publishedAt: string;
updatedAt: string;
tags: string[];
relatedDocuments?: string[]; // Document IDs
}
Search Analytics API:
POST /api/search/analytics
{
query: string;
resultClicked?: string; // Document ID if clicked
timeSpent?: number; // milliseconds
}
Response: { success: boolean; }
Type definitions used in contracts
interface SearchSuggestion {
value: string;
source: 'recent' | 'popular' | 'query';
}
interface SearchAnalyticsPayload {
query: string;
resultClicked?: string;
timeSpent?: number;
}
3) Integration patterns (React wiring)
- Debounced intent pipeline: input changes produce request intent after debounce.
- Stale response protection: guard state updates by query/request token.
- Facet + result coherence: keep selected filters and result cache in sync.
- Keyboard-first navigation: preserve focus and ARIA semantics across rerenders.
Integration Patterns
Debounced querying
Convert rapid input into stable request intents.
Token-based guards
Ignore stale responses that no longer match active query state.
Facet coherence
Keep facet selection and result cache snapshots aligned.
A11y-first focus flow
Maintain predictable keyboard and screen-reader behavior.
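The token-based guard can be sketched as a closure over a monotonically increasing request token; the names (`createSearchGuard`, `fetcher`) are illustrative:

```typescript
// Only the most recent request may write results into state (sketch).
function createSearchGuard(onResults: (ids: string[]) => void) {
  let latestToken = 0;
  return async function run(
    query: string,
    fetcher: (q: string) => Promise<string[]>
  ): Promise<void> {
    const token = ++latestToken; // stamp this request
    const ids = await fetcher(query);
    if (token === latestToken) onResults(ids); // drop out-of-date responses
  };
}
```

Without this guard, a slow response for "re" can arrive after the fast response for "react" and overwrite the correct result list.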
Search Performance Optimizations
1. Debouncing Search Input
Problem: API call on every keystroke = wasted requests
Solution: Wait for typing pause
User types: r-e-a-c-t (5 keystrokes)
Without debounce: 5 API calls ❌
With debounce (200ms): 1 API call ✓
Debounce timing:
- Autocomplete: 150-200ms (fast feedback)
- Full search: 300-400ms (more deliberate)
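A trailing-edge debounce is only a few lines; this sketch is framework-agnostic (the `runSearch` name in the usage comment is illustrative):

```typescript
// Run `fn` only after `delayMs` of typing silence, so "r-e-a-c-t"
// produces one call instead of five.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer); // reset on every keystroke
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Usage (illustrative): const searchDebounced = debounce(runSearch, 200);
```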
2. Client-Side Index for Instant Results
Index recently viewed documents locally:
Document viewed → Add to local index
│
User searches ────────►│
│
├── Check local index (< 50ms)
│ │
│ ▼
│ Show instant results
│
└── Query server (parallel)
│
▼
Merge & update
What to index locally:
- Recently viewed docs (last 100)
- Frequently accessed docs
- User's own content
- Search history
3. Fuzzy Matching for Typos
Problem: "deploment" doesn't match "deployment"
Solution: Levenshtein distance tolerance
Query: "deploment"
│
▼
Calculate edit distance to all terms
│
▼
"deployment" - distance 1 ✓ (within threshold)
"development" - distance 3 ✗ (too far)
Implementation options:
- Trigram index: "dep" → "deployment", "development"
- N-gram tokenization: "dep", "epl", "plo", "loy", ...
- Levenshtein automaton for efficient matching
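Edit distance itself is a classic two-row dynamic program; this sketch is enough to check the thresholds in the example above (a trigram index or Levenshtein automaton would avoid scanning every term):

```typescript
// Levenshtein edit distance between two strings (sketch).
// Keeps only the previous DP row, so memory is O(len(b)).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1; // substitution cost
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // match or substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}
```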
4. Highlighting Matches
Show users WHY a result matched:
Query: "react hooks"
Before: "Learn about React Hooks and state management"
After: "Learn about <mark>React</mark> <mark>Hooks</mark> and state management"
Implementation:
- Tokenize query into terms
- Find term positions in result
- Wrap matches with highlight tags
- Extend context around matches (50 chars each side)
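The steps above can be sketched as a small highlighter. Escaping the text before highlighting is one reasonable ordering (assumed here) to keep result rendering XSS-safe; it does mean terms that happen to match inside escaped entities would need extra care in production:

```typescript
const htmlEntities: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
};

function escapeHtml(text: string): string {
  return text.replace(/[&<>"]/g, (c) => htmlEntities[c]);
}

function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap whole-word query-term matches in <mark> tags (sketch).
function highlight(text: string, query: string): string {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return escapeHtml(text);
  const pattern = new RegExp(`\\b(${terms.map(escapeRegex).join("|")})\\b`, "gi");
  // $1 preserves the original casing of the matched term.
  return escapeHtml(text).replace(pattern, "<mark>$1</mark>");
}
```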
5. Facet Count Optimization
Problem: Counting facets for 100K docs = slow
Solution: Pre-compute + approximate counts
Strategy 1: Pre-aggregated counts (cached)
{ "tutorials": 1234, "guides": 567, ... }
Strategy 2: Approximate for large sets
If results > 10K, show "1K+" instead of exact count
Strategy 3: Lazy load counts
Show results first, load counts async
Performance Targets
| Metric | Target | Technique |
|---|---|---|
| Autocomplete | < 100ms | Local index + debounce |
| Full search | < 500ms | Server-side (Elasticsearch) |
| Fuzzy match | < 50ms | Trigram index |
| Facet counts | < 200ms | Pre-aggregation |
| Index update | < 5s | Web Worker background |
Why Local Index + Server Search?
┌─────────────────────────────┐
│ Query Complexity │
│ │
│ Simple ───────► Complex │
│ │
│ Local Server │
│ Index Search │
│ │
│ • Fast • Complete │
│ • Limited • Slow │
│ • Offline • Online only │
└─────────────────────────────┘
Optimal: Use both!
- Local for instant feedback
- Server for comprehensive results
- Merge for best of both worlds
Why This Design Works
Hybrid Search Architecture
- Local index: Instant results for cached content
- Server search: Complete results across full corpus
- Progressive loading: Fast feedback → refined results
Inverted Index
- O(1) term lookup: Don't scan all documents
- Position tracking: Enable phrase search and highlighting
- Term frequency: Enable relevance scoring
Web Workers
- Non-blocking: UI stays responsive during indexing
- Parallel search: Main thread handles typing, worker handles search
- Large index support: Can index thousands of documents
Fuzzy Matching
- Typo tolerance: "deploment" matches "deployment"
- N-gram index: Fast fuzzy lookup
- Configurable distance: 1-2 characters for short words
BM25 Scoring
- Document length normalization: Fair ranking across varied lengths
- Term frequency saturation: Diminishing returns for repeated terms
- Industry standard: Used by Elasticsearch, Lucene
Key Takeaways
- Use hybrid search: local index for speed, server for completeness
- Build an inverted index for O(1) term lookup instead of O(n) scans
- Offload indexing to Web Workers to keep the UI responsive
- Debounce search input (200-300ms) to reduce API calls
- Implement fuzzy matching with trigram or n-gram indexes: users make typos
- Highlight matching terms so users see why results matched
- Use BM25 for relevance scoring (handles varied document lengths better than TF-IDF)
- Show progressive results: local first, refined by the server
- Persist the search index in IndexedDB for offline search