How Frontend Developers Can Handle Millions of API Requests Without Crashing Everything
At scale, frontend traffic issues rarely come from raw user count alone. They emerge from duplicated triggers, uncontrolled retries, cache misses, and failure amplification. The goal is not to blindly reduce requests — it is to shape demand, preserve correctness, and protect the backend during degradation.
Quick Decision Guide
Interview answer spine: de-dupe -> cache -> limit concurrency -> retry safely -> shed load -> observe.
If you can explain those layers with trade-offs and failure scenarios, your design will sound production-ready.
Requirements & Mental Model
Why 'millions of requests' happens
A single user rarely causes overload. The real problem appears when many clients act in lockstep: duplicated triggers fire the same request from multiple components, retries stack on top of already-failing calls, and cache misses send every user to origin at once.
This creates failure amplification: traffic multiplies exactly when the backend is weakest.
What interviewers want
A layered answer: name each mitigation, its trade-offs, and how it behaves when a dependency degrades.
Deduplication & Cancellation
In-flight deduplication
If 5 components request the same data simultaneously, send one request.
```typescript
const inFlight = new Map<string, Promise<unknown>>()

export async function fetchOnce(key: string, fn: () => Promise<unknown>) {
  if (inFlight.has(key)) return inFlight.get(key)!
  const p = fn().finally(() => inFlight.delete(key))
  inFlight.set(key, p)
  return p
}
```
Cancel outdated work
For search, filters, or rapid route changes:
```typescript
let controller: AbortController | null = null

export async function search(q: string) {
  controller?.abort()
  controller = new AbortController()
  const res = await fetch(`/api/search?q=${encodeURIComponent(q)}`, { signal: controller.signal })
  return res.json()
}
```
Cancellation prevents stale responses from overriding current intent.
Caching & Revalidation
Cache layers
Layer caches from the inside out: in-memory application state, the HTTP cache (Cache-Control, ETag), and shared CDN/edge caches.
Concrete HTTP example
```http
Cache-Control: public, max-age=60, s-maxage=120, stale-while-revalidate=300
ETag: "abc123"
```
Clients send If-None-Match for efficient 304 responses.
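Browsers handle conditional requests automatically, but a custom client (a Node script, a service worker cache layer) can track ETags itself. The `EtagCache` name and shape below are illustrative, not a standard API:

```typescript
// Minimal sketch of client-side ETag tracking.
type Entry = { etag: string; body: unknown }

export class EtagCache {
  private entries = new Map<string, Entry>()

  // Headers to attach to the outgoing request for this URL.
  headersFor(url: string): Record<string, string> {
    const entry = this.entries.get(url)
    return entry ? { "If-None-Match": entry.etag } : {}
  }

  // Call with the response: on 304 reuse the cached body,
  // otherwise store the fresh body and its ETag.
  resolve(url: string, status: number, etag: string | null, body: unknown): unknown {
    if (status === 304) return this.entries.get(url)?.body
    if (etag !== null) this.entries.set(url, { etag, body })
    return body
  }
}
```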
SWR UX pattern
1. Render cached data instantly
2. Revalidate in background
3. Update only if changed
This reduces backend load while preserving perceived speed.
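The three steps above can be sketched as a minimal helper. The `swr` function, module-level cache, and callbacks here are illustrative, not a library API:

```typescript
// Minimal stale-while-revalidate sketch.
const swrCache = new Map<string, unknown>()

export async function swr<T>(
  key: string,
  fetcher: () => Promise<T>,
  onUpdate: (data: T) => void,
): Promise<T | undefined> {
  const cached = swrCache.get(key) as T | undefined
  // Revalidate in the background; notify only if the data changed.
  fetcher()
    .then((fresh) => {
      if (JSON.stringify(fresh) !== JSON.stringify(cached)) {
        swrCache.set(key, fresh)
        onUpdate(fresh)
      }
    })
    .catch(() => { /* keep serving stale data on failure */ })
  return cached // render instantly; undefined on first load
}
```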
Batching, Pagination & Backpressure
Shape request volume
Batch many small requests into one, paginate or virtualize large lists, and debounce bursty UI triggers.
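One way to shape volume is to coalesce lookups issued in the same tick into a single batched request. `createBatcher` and the batch-endpoint callback below are hypothetical names for the sketch:

```typescript
// Coalesce individual id lookups made in the same tick into one batch.
export function createBatcher<T>(loadBatch: (ids: string[]) => Promise<Map<string, T>>) {
  let pending: { id: string; resolve: (v: T | undefined) => void }[] = []
  let scheduled = false

  return function load(id: string): Promise<T | undefined> {
    return new Promise((resolve) => {
      pending.push({ id, resolve })
      if (!scheduled) {
        scheduled = true
        queueMicrotask(async () => {
          const batch = pending
          pending = []
          scheduled = false
          // One request for all ids collected during this tick.
          const results = await loadBatch(batch.map((p) => p.id))
          for (const p of batch) p.resolve(results.get(p.id))
        })
      }
    })
  }
}
```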
Backpressure vs Rate Limiting
Rate limiting is enforced by the server, which rejects excess requests. Backpressure is applied by the client, which queues or drops work locally. Backpressure prevents request pileups before they hit the backend.
Concurrency limiter
```typescript
export function createLimiter(limit: number) {
  let active = 0
  const q: Array<() => void> = []

  const next = () => {
    if (active >= limit) return
    const job = q.shift()
    if (!job) return
    active++
    job()
  }

  return function run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      q.push(() => {
        task().then(resolve, reject).finally(() => {
          active--
          next()
        })
      })
      next()
    })
  }
}
```
Reliability & Failure Control
Timeouts everywhere
Never allow unbounded hangs. Always pair fetch with timeout + cancellation.
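A minimal sketch of both halves, assuming no particular library (names are illustrative). The generic wrapper rejects late promises; the fetch variant additionally aborts the underlying request:

```typescript
// Generic timeout wrapper: rejects if the promise takes longer than ms.
export function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms)
    promise.then(
      (v) => { clearTimeout(timer); resolve(v) },
      (e) => { clearTimeout(timer); reject(e) },
    )
  })
}

// With fetch, prefer an AbortController so the request itself is cancelled:
export async function fetchWithTimeout(url: string, ms: number) {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), ms)
  try {
    return await fetch(url, { signal: controller.signal })
  } finally {
    clearTimeout(timer)
  }
}
```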
Retry safely
Retry only idempotent requests, cap the number of attempts, and use exponential backoff with jitter so clients do not retry in lockstep.
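A hedged sketch of backoff with full jitter (defaults are illustrative; apply it only to idempotent requests):

```typescript
// Retry with exponential backoff and full jitter.
export async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 200,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task()
    } catch (e) {
      lastError = e
      if (attempt < maxAttempts - 1) {
        // Full jitter: random delay in [0, base * 2^attempt)
        const delay = Math.random() * baseMs * 2 ** attempt
        await new Promise((r) => setTimeout(r, delay))
      }
    }
  }
  throw lastError
}
```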
Circuit breaker
If repeated failures occur, stop issuing requests and fail fast locally (open), then allow a single probe after a cooldown (half-open) before resuming normal traffic (closed).
This prevents retry storms.
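A minimal breaker can be sketched as a failure counter plus a cooldown window (thresholds and the injectable clock are illustrative):

```typescript
// Minimal circuit breaker: fail fast while open, probe after cooldown.
export function createBreaker(threshold = 5, cooldownMs = 10_000, now = () => Date.now()) {
  let failures = 0
  let openedAt = 0

  return async function call<T>(task: () => Promise<T>): Promise<T> {
    if (failures >= threshold) {
      if (now() - openedAt < cooldownMs) throw new Error("Circuit open: failing fast")
      // Half-open: let this single call through as a probe.
    }
    try {
      const result = await task()
      failures = 0 // success closes the circuit
      return result
    } catch (e) {
      failures++
      if (failures >= threshold) openedAt = now()
      throw e
    }
  }
}
```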
Load Shedding & Graceful Degradation
When backend health degrades, reduce optional load: skip prefetching, pause analytics and background refreshes, and defer non-critical widgets.
Load shedding protects core user flows instead of failing everything.
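As a sketch, shedding can be a simple priority gate driven by a health signal. The names below are illustrative; in practice the health flag would be derived from error-rate metrics:

```typescript
// Drop optional traffic when the backend is unhealthy; keep core flows.
type Priority = "critical" | "optional"

export function createShedder(isHealthy: () => boolean) {
  return function shouldSend(priority: Priority): boolean {
    if (priority === "critical") return true // never shed core user flows
    return isHealthy() // drop analytics, prefetches, etc. under stress
  }
}
```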
Realtime, Edge & Observability
Realtime strategy
Prefer WebSocket/SSE for frequent updates.
If polling is unavoidable: use generous intervals with jitter, pause when the tab is hidden, and back off on repeated errors.
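A polling loop with jitter and error backoff might look like this (the tab-visibility pause is omitted so the sketch also runs outside a browser):

```typescript
// Polling loop with jittered delay that backs off on consecutive errors.
export function startPolling(poll: () => Promise<void>, baseMs: number) {
  let stopped = false
  let errors = 0
  async function tick() {
    if (stopped) return
    try { await poll(); errors = 0 } catch { errors++ }
    if (stopped) return
    // Jittered delay; doubles per consecutive error (capped at 8x base).
    const backoff = Math.min(2 ** errors, 8)
    setTimeout(tick, baseMs * backoff * (0.5 + Math.random()))
  }
  tick()
  return () => { stopped = true } // call to stop polling
}
```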
CDN
Use s-maxage and stale-while-revalidate for shared responses.
Serve personalized data separately via a BFF (backend-for-frontend) so shared responses stay cacheable.
Metrics to track
Request rate, error rate, p95/p99 latency, cache hit ratio, retry counts, and timeout frequency.
If it is not instrumented, it is not scalable.
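As a closing sketch, even a tiny wrapper yields request counts, error counts, and latency; a real setup would export these to your observability pipeline (the names here are illustrative):

```typescript
// Minimal client-side instrumentation around any async call.
export function createMetrics() {
  const stats = { requests: 0, errors: 0, totalMs: 0 }
  return {
    async track<T>(task: () => Promise<T>): Promise<T> {
      const start = Date.now()
      stats.requests++
      try {
        return await task()
      } catch (e) {
        stats.errors++
        throw e
      } finally {
        stats.totalMs += Date.now() - start
      }
    },
    snapshot: () => ({ ...stats, avgMs: stats.totalMs / Math.max(stats.requests, 1) }),
  }
}
```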