How Frontend Developers Can Handle Millions of API Requests Without Crashing Everything


At scale, frontend traffic issues rarely come from raw user count alone. They emerge from duplicated triggers, uncontrolled retries, cache misses, and failure amplification. The goal is not to blindly reduce requests — it is to shape demand, preserve correctness, and protect the backend during degradation.

Quick Decision Guide

Interview answer spine: de-dupe -> cache -> limit concurrency -> retry safely -> shed load -> observe.

If you can explain those layers with trade-offs and failure scenarios, your design will sound production-ready.

Requirements & Mental Model

Why 'millions of requests' happens

A single user rarely causes overload. The real problem appears when:

•High DAU hits identical route patterns
•Each route triggers multiple API calls
•Prefetch, re-renders, polling, and retries duplicate work
•Failures synchronize retries across clients

This creates failure amplification — traffic multiplies exactly when the backend is weakest.

What interviewers want

•Load shaping before load reduction
•Consistency guarantees (no stale overwrite)
•Resilience under partial outages
•Clear SLO thinking (e.g., 99% of requests under 300ms)

Deduplication & Cancellation

In-flight deduplication

If 5 components request the same data simultaneously, send one request.

const inFlight = new Map<string, Promise<unknown>>()

// Share one in-flight promise per key; all concurrent callers await the same request.
export function fetchOnce<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key)
  if (existing) return existing as Promise<T>
  const p = fn().finally(() => inFlight.delete(key)) // free the slot on success or failure
  inFlight.set(key, p)
  return p
}

Cancel outdated work

For search, filters, or rapid route changes:

let controller: AbortController | null = null

export async function search(q: string) {
  controller?.abort() // cancel the previous, now-stale query
  controller = new AbortController()
  const res = await fetch(`/api/search?q=${encodeURIComponent(q)}`, { signal: controller.signal })
  if (!res.ok) throw new Error(`search failed: ${res.status}`)
  return res.json()
}

Cancellation prevents stale responses from overriding current intent.

Caching & Revalidation

Cache layers

•In-memory per tab
•IndexedDB/localStorage (optional)
•Service Worker (optional)
•HTTP + CDN (highest leverage)

Concrete HTTP example

Cache-Control: public, max-age=60, s-maxage=120, stale-while-revalidate=300

ETag: "abc123"

Clients send If-None-Match for efficient 304 responses.
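For custom client-side stores, the conditional-request flow can be sketched by hand (browsers already do this automatically through the HTTP cache, so treat this as an illustration with an injectable fetcher, not a required pattern):

```typescript
// Manual ETag revalidation sketch. `Fetcher` abstracts the transport so the
// logic is testable; in a real app this would wrap `fetch`.
type Fetcher = (url: string, headers: Record<string, string>) =>
  Promise<{ status: number; etag?: string; body?: unknown }>

const etagCache = new Map<string, { etag: string; body: unknown }>()

export async function fetchWithEtag(url: string, doFetch: Fetcher) {
  const cached = etagCache.get(url)
  // Send If-None-Match when we hold a validator for this URL.
  const headers: Record<string, string> = cached ? { 'If-None-Match': cached.etag } : {}
  const res = await doFetch(url, headers)
  if (res.status === 304 && cached) return cached.body // unchanged: reuse cached body
  if (res.etag !== undefined) etagCache.set(url, { etag: res.etag, body: res.body })
  return res.body
}
```

A 304 costs headers only, so even a "cache miss" on the client stays cheap for the backend.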

SWR UX pattern

1. Render cached data instantly

2. Revalidate in background

3. Update only if changed

This reduces backend load while preserving perceived speed.
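The three steps above fit in a few lines. This is a minimal sketch of the pattern itself, not the `swr` library's API; the helper names are illustrative:

```typescript
// Stale-while-revalidate sketch: serve cached data immediately, refetch in
// the background, and notify only when the payload actually changed.
const swrCache = new Map<string, unknown>()

export function swrGet<T>(
  key: string,
  fetcher: () => Promise<T>,
  onUpdate: (fresh: T) => void
): T | undefined {
  const stale = swrCache.get(key) as T | undefined
  fetcher()
    .then((fresh) => {
      // Cheap change detection; real apps might compare ETags or version fields.
      if (JSON.stringify(fresh) !== JSON.stringify(stale)) {
        swrCache.set(key, fresh)
        onUpdate(fresh)
      }
    })
    .catch(() => { /* keep serving stale data on revalidation failure */ })
  return stale // render this immediately; undefined on first load
}
```

Note the failure path: a failed revalidation silently keeps the stale copy, which is exactly the degradation behavior the later sections rely on.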

Batching, Pagination & Backpressure

Shape request volume

•Cursor-based pagination over bulk fetch
•Batch fragmented calls via aggregation endpoint
•Debounce user-triggered calls

Backpressure vs Rate Limiting

•Rate limiting: server-enforced protection
•Backpressure: client-side concurrency control

Backpressure prevents request pileups before they hit the backend.

Concurrency limiter

export function createLimiter(limit: number) {
  let active = 0
  const q: Array<() => void> = []

  // Start the next queued job if a concurrency slot is free.
  const next = () => {
    if (active >= limit) return
    const job = q.shift()
    if (!job) return
    active++
    job()
  }

  return function run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      q.push(() => {
        task().then(resolve, reject).finally(() => {
          active-- // free the slot whether the task resolved or rejected
          next()
        })
      })
      next()
    })
  }
}
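In use, the limiter caps a burst at a fixed concurrency while every caller still gets its own promise. The sketch below repeats the limiter so it runs standalone; the count of 20 jobs and limit of 4 are arbitrary:

```typescript
// Same limiter as above, repeated for a self-contained example.
function createLimiter(limit: number) {
  let active = 0
  const q: Array<() => void> = []
  const next = () => {
    if (active >= limit) return
    const job = q.shift()
    if (!job) return
    active++
    job()
  }
  return function run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      q.push(() => {
        task().then(resolve, reject).finally(() => { active--; next() })
      })
      next()
    })
  }
}

const limit = createLimiter(4)
let peak = 0, running = 0

// Fire 20 "requests" at once; the limiter keeps at most 4 in flight.
const jobs = Array.from({ length: 20 }, (_, i) =>
  limit(async () => {
    running++
    peak = Math.max(peak, running)
    await new Promise((r) => setTimeout(r, 5)) // simulated network latency
    running--
    return i
  })
)
// Promise.all(jobs) resolves in order [0..19]; peak never exceeds 4.
```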

Reliability & Failure Control

Timeouts everywhere

Never allow unbounded hangs. Always pair fetch with timeout + cancellation.
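A generic wrapper is enough to bound any promise, not just fetch. A minimal sketch (modern runtimes can also pass `AbortSignal.timeout(ms)` directly as a fetch `signal`, which cancels the socket rather than just the promise):

```typescript
// Reject any promise that takes longer than `ms`, so nothing hangs unbounded.
export function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const t = setTimeout(() => reject(new Error(`timeout after ${ms}ms`)), ms)
    p.then(
      (v) => { clearTimeout(t); resolve(v) },
      (e) => { clearTimeout(t); reject(e) }
    )
  })
}
```

Pairing this with an AbortController (abort on timeout) is what actually frees the connection; the wrapper alone only frees the caller.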

Retry safely

•Retry GET freely with exponential backoff + jitter
•Avoid blind retries for POST unless idempotency keys exist
•Cap retry attempts
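Those three rules can be sketched as one helper. The names and defaults here are illustrative, not a library API; the caller is responsible for only passing idempotent work:

```typescript
// Capped retries with exponential backoff and full jitter.
export async function retry<T>(
  fn: () => Promise<T>,
  opts = { attempts: 3, baseMs: 100 }
): Promise<T> {
  let lastErr: unknown
  for (let i = 0; i < opts.attempts; i++) {
    try {
      return await fn()
    } catch (err) {
      lastErr = err
      if (i === opts.attempts - 1) break // attempt cap reached
      // Full jitter: a random delay in [0, base * 2^i) desynchronizes clients,
      // so a shared failure does not trigger a synchronized retry storm.
      const delay = Math.random() * opts.baseMs * 2 ** i
      await new Promise((r) => setTimeout(r, delay))
    }
  }
  throw lastErr
}
```

The jitter is the part most implementations forget, and it is precisely what prevents the retry synchronization described earlier.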

Circuit breaker

If repeated failures occur:

•Open circuit temporarily
•Serve stale or fallback UI
•Probe health after cooldown

This prevents retry storms.
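A client-side breaker can be very small. This sketch uses assumed thresholds and a consecutive-failure count; production breakers often use error rates over a window instead:

```typescript
// Minimal circuit breaker: after `threshold` consecutive failures, fail fast
// with a fallback for `cooldownMs`; the first call after cooldown is a probe.
export function createBreaker(threshold: number, cooldownMs: number) {
  let failures = 0
  let openedAt = 0

  return async function call<T>(fn: () => Promise<T>, fallback: () => T): Promise<T> {
    const open = failures >= threshold && Date.now() - openedAt < cooldownMs
    if (open) return fallback() // fail fast: serve stale data or fallback UI
    try {
      const v = await fn()
      failures = 0 // success closes the circuit
      return v
    } catch (err) {
      failures++
      if (failures >= threshold) openedAt = Date.now() // (re)open, probe failed too
      throw err
    }
  }
}
```

While the circuit is open, failing dependencies see zero traffic from this client, which is what breaks the retry-storm feedback loop.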

Load Shedding & Graceful Degradation

When backend health degrades, reduce optional load:

•Disable analytics calls
•Pause polling
•Defer non-critical prefetch
•Serve stale cached data

Load shedding protects core user flows instead of failing everything.
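One way to wire this up is a single degradation flag that optional callers check before sending. The flag and names are illustrative; in practice `degraded` would be driven by client error rates or a server hint:

```typescript
type Priority = 'critical' | 'optional'

let degraded = false
export function setDegraded(on: boolean) { degraded = on }

// Critical traffic (checkout, auth) always goes through; optional traffic
// (analytics, polling, prefetch) is shed while the backend is degraded.
export function shouldSend(priority: Priority): boolean {
  return priority === 'critical' || !degraded
}
```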

Realtime, Edge & Observability

Realtime strategy

Prefer WebSocket/SSE for frequent updates.

If polling:

•Increase interval on failure
•Slow down in background tabs
•Request deltas instead of full payloads
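The first two rules reduce to a pure interval function, which keeps the scheduling logic testable. A sketch with assumed multipliers (the factor of 4 for hidden tabs and the 60s cap are arbitrary choices):

```typescript
// Compute the next polling delay: exponential backoff on consecutive
// failures, slower in background tabs, capped at `maxMs`.
export function nextInterval(
  baseMs: number,
  failures: number,
  hidden: boolean,
  maxMs = 60_000
): number {
  let ms = baseMs * 2 ** failures // back off while the endpoint is failing
  if (hidden) ms *= 4             // background tabs poll less often
  return Math.min(ms, maxMs)
}
```

The caller resets `failures` to 0 on the first success, and reads `hidden` from `document.visibilityState`.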

CDN

Use s-maxage and stale-while-revalidate for shared responses.

Separate personalized data via BFF.

Metrics to track

•Request rate per route
•Error rate by class
•p50/p95/p99 latency
•Cache hit ratio (client + CDN)
•Retry counts
•Circuit-open events

If it is not instrumented, it is not scalable.

Key Takeaways

1. Scale failures come from duplication and retry amplification.
2. Deduplication and cancellation are correctness tools.
3. HTTP caching with stale-while-revalidate is extremely high ROI.
4. Backpressure prevents traffic spikes from reaching the backend.
5. Retries must respect idempotency and use backoff + jitter.
6. Load shedding protects critical flows during incidents.
7. Observability defines scalability.