
Error Handling & Resilience
Resilient frontends classify failures, contain blast radius, preserve user intent, and make recovery observable.
Failure Domains
Error handling is not a catch-block strategy; it is a product reliability strategy. A render crash, a failed query, a failed mutation, and a broken third-party dependency each need different containment and recovery behavior.
| Failure | Examples | Response |
|---|---|---|
| Render failure | Unexpected component crash, invalid assumptions, unsafe null access. | Contain with error boundaries, show scoped fallback, report with component and route context. |
| Network failure | Timeout, offline user, DNS/CDN issue, API unavailable. | Classify retryability, preserve stale data when safe, expose retry or alternate path. |
| Mutation failure | Save fails, payment attempt fails, optimistic update rejected. | Rollback or reconcile state, keep user intent visible, avoid duplicate side effects. |
| Dependency failure | Third-party script, analytics, feature flag provider, auth provider. | Degrade non-critical features; fail closed for security-critical dependencies such as auth; fail open only when the risk is acceptable. |
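One way to keep the table from living only in people's heads is to encode it as a type and a single decision function. The sketch below is illustrative: `FailureDomain`, `AppFailure`, `RecoveryAction`, and `chooseRecovery` are hypothetical names, and the `securityCritical` flag is an assumption about how a team might mark dependencies such as auth.

```typescript
// Hypothetical names for illustration; none of these come from a library.
type FailureDomain = "render" | "network" | "mutation" | "dependency";

interface AppFailure {
  domain: FailureDomain;
  // Assumption: dependencies that guard security (for example auth) are flagged.
  securityCritical?: boolean;
}

type RecoveryAction =
  | "scoped-fallback"       // render failure: contain with a boundary
  | "retry-or-stale"        // network failure: retry or show stale data
  | "rollback-keep-intent"  // mutation failure: roll back, keep the draft
  | "fail-closed"           // security-critical dependency is down
  | "degrade";              // non-critical dependency is down

// Maps each failure domain to the default response from the table above.
function chooseRecovery(failure: AppFailure): RecoveryAction {
  switch (failure.domain) {
    case "render":
      return "scoped-fallback";
    case "network":
      return "retry-or-stale";
    case "mutation":
      return "rollback-keep-intent";
    case "dependency":
      return failure.securityCritical ? "fail-closed" : "degrade";
  }
}
```

Because the `switch` is exhaustive over `FailureDomain`, adding a new domain later forces a compile-time decision about its recovery behavior.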
Error Boundaries and Containment
React error boundaries catch errors thrown during rendering, in lifecycle methods, and in constructors anywhere in their child tree, and replace that subtree with fallback UI. They do not catch every kind of failure: errors in event handlers, async callbacks, server rendering, and the boundary itself need separate handling.
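Since a boundary never sees errors thrown from event handlers or async callbacks, one option is to wrap handlers so those errors are at least reported. This is a framework-free sketch, not a React API: `guardHandler` and the `Reporter` signature are hypothetical, and a real app would likely also update UI state in the catch path.

```typescript
// Hypothetical reporter signature; swap in whatever error tracker the app uses.
type Reporter = (error: unknown, context: { source: string }) => void;

// Wraps a sync or async handler so thrown errors and rejected promises are
// reported instead of escaping silently. An error boundary would catch neither.
function guardHandler<A extends unknown[]>(
  source: string,
  handler: (...args: A) => void | Promise<void>,
  report: Reporter,
): (...args: A) => void {
  return (...args: A) => {
    try {
      const result = handler(...args);
      if (result instanceof Promise) {
        // Async handler: attach a rejection handler so the failure is observed.
        result.catch((error: unknown) => report(error, { source }));
      }
    } catch (error) {
      // Sync handler threw before returning.
      report(error, { source });
    }
  };
}

// Usage sketch: the label identifies which interaction failed.
const reportedSources: string[] = [];
const onSaveClick = guardHandler(
  "save-button",
  () => {
    throw new Error("unexpected null");
  },
  (_error, context) => {
    reportedSources.push(context.source);
  },
);
onSaveClick();
```

The wrapper reports and swallows the error, which suits fire-and-forget handlers; a handler whose caller needs the failure should rethrow after reporting.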
Placement is the architectural decision. App-level boundaries prevent a blank screen. Route-level boundaries isolate pages. Feature-level boundaries keep a broken widget from taking down checkout, search, or navigation.
Retry Policy
Retry is safe only when repeating the operation cannot cause a second side effect: the operation is idempotent, or the server can deduplicate it. Blind retries can overload a failing service or duplicate irreversible user actions.
| Operation | Retry policy |
|---|---|
| Safe reads | Retry transient failures with capped exponential backoff and jitter. |
| User-triggered writes | Prefer explicit retry; use idempotency keys when duplicate submission is possible. |
| Payment or irreversible action | Never blind-retry without server guarantees and clear user feedback. |
| Persistent validation failure | Do not retry; explain the correction needed. |
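The policy above can be sketched in a few functions. Everything here is an assumption for illustration: the `OperationKind` names, the 200 ms base and 5000 ms cap, the three-attempt limit, and the caller-supplied `isTransient` check. The jitter variant shown is "full jitter", where the delay is drawn uniformly below the capped exponential ceiling.

```typescript
// Hypothetical operation categories mirroring the policy above.
type OperationKind = "safe-read" | "user-write" | "irreversible" | "validation";

// Only safe reads are retried automatically; writes wait for an explicit user retry.
function canAutoRetry(kind: OperationKind): boolean {
  return kind === "safe-read";
}

// Capped exponential backoff with full jitter:
// a random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 200,
  capMs = 5000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Retries a read up to maxAttempts times, but only for errors the caller
// classifies as transient. Non-transient errors are rethrown immediately.
async function retryRead<T>(
  read: () => Promise<T>,
  isTransient: (error: unknown) => boolean,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await read();
    } catch (error) {
      const lastAttempt = attempt >= maxAttempts - 1;
      if (lastAttempt || !isTransient(error)) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Jitter matters because many clients fail at the same moment during an outage; without it they would all retry on the same schedule and hit the recovering service in synchronized waves.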
Async Failure Policy
Error boundaries do not replace async error handling. Fetches, event handlers, dynamic imports, and background work need explicit policies for timeout, cancellation, idempotency, and user feedback.
| Failure | Policy |
|---|---|
| Timeout | Abort the request, show retry or stale data, and report duration plus endpoint context. |
| Offline | Use cached data or offline messaging; avoid pretending the action succeeded. |
| Duplicate submit | Disable or debounce the action and rely on idempotency keys for server safety. |
| Chunk load failure | Offer a reload path and correlate with release, cache state, and asset URL. |
Staff-level answers call out idempotency. Retrying a search request and retrying a payment mutation are not the same risk.
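For the duplicate-submit row, a client-side guard and an idempotency key work together: the guard collapses repeated clicks into one request, and the key lets the server deduplicate if a retry still arrives. The sketch below is hypothetical throughout. `createSubmitOnce`, `SendFn`, and the way the key reaches the server are assumptions, and it presumes the payload is unchanged between retries of the same intent.

```typescript
// Hypothetical transport: sends a mutation along with an idempotency key
// (for example as a request header the server understands).
type SendFn<T> = (payload: unknown, idempotencyKey: string) => Promise<T>;

function createSubmitOnce<T>(send: SendFn<T>, newKey: () => string) {
  let inFlight: Promise<T> | null = null;
  let pendingKey: string | null = null;

  return {
    submit(payload: unknown): Promise<T> {
      // A second click while a request is in flight reuses the same promise.
      if (inFlight) return inFlight;

      // Reuse the key after a failed attempt so an explicit retry of the same
      // intent can be deduplicated by the server.
      const key = pendingKey ?? newKey();
      pendingKey = key;

      const attempt = send(payload, key)
        .then((result) => {
          pendingKey = null; // success: the next submit is a new intent
          return result;
        })
        .finally(() => {
          inFlight = null; // allow an explicit retry or a new submit
        });
      inFlight = attempt;
      return attempt;
    },
  };
}
```

In a browser, `newKey` could be `() => crypto.randomUUID()`. The client guard is a UX courtesy only; the server-side check on the key is what actually prevents a duplicate order or payment.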
Recovery UX
Good fallback UI preserves user agency. It tells users what is affected, what they can do next, and whether their previous action was saved, cancelled, pending, or needs retry.
- For read failures, prefer stale-but-labeled data when freshness is not critical.
- For write failures, preserve the draft or intent so the user can retry safely.
- For partial outages, disable only the affected feature and keep the rest usable.
- For security or payment uncertainty, be explicit and avoid optimistic claims.
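A small state model can make "saved, cancelled, pending, or needs retry" explicit, so the UI never has to guess what happened to the user's action. This is a sketch under assumptions: `WriteStatus`, `describeWrite`, `onWriteFailed`, and the message wording are all hypothetical.

```typescript
// Hypothetical state model for a single user-initiated write.
// Every non-saved state carries the draft so user intent is never dropped.
type WriteStatus<D> =
  | { state: "saved" }
  | { state: "pending"; draft: D }
  | { state: "cancelled"; draft: D }
  | { state: "needs-retry"; draft: D; reason: string };

// Produces the user-facing message for each state. No optimistic claims:
// only "saved" says the change was saved.
function describeWrite<D>(status: WriteStatus<D>): string {
  switch (status.state) {
    case "saved":
      return "Your changes are saved.";
    case "pending":
      return "Saving your changes...";
    case "cancelled":
      return "Nothing was sent. Your draft is still here.";
    case "needs-retry":
      return `We could not save your changes (${status.reason}). Your draft is kept; you can try again.`;
  }
}

// On failure, move to "needs-retry" with the draft instead of resetting the form.
function onWriteFailed<D>(draft: D, reason: string): WriteStatus<D> {
  return { state: "needs-retry", draft, reason };
}
```

Keeping the draft inside the state means a retry button can resubmit exactly what the user entered, without asking them to type it again.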
Observability Contract
A frontend error report should be actionable. Capture route, release, component or feature, browser, device class, user action, feature flags, and whether the user recovered. Without that context, error tracking becomes a noisy inbox.
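Typing the report makes the contract enforceable: a call site that omits the release or route fails to compile instead of producing an unactionable report. The field names below are hypothetical and simply follow the list above; `buildErrorReport` is not part of any error-tracking SDK.

```typescript
// Hypothetical context shape following the fields listed above.
interface ErrorContext {
  route: string;
  release: string;
  feature: string;
  browser: string;
  deviceClass: "mobile" | "tablet" | "desktop";
  userAction: string;
  featureFlags: Record<string, boolean>;
  recovered: boolean;
}

interface ErrorReport extends ErrorContext {
  message: string;
  stack: string | undefined;
  timestamp: string;
}

// Normalizes any thrown value into a report carrying the full context.
// `now` is injectable so the timestamp is deterministic in tests.
function buildErrorReport(
  error: unknown,
  context: ErrorContext,
  now: () => Date = () => new Date(),
): ErrorReport {
  // Thrown values are not always Error instances (strings, objects, etc.).
  const normalized = error instanceof Error ? error : new Error(String(error));
  return {
    ...context,
    message: normalized.message,
    stack: normalized.stack,
    timestamp: now().toISOString(),
  };
}
```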
Resilience Smells
- Every error is displayed as the same generic message.
- Retried mutations can create duplicate orders, payments, or records.
- Error boundaries exist only at the app root, so one widget failure blanks the whole page.
- Fallback UI hides user intent instead of helping the user recover.
- Telemetry captures an error message but not route, release, browser, feature flag, or user action context.
Interview Framing
Senior answer pattern
I would classify the failure, contain it at the smallest useful boundary, decide whether retry is safe, preserve user intent, and report enough context to debug by route, release, feature, and device segment. The goal is not hiding errors; it is limiting user impact and making the failure diagnosable.