Frontend SLO/SLI and Error Budgets: Operating Quality at Scale

Frontend reliability work starts by naming the terms correctly. A Service Level Indicator measures observed user experience, a Service Level Objective sets the target, a Service Level Agreement is an external promise, and the error budget converts the target into release policy.
Definitions First
| Term | Full form | What it means | Frontend example |
|---|---|---|---|
| SLI | Service Level Indicator | The measured signal. | p75 Largest Contentful Paint, Interaction to Next Paint, JavaScript error-free sessions, checkout success rate. |
| SLO | Service Level Objective | The target for an SLI over a time window. | 95% of product-detail page views have LCP under 2.5 seconds over 28 days. |
| SLA | Service Level Agreement | An external customer-facing commitment, often contractual. | Enterprise contract promises availability or support response with penalties or credits. |
| Error budget | Allowed unreliability | The amount of failure allowed before the SLO is missed. | If the SLO allows 0.5% failed checkout sessions, that 0.5% is the budget. |
Frontend Mental Model
Measure the Experience Users Actually Receive
Backend uptime can be green while the frontend is broken. A user can still fail because JavaScript crashes, chunks fail to load, hydration breaks, a third-party script blocks the main thread, or the page is technically available but unusably slow.
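Frontend SLIs should therefore represent browser-visible outcomes, such as scripts and chunks loading without errors, hydration succeeding, and the page rendering and responding fast enough to use.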
The staff-level move is translating reliability from infrastructure availability into user journey quality.
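One way to make "browser-visible outcome" concrete is to classify each session by what the browser reported and compute the share of sessions with no fatal frontend event. The sketch below is illustrative only: the event names in `FATAL_EVENTS` and the `Session` shape are assumptions, not the schema of any particular RUM tool.

```python
from dataclasses import dataclass, field

# Hypothetical event names; a real RUM pipeline will use its own taxonomy.
FATAL_EVENTS = {"js_error", "chunk_load_failed", "hydration_failed"}


@dataclass
class Session:
    session_id: str
    events: list[str] = field(default_factory=list)


def is_good_session(session: Session) -> bool:
    """A session is good if the browser reported no fatal frontend event."""
    return not any(event in FATAL_EVENTS for event in session.events)


def error_free_session_rate(sessions: list[Session]) -> float:
    """Return the fraction of sessions with no fatal frontend event."""
    if not sessions:
        raise ValueError("no sessions to evaluate")
    good = sum(1 for s in sessions if is_good_session(s))
    return good / len(sessions)


sessions = [
    Session("a", ["page_view", "click"]),
    Session("b", ["page_view", "chunk_load_failed"]),
    Session("c", ["page_view"]),
    Session("d", ["page_view", "js_error", "click"]),
]
rate = error_free_session_rate(sessions)
print(f"error-free sessions: {rate:.1%}")
```

Note that none of these sessions would show up as a backend failure: the server answered every request, yet half of the sample sessions were broken in the browser.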
Choosing Frontend SLIs
Good SLIs Are Specific and Measurable
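Useful frontend Service Level Indicators include JavaScript error-free sessions, successful page loads, chunk load success, Core Web Vitals such as p75 Largest Contentful Paint and Interaction to Next Paint, and critical flow completion such as checkout success rate.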
Avoid vague SLIs like average page speed. Percentiles and route-level segmentation are usually more useful because real user experience is not evenly distributed.
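To illustrate why an average is a weak SLI, the sketch below compares the mean with the 75th percentile for two routes. The LCP samples are invented for illustration, and `percentile` is a simple nearest-rank implementation rather than the method of any specific analytics product.

```python
import math
from statistics import mean


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical LCP samples in milliseconds, keyed by route.
lcp_by_route = {
    "/product": [1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900],
    "/checkout": [1100, 1200, 1300, 1400, 1500, 4200, 4800, 5200],
}

for route, samples in lcp_by_route.items():
    print(route, "mean:", mean(samples), "p75:", percentile(samples, 75))
```

In this sample the `/checkout` mean sits close to 2.5 seconds while its p75 is above 4 seconds, so the average understates how slow the page is for a large share of users.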
SLI Quality Checklist
A Good SLI Can Be Operated
Before turning a frontend metric into a Service Level Indicator, ask:
A metric that cannot drive a decision is not a good SLI.
Writing Good Frontend SLOs
SLO = Metric + Target + Window + Scope
A Service Level Objective should be precise enough to operate:
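95% of product-detail page views on mobile and desktop should have
Largest Contentful Paint under 2.5 seconds over a rolling 28-day window.
That sentence has the required parts: the metric (Largest Contentful Paint under 2.5 seconds), the target (95% of page views), the window (rolling 28 days), and the scope (product-detail page views on mobile and desktop).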
Another frontend example:
99.5% of checkout sessions should complete without a fatal frontend error
or failed payment UI transition over a rolling 30-day window.
Good SLOs are strict enough to protect users and loose enough to leave room for product velocity.
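The Metric + Target + Window + Scope structure can be written down as a small record so that an SLO is data rather than prose. This is a minimal sketch: the `Slo` class, its field names, and the event counts are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    metric: str        # what is measured, including the per-event threshold
    target: float      # required fraction of good events, e.g. 0.95
    window_days: int   # rolling evaluation window
    scope: str         # route and population the SLO applies to

    def is_met(self, good_events: int, total_events: int) -> bool:
        """True when the observed good fraction reaches the target."""
        if total_events <= 0:
            raise ValueError("total_events must be positive")
        return good_events / total_events >= self.target


pdp_lcp = Slo(
    metric="LCP under 2.5 s",
    target=0.95,
    window_days=28,
    scope="product-detail page views, mobile and desktop",
)

# Hypothetical counts for one 28-day window.
print(pdp_lcp.is_met(good_events=96_100, total_events=100_000))  # True
print(pdp_lcp.is_met(good_events=94_000, total_events=100_000))  # False
```

If any of the four fields is hard to fill in, the SLO is probably not yet precise enough to operate.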
Core Web Vitals as SLIs
Web Vitals Fit Naturally Into Frontend SLOs
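Core Web Vitals are good frontend SLIs because they map to user-centered outcomes: Largest Contentful Paint (LCP) to loading speed, Interaction to Next Paint (INP) to responsiveness, and Cumulative Layout Shift (CLS) to visual stability.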
A strong frontend SLO does not simply say "make LCP good." It names the route, population, percentile, threshold, and time window.
Example:
At least 75% of landing-page visits should meet all Core Web Vitals thresholds,
segmented by mobile and desktop, over the last 28 days.
For business-critical flows, a team may define stricter internal targets than public search or tooling thresholds.
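A "meets all Core Web Vitals thresholds" SLI can be computed per visit and then aggregated per device. The sketch below uses the published "good" thresholds (LCP at most 2.5 s, INP at most 200 ms, CLS at most 0.1); the visit records, field names, and the 75% target constant are assumptions for illustration.

```python
from collections import defaultdict

# Published "good" thresholds for Core Web Vitals.
LCP_MS, INP_MS, CLS = 2500, 200, 0.1
TARGET = 0.75  # share of visits that must meet all three thresholds


def meets_all_vitals(visit: dict) -> bool:
    """True when a single visit is within every Core Web Vitals threshold."""
    return (
        visit["lcp_ms"] <= LCP_MS
        and visit["inp_ms"] <= INP_MS
        and visit["cls"] <= CLS
    )


def pass_rate_by_device(visits: list[dict]) -> dict[str, float]:
    """Fraction of visits meeting all thresholds, keyed by device class."""
    counts = defaultdict(lambda: [0, 0])  # device -> [passing, total]
    for v in visits:
        counts[v["device"]][0] += meets_all_vitals(v)
        counts[v["device"]][1] += 1
    return {device: passing / total for device, (passing, total) in counts.items()}


# Hypothetical landing-page visits.
visits = [
    {"device": "mobile", "lcp_ms": 2100, "inp_ms": 150, "cls": 0.05},
    {"device": "mobile", "lcp_ms": 3400, "inp_ms": 180, "cls": 0.02},
    {"device": "mobile", "lcp_ms": 2300, "inp_ms": 260, "cls": 0.01},
    {"device": "mobile", "lcp_ms": 1900, "inp_ms": 120, "cls": 0.08},
    {"device": "desktop", "lcp_ms": 1200, "inp_ms": 80, "cls": 0.03},
    {"device": "desktop", "lcp_ms": 1500, "inp_ms": 90, "cls": 0.00},
    {"device": "desktop", "lcp_ms": 1800, "inp_ms": 110, "cls": 0.12},
    {"device": "desktop", "lcp_ms": 1400, "inp_ms": 70, "cls": 0.04},
]

rates = pass_rate_by_device(visits)
for device, rate in rates.items():
    print(device, f"{rate:.0%}", "meets target" if rate >= TARGET else "misses target")
```

In this sample desktop just reaches the 75% target while mobile misses it, which a single blended number would not show.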
Error Budget Policy
Budgets Turn Metrics Into Decisions
If an SLO allows 0.5% failure over a window, that 0.5% is the error budget. The budget says how much unreliability the team can spend before changing behavior.
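The arithmetic behind the budget is simple enough to sketch. The function below turns an allowed failure rate into a count of allowed bad events for one window and reports how much has been spent; the function name and the traffic numbers are assumptions for illustration.

```python
def error_budget(allowed_failure_rate: float, total_events: int, bad_events: int) -> dict[str, float]:
    """Translate an allowed failure rate into event counts for one SLO window."""
    if not 0 < allowed_failure_rate < 1:
        raise ValueError("allowed_failure_rate must be between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = allowed_failure_rate * total_events
    return {
        "allowed_bad": allowed_bad,                     # budget expressed in events
        "remaining_bad": allowed_bad - bad_events,      # negative once the SLO is missed
        "consumed_fraction": bad_events / allowed_bad,  # 1.0 means the budget is gone
    }


# Hypothetical window: 200,000 checkout sessions, 400 of which failed.
budget = error_budget(allowed_failure_rate=0.005, total_events=200_000, bad_events=400)
print(budget)
```

With a 0.5% budget and 200,000 sessions, the team can absorb about 1,000 failed sessions in the window; 400 failures means roughly 40% of the budget is spent.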
Example policy:
The point is not punishment. The point is aligning product speed with user trust.
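A budget policy of this kind can be sketched as a mapping from budget consumption to a release posture. The thresholds (50%, 80%, 100%) and the wording of each action below are purely illustrative assumptions; a real policy is something each team negotiates with product and leadership.

```python
def release_policy(budget_consumed: float) -> str:
    """Map the consumed fraction of the error budget to a release posture.

    All thresholds and actions are hypothetical examples, not a standard.
    """
    if budget_consumed < 0:
        raise ValueError("budget_consumed cannot be negative")
    if budget_consumed < 0.5:
        return "ship normally"
    if budget_consumed < 0.8:
        return "ship with extra review and prioritize reliability fixes"
    if budget_consumed < 1.0:
        return "limit releases to reliability work and low-risk changes"
    return "pause feature releases until the budget recovers"


for consumed in (0.2, 0.6, 0.9, 1.3):
    print(f"{consumed:.0%} consumed -> {release_policy(consumed)}")
```

Writing the policy down before an incident makes the response a pre-agreed rule rather than a negotiation under pressure.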
Burn Rate Alerting
Alert on Consumption Speed
Burn rate asks how quickly the system is consuming its error budget. A short, severe incident and a slow reliability leak need different alerting windows.
Useful alerting combines a short window that catches fast, severe burn with a long window that catches slow, sustained burn.
Alert fatigue is a design failure. Alerts should represent user-impacting budget burn, not every noisy metric spike.
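Burn rate can be expressed as the observed failure rate divided by the failure rate the SLO allows, so a value of 1.0 spends the budget exactly over the SLO window. The sketch below pairs a long and a short window so that an alert fires only when the burn is both significant and still ongoing. The 14.4 threshold is a commonly cited convention (it corresponds to spending 2% of a 30-day budget in one hour); it, the 1.0 slow-burn threshold, and all session counts are assumptions here, not values from this document.

```python
def burn_rate(bad: int, total: int, allowed_failure_rate: float) -> float:
    """Observed failure rate divided by the failure rate the SLO allows."""
    if total == 0:
        return 0.0
    return (bad / total) / allowed_failure_rate


def should_alert(
    long_window: tuple[int, int],
    short_window: tuple[int, int],
    allowed_failure_rate: float,
    threshold: float,
) -> bool:
    """Fire only if both windows burn at or above the threshold.

    The long window shows the burn is significant; the short window shows it
    is still happening, so the alert clears soon after recovery.
    """
    return (
        burn_rate(*long_window, allowed_failure_rate) >= threshold
        and burn_rate(*short_window, allowed_failure_rate) >= threshold
    )


ALLOWED = 0.005  # 0.5% failed sessions

# Each window is (bad sessions, total sessions); all counts are hypothetical.
severe_incident = should_alert((90, 1000), (10, 100), ALLOWED, threshold=14.4)
slow_leak_fast_rule = should_alert((8, 1000), (1, 100), ALLOWED, threshold=14.4)
slow_leak_slow_rule = should_alert((1200, 150_000), (50, 6000), ALLOWED, threshold=1.0)

print(severe_incident, slow_leak_fast_rule, slow_leak_slow_rule)
```

The severe incident trips the fast rule, while the slow leak is invisible to it and is caught only by the slower rule, which would more reasonably open a ticket than page someone.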
Segmentation
Aggregates Hide Broken Experiences
Frontend reliability must be sliced by dimensions such as route, device, and release.
A global SLO can be green while mobile checkout is broken. Senior engineers inspect the distribution, not only the headline number.
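The "green global number, broken segment" situation is easy to reproduce with a few rows of data. In this sketch the record shape, the routes, and the counts are all invented for illustration; the 99.5% target is borrowed from the checkout example.

```python
from collections import Counter


def success_rates(records: list[dict], keys: tuple[str, ...]) -> dict[tuple, float]:
    """Aggregate good/total sessions by the given dimensions; keys=() gives the global rate."""
    good, total = Counter(), Counter()
    for r in records:
        segment = tuple(r[k] for k in keys)
        good[segment] += r["good"]
        total[segment] += r["sessions"]
    return {segment: good[segment] / total[segment] for segment in total}


# Hypothetical session counts per route and device.
records = [
    {"route": "/home", "device": "desktop", "sessions": 50_000, "good": 49_950},
    {"route": "/home", "device": "mobile", "sessions": 40_000, "good": 39_940},
    {"route": "/checkout", "device": "desktop", "sessions": 6_000, "good": 5_990},
    {"route": "/checkout", "device": "mobile", "sessions": 4_000, "good": 3_700},
]

TARGET = 0.995
global_rate = success_rates(records, ())[()]
by_segment = success_rates(records, ("route", "device"))

print(f"global: {global_rate:.2%}")
for segment, rate in by_segment.items():
    print(segment, f"{rate:.2%}", "ok" if rate >= TARGET else "BELOW TARGET")
```

Here the global rate clears the target while mobile checkout sits at 92.5%, because the low-traffic, high-value segment is outweighed by healthy traffic elsewhere.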
Instrumentation Risks
The Metric Can Lie
Common pitfalls:
If the instrumentation is wrong, the budget is fiction. Validate the measurement path before using it for release governance.
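One way to validate the measurement path is to compare how many page views the server logged with how many RUM beacons actually arrived. Sessions that crash before the beacon is sent never report their failure, so low coverage in a segment is a sign the SLI may be flattering. The function, the 90% coverage floor, and the counts below are assumptions for illustration.

```python
def low_coverage_segments(
    server_page_views: dict[str, int],
    rum_beacons: dict[str, int],
    min_coverage: float = 0.9,
) -> dict[str, float]:
    """Return segments whose beacon-to-page-view ratio falls below min_coverage."""
    flagged = {}
    for segment, views in server_page_views.items():
        if views == 0:
            continue  # nothing to compare against
        coverage = rum_beacons.get(segment, 0) / views
        if coverage < min_coverage:
            flagged[segment] = coverage
    return flagged


# Hypothetical daily counts per device class.
server_page_views = {"desktop": 10_000, "mobile": 8_000}
rum_beacons = {"desktop": 9_600, "mobile": 5_200}

flagged = low_coverage_segments(server_page_views, rum_beacons)
print(flagged)
```

In this sample mobile reports only 65% of its page views, so any mobile SLI built on those beacons should be treated with suspicion until the gap is explained.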
Interview Framing
Senior Answer Pattern
I would define Service Level Indicators around critical frontend user journeys, set Service Level Objectives with a target and time window, track error budget burn, segment by route/device/release, and tie release policy to budget health.
For frontend, I would include JavaScript error-free sessions, successful page loads, Core Web Vitals, interaction latency, chunk load success, and critical flow completion. I would not rely only on backend uptime, because users experience the browser, not just the server.