Observability for Frontend: RUM, Errors, Performance Marks, Actionable Dashboards


Frontend observability provides insight into how real users experience an application in production. It combines metrics, logs, and traces to monitor performance, stability, and reliability.

A strong observability system answers key questions:

  • What errors are users experiencing?
  • Which releases introduced regressions?
  • Which routes or features are slow?
  • Which users or devices are affected?

The goal is not just collecting data, but enabling engineers to detect issues, diagnose root causes, and improve user experience.

Quick Decision Guide

Senior-Level Decision Guide:

  • Use RUM to understand real user performance instead of relying only on lab tests.
  • Capture errors with release, route, and device context for faster debugging.
  • Track Core Web Vitals and custom performance metrics for user journeys.
  • Build dashboards that correlate regressions with deployments and user segments.
  • Use alerts to detect major performance or error spikes quickly.

Interview framing: Observability turns production behavior into actionable engineering insight.

Observability Mental Model

Observability typically combines three types of telemetry:

Metrics

Numerical measurements collected over time.

Examples:

•page load time
•error rate
•Core Web Vitals

Logs

Structured records of events.

Examples:

•JavaScript errors
•network failures
•feature usage events

Traces

Chains of timed, causally linked events that follow a user journey or request from start to finish.

Examples:

•route navigation
•API request flows

Together these signals provide a complete picture of application behavior.

RUM vs Synthetic Testing

Synthetic (lab) testing

Controlled tests run in simulated environments.

Examples:

•Lighthouse
•CI performance tests

Advantages:

•repeatable
•useful during development

Limitations:

•does not reflect the diversity of real-world networks and devices

Real User Monitoring (RUM)

RUM collects telemetry from real users in production.

Examples of captured data:

•page load performance
•browser and device type
•network conditions
•geographic location

Best practice

Use lab testing for optimization and RUM for real-world validation.

Core Web Vitals in Observability

Core Web Vitals measure important aspects of user experience.

Key metrics:

•LCP (Largest Contentful Paint) – loading performance
•INP (Interaction to Next Paint) – responsiveness
•CLS (Cumulative Layout Shift) – visual stability

Monitoring these metrics in RUM helps teams detect real-world performance regressions.
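As a rough sketch of how RUM samples might be turned into a single signal: Core Web Vitals are commonly assessed at the 75th percentile of page loads and compared against the published "good" and "poor" boundaries. The helper names (percentile, rateVital) and the sample values below are illustrative assumptions, not part of any library.

```javascript
// Published "good" / "poor" boundaries for each Core Web Vital.
// LCP and INP are in milliseconds; CLS is a unitless score.
const THRESHOLDS = {
  LCP: { good: 2500, poor: 4000 },
  INP: { good: 200, poor: 500 },
  CLS: { good: 0.1, poor: 0.25 },
};

// Nearest-rank percentile over an array of numeric samples.
function percentile(samples, p) {
  if (samples.length === 0) return undefined;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Classifies a single value as 'good', 'needs-improvement', or 'poor'.
function rateVital(name, value) {
  const { good, poor } = THRESHOLDS[name];
  if (value <= good) return 'good';
  if (value <= poor) return 'needs-improvement';
  return 'poor';
}

// Illustrative LCP samples (ms) collected from four page loads.
const lcpSamples = [1800, 2100, 2600, 3900];
const lcpP75 = percentile(lcpSamples, 75);
console.log(lcpP75, rateVital('LCP', lcpP75)); // 2600 'needs-improvement'
```

Rating the 75th percentile rather than the mean keeps a handful of very fast loads from hiding a slow experience for a quarter of users.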

Error Tracking

Frontend applications should capture runtime errors with contextual metadata.

Useful fields include:

•error message
•stack trace
•route or screen
•browser and device info
•release version

Why context matters

Without contextual information, errors are difficult to prioritize or reproduce.

Example context payload:

{
  "error": "TypeError",
  "route": "/checkout",
  "release": "v1.3.5",
  "browser": "Chrome"
}
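One possible way to produce a payload like this is sketched below. It listens for the browser's error and unhandledrejection events and attaches context before sending. The RELEASE constant, the /telemetry/errors endpoint, and the buildErrorReport helper are assumptions made for illustration; a real setup would typically inject the release at build time or rely on an error-tracking SDK.

```javascript
// Hypothetical values: a real build would inject these at build time.
const RELEASE = 'v1.3.5';
const TELEMETRY_URL = '/telemetry/errors'; // assumed endpoint

// Turns an Error (or any thrown value) into a serializable report.
function buildErrorReport(error, context = {}) {
  const isError = error instanceof Error;
  return {
    error: isError ? error.name : 'NonError',
    message: isError ? error.message : String(error),
    stack: isError ? error.stack : undefined,
    release: RELEASE,
    timestamp: Date.now(),
    ...context,
  };
}

// Browser-only wiring; skipped when there is no window (for example in tests).
if (typeof window !== 'undefined') {
  const send = (report) =>
    navigator.sendBeacon(TELEMETRY_URL, JSON.stringify(report));

  const currentContext = () => ({
    route: window.location.pathname,
    userAgent: navigator.userAgent,
  });

  // Uncaught runtime errors. event.error can be null for cross-origin scripts.
  window.addEventListener('error', (event) => {
    send(buildErrorReport(event.error ?? event.message, currentContext()));
  });

  // Promise rejections that nothing handled.
  window.addEventListener('unhandledrejection', (event) => {
    send(buildErrorReport(event.reason, currentContext()));
  });
}
```

Keeping buildErrorReport free of browser globals makes it easy to unit test, while the listeners stay a thin layer on top.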

Custom Performance Marks

Many important metrics are product-specific.

The browser Performance API allows developers to measure custom events.

Example:

performance.mark('search-start');

// perform operation

performance.mark('search-end');

performance.measure('search-duration', 'search-start', 'search-end');

Example use cases:

•time from search submit to results rendered
•time from route change to page interactive
•time from checkout click to payment UI ready

These metrics capture real product performance rather than generic page load time.
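For flows that start in one place and finish in another (for example, submit handler to render callback), the mark and measure calls can be wrapped in a small helper. The following is a minimal sketch; startTimer is a hypothetical name, and it assumes a global performance object that supports mark, measure, and getEntriesByName.

```javascript
// Hypothetical helper: records a start mark and returns a function that
// records the end mark, creates the measure, and returns its duration (ms).
function startTimer(name, perf = globalThis.performance) {
  const startMark = `${name}-start`;
  perf.mark(startMark);

  return function endTimer() {
    const endMark = `${name}-end`;
    const measureName = `${name}-duration`;
    perf.mark(endMark);
    perf.measure(measureName, startMark, endMark);

    // The most recent measure with this name is the one just created.
    const entries = perf.getEntriesByName(measureName, 'measure');
    return entries[entries.length - 1].duration;
  };
}

// Usage: call startTimer on submit, and the returned function once results render.
const endSearch = startTimer('search');
// ... run the search and render results ...
const searchMs = endSearch();
console.log(`search took ${searchMs.toFixed(1)} ms`);
```

Because the measure is also stored on the performance timeline, the same value is visible in browser DevTools and to any PerformanceObserver watching 'measure' entries.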

PerformanceObserver

The PerformanceObserver API allows applications to observe browser performance events.

Example:

const observer = new PerformanceObserver((list) => {
  list.getEntries().forEach(entry => {
    console.log(entry);
  });
});

observer.observe({ type: 'largest-contentful-paint', buffered: true });

Observing other entry types with the same pattern, such as 'layout-shift' and 'longtask', enables collecting metrics like:

•LCP
•CLS
•long tasks

These values can be sent to monitoring systems for analysis.
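As an illustration of turning raw entries into a reportable value, the sketch below computes CLS from layout-shift entries using the session-window definition (shifts less than 1 second apart are grouped, each window spans at most 5 seconds, and the largest window wins). The computeCls name and the /telemetry/vitals endpoint are assumptions; in practice many teams use the web-vitals library rather than hand-rolling this.

```javascript
// Computes CLS from layout-shift entries using session windows.
function computeCls(entries) {
  let maxWindow = 0;
  let windowValue = 0;
  let windowStart = null; // startTime of the first shift in the current window
  let lastShift = 0;      // startTime of the previous counted shift

  for (const entry of entries) {
    if (entry.hadRecentInput) continue; // shifts right after user input do not count

    const sameWindow =
      windowStart !== null &&
      entry.startTime - lastShift < 1000 &&
      entry.startTime - windowStart < 5000;

    if (sameWindow) {
      windowValue += entry.value;
    } else {
      windowValue = entry.value;
      windowStart = entry.startTime;
    }
    lastShift = entry.startTime;
    maxWindow = Math.max(maxWindow, windowValue);
  }
  return maxWindow;
}

// Browser-only wiring; skipped where PerformanceObserver is unavailable.
if (typeof window !== 'undefined' && 'PerformanceObserver' in window) {
  const shifts = [];
  new PerformanceObserver((list) => shifts.push(...list.getEntries()))
    .observe({ type: 'layout-shift', buffered: true });

  // Send the current score whenever the page is hidden (assumed endpoint).
  document.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'hidden') {
      navigator.sendBeacon(
        '/telemetry/vitals',
        JSON.stringify({ metric: 'CLS', value: computeCls(shifts) })
      );
    }
  });
}
```

Sending on visibilitychange with sendBeacon is a common choice because the request can still be delivered while the page is being closed or backgrounded.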

Release Correlation

Observability becomes powerful when telemetry is linked to deployment versions.

This allows engineers to answer questions like:

•Did performance regress after the latest release?
•Which build introduced an error spike?
•Which feature rollout caused increased latency?

Tagging telemetry with release identifiers enables faster root cause analysis.
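A minimal sketch of what release tagging makes possible: once every event carries a release field, comparing error rates between builds is a simple grouping step. The event shape ({ release, type }) and the errorRateByRelease helper are assumptions for illustration; a monitoring backend would normally run this aggregation as a query.

```javascript
// Groups telemetry events by release and computes the share that are errors.
function errorRateByRelease(events) {
  const totals = new Map();
  for (const { release, type } of events) {
    const bucket = totals.get(release) ?? { errors: 0, total: 0 };
    bucket.total += 1;
    if (type === 'error') bucket.errors += 1;
    totals.set(release, bucket);
  }

  const rates = {};
  for (const [release, { errors, total }] of totals) {
    rates[release] = errors / total;
  }
  return rates;
}

// Illustrative events; real data would come from the telemetry pipeline.
const sampleEvents = [
  { release: 'v1.3.4', type: 'pageview' },
  { release: 'v1.3.4', type: 'pageview' },
  { release: 'v1.3.4', type: 'pageview' },
  { release: 'v1.3.4', type: 'error' },
  { release: 'v1.3.5', type: 'pageview' },
  { release: 'v1.3.5', type: 'error' },
];
console.log(errorRateByRelease(sampleEvents)); // { 'v1.3.4': 0.25, 'v1.3.5': 0.5 }
```

The same grouping works for any other tag, such as route or device type, by swapping the key.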

Actionable Dashboards

A good observability dashboard answers key questions:

•what regressed
•after which deploy
•which users are impacted
•which route or feature is affected

Example dashboard signals

•error rate by route
•Core Web Vitals by device
•latency distribution by region

Dashboards should guide diagnosis rather than simply displaying metrics.

Alerting Strategy

Observability systems should trigger alerts when key thresholds are exceeded.

Examples:

•error rate spike
•performance regression
•sudden traffic anomalies

Alerts help teams respond quickly before issues affect large numbers of users.
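One way to express an error-rate spike rule is sketched below. It compares the current window against a baseline and requires a minimum amount of traffic so that a few errors on a quiet page do not fire an alert. The shouldAlert function and its default thresholds (minSamples, ratio, minRate) are illustrative assumptions, not recommended values.

```javascript
// Decides whether the current error rate counts as a spike.
// current and baseline have the shape { errors, total }.
function shouldAlert(current, baseline, options = {}) {
  const { minSamples = 100, ratio = 2, minRate = 0.01 } = options;

  // Too little traffic to judge reliably.
  if (current.total < minSamples) return false;

  const currentRate = current.errors / current.total;
  const baselineRate =
    baseline.total > 0 ? baseline.errors / baseline.total : 0;

  // Alert only if the rate is meaningful in absolute terms
  // and at least `ratio` times the baseline.
  return currentRate >= minRate && currentRate >= baselineRate * ratio;
}

const lastWeek = { errors: 10, total: 1000 };  // 1% baseline
const lastHour = { errors: 50, total: 1000 };  // 5% now
console.log(shouldAlert(lastHour, lastWeek)); // true
```

Combining a relative condition with an absolute floor is a common way to reduce noisy alerts; the right values depend on traffic volume and tolerance for false positives.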

Interview Scenarios

Scenario 1

Users report slow page loads after a deployment.

Solution:

Check RUM dashboards and compare Core Web Vitals across releases.

Scenario 2

An error occurs only on certain devices.

Solution:

Filter error telemetry by browser or device type.

Scenario 3

Checkout conversion drops after a new feature launch.

Solution:

Inspect custom performance marks around the checkout flow and compare them before and after the launch.

Scenario 4

How do you detect performance regressions early?

Answer:

Monitor Core Web Vitals and set alerts for significant changes.

Scenario 5

Why is observability important for frontend?

Answer:

Because real-world conditions differ from development environments, and production telemetry reveals actual user experience.

Key Takeaways

1. Frontend observability combines metrics, logs, and traces to understand production behavior.
2. Real User Monitoring provides insight into real user experience across devices and networks.
3. Core Web Vitals measure key aspects of performance and responsiveness.
4. Custom performance marks capture product-specific performance metrics.
5. Release correlation helps identify regressions introduced by deployments.
6. Actionable dashboards and alerts enable fast detection and diagnosis of issues.