DNS Deep Dive: Resolution Path, TTL, and Cache Behavior

Medium

Resolution path

Recursive lookup chain

  • Client → resolver → authority
  • Latency on cold lookups

Record strategy

A/AAAA/CNAME choices

  • Migration flexibility
  • Operational clarity

TTL policy

Agility vs cache hit rate

  • Short TTL for failover
  • Long TTL for efficiency

Incident behavior

Cache propagation reality

  • Stale resolver state
  • Non-instant rollback effect

Core Lens

DNS is a distributed cache hierarchy; TTL policy controls agility versus stability.

Flow

Lookup → Resolver chain → Cache hit/miss → TTL expiry/refresh

DNS is best understood as a distributed lookup system with layered caches. A configuration change made at the authoritative server does not instantly reach every client. Instead, recursive resolvers and local caches continue serving cached responses until TTL expiration. Strong engineers understand both the control-plane view (records and TTL settings) and the real data-plane behavior seen by users across different networks and resolvers.

Quick Decision Guide

Senior-Level Decision Guide:

•DNS lookup latency is part of the request critical path for uncached hosts.
•TTL controls how quickly changes propagate versus how efficiently caches behave.
•DNS responses are cached at several layers: browser, OS, recursive resolver, and sometimes intermediary networks.
•Incident recovery must account for stale resolver caches and negative caching behavior.
•DNS design strongly influences multi-region routing and CDN availability.

Resolution Path

A DNS lookup usually follows this sequence:

Client → OS DNS cache → Recursive resolver → Root nameserver → TLD nameserver → Authoritative nameserver → Response cached

Simplified Flow

1. The browser asks the operating system for the IP address of a domain.

2. The OS checks its local DNS cache.

3. If not cached, the request goes to a recursive resolver (often provided by the ISP or a public resolver).

4. The resolver walks the DNS hierarchy until it reaches the authoritative nameserver.

5. The answer is returned and cached along the path.

This layered caching means most requests never reach the authoritative server once records are widely cached.
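The layered lookup above can be sketched as a toy simulation. This is a minimal model under stated assumptions, not a real resolver: `TtlCache`, `resolve`, and the `AUTHORITATIVE` table are hypothetical names, the root and TLD steps are collapsed into a single authoritative lookup, and the clock is passed in explicitly so the run is deterministic.

```python
from dataclasses import dataclass, field

# Hypothetical authoritative data: name -> (address, TTL in seconds).
AUTHORITATIVE = {"api.example.com": ("203.0.113.1", 60)}


@dataclass
class TtlCache:
    """Toy cache layer that keeps an answer until its expiry time."""
    entries: dict = field(default_factory=dict)

    def get(self, name, now):
        hit = self.entries.get(name)
        if hit and now < hit[1]:
            return hit  # (address, expires_at)
        return None

    def put(self, name, address, expires_at):
        self.entries[name] = (address, expires_at)


def resolve(name, now, os_cache, resolver_cache, stats):
    """Check the OS cache, then the resolver cache, then the authority."""
    for layer in (os_cache, resolver_cache):
        hit = layer.get(name, now)
        if hit:
            # Reuse the original expiry so an inner layer never extends it.
            os_cache.put(name, *hit)
            return hit[0]
    stats["authoritative_queries"] += 1
    address, ttl = AUTHORITATIVE[name]
    expires_at = now + ttl
    resolver_cache.put(name, address, expires_at)
    os_cache.put(name, address, expires_at)
    return address


stats = {"authoritative_queries": 0}
os_cache, resolver_cache = TtlCache(), TtlCache()
for t in (0, 10, 59, 60):  # seconds on a simulated clock
    resolve("api.example.com", t, os_cache, resolver_cache, stats)
print(stats)
```

In this run only the lookups at t=0 and t=60 reach the authoritative table; the two in between are answered from cache.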

Record Types for Frontend Systems

Common record types used in frontend architectures include:

A / AAAA Records

Map hostnames to IPv4 or IPv6 addresses.

Example:

api.example.com → 203.0.113.1

CNAME Records

Alias one domain name to another.

Example:

assets.example.com → d123.cloudfront.net

This pattern is extremely common when pointing frontend assets or APIs to CDN endpoints.

TXT Records

Used for verification, security policies, or configuration metadata.

Examples include domain verification or email security records.

Important Operational Detail

Each CNAME in a chain can require an additional lookup, which adds latency whenever the target is not already cached.
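To make the extra-lookup cost concrete, here is a minimal sketch that follows a CNAME to its final address and counts the hops. The `RECORDS` table and `resolve_chain` are hypothetical, and the address given for the CDN hostname is illustrative only. Each hop is counted as one lookup, although a real resolver can sometimes answer several hops from cache or from a single response.

```python
# Hypothetical zone data: name -> (record type, value).
RECORDS = {
    "assets.example.com": ("CNAME", "d123.cloudfront.net"),
    "d123.cloudfront.net": ("A", "203.0.113.10"),  # illustrative address
}


def resolve_chain(name, records, max_hops=8):
    """Follow CNAMEs until an A record is found; return (address, lookups)."""
    lookups = 0
    for _ in range(max_hops):
        lookups += 1
        rtype, value = records[name]
        if rtype == "A":
            return value, lookups
        name = value  # CNAME: restart the lookup at the alias target
    raise RuntimeError("CNAME chain too long or looping")


print(resolve_chain("assets.example.com", RECORDS))
```

The `max_hops` guard is a design choice for the sketch: it stops a misconfigured loop from running forever.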

TTL Strategy

TTL (Time To Live) sets the maximum time, in seconds, that a resolver may cache a DNS record before it must query again.

Short TTL

Example: 30–60 seconds

Advantages:

•faster failover
•faster configuration rollback
•faster migrations

Disadvantages:

•higher query load on authoritative DNS
•more cache misses, so more client lookups pay full resolution latency

Long TTL

Example: 1–24 hours

Advantages:

•higher cache hit rates
•lower resolver load

Disadvantages:

•slower propagation of configuration changes
•slower incident mitigation if records need to change

The correct TTL is usually a balance between operational agility and cache efficiency.
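The tradeoff can be put into rough numbers. The sketch below assumes a simplified model in which every active resolver re-queries the authoritative server once per TTL, and in which a cached answer can be stale for at most one TTL. The function name `ttl_tradeoff` and the figure of 10,000 resolvers are arbitrary assumptions for illustration.

```python
def ttl_tradeoff(active_resolvers, ttl_seconds):
    """Return (approx. authoritative queries/sec, worst-case staleness in s)."""
    # Simplified model: each resolver refreshes the record once per TTL.
    queries_per_second = active_resolvers / ttl_seconds
    return queries_per_second, ttl_seconds


for ttl in (60, 3600, 86400):
    qps, stale = ttl_tradeoff(active_resolvers=10_000, ttl_seconds=ttl)
    print(f"TTL {ttl:>5}s: ~{qps:.2f} queries/s, stale for up to {stale}s")
```

Under this model, moving from a 1-hour TTL to a 60-second TTL multiplies authoritative load by 60 while cutting the worst-case staleness by the same factor.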

Caching Layers

DNS caching happens in multiple places.

Browser Cache

Browsers may cache DNS results for a short period depending on implementation.

Operating System Cache

The OS maintains a DNS cache used by applications.

Recursive Resolver Cache

Public or ISP resolvers cache DNS results according to TTL.

Network Infrastructure

Some enterprise networks add additional DNS caching layers.

Because of these layers, DNS changes propagate gradually rather than instantly.

Negative Caching

When a resolver learns that a name does not exist (NXDOMAIN), or that it exists but has no record of the requested type, it may cache that negative result.

This behavior means that if a record is added shortly after an NXDOMAIN response, some resolvers may still temporarily believe the domain does not exist.

Negative caching TTLs are derived from the zone's SOA record: the lower of the SOA record's own TTL and its MINIMUM field. This value can significantly affect rollout timing.
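As a worked example of negative caching, the sketch below applies the RFC 2308 rule that a negative answer is cached for the lower of the SOA record's TTL and its MINIMUM field. The function names and sample values are hypothetical, and individual resolvers may additionally cap the result.

```python
def negative_cache_ttl(soa_record_ttl, soa_minimum):
    """RFC 2308: cache a negative answer for min(SOA TTL, SOA MINIMUM)."""
    return min(soa_record_ttl, soa_minimum)


def visible_by(nxdomain_cached_at, soa_record_ttl, soa_minimum):
    """Latest time (seconds) a resolver may keep serving the cached NXDOMAIN."""
    return nxdomain_cached_at + negative_cache_ttl(soa_record_ttl, soa_minimum)


# Sample values: SOA TTL of 3600 s, MINIMUM of 300 s, NXDOMAIN cached at t=0.
print(negative_cache_ttl(3600, 300))
print(visible_by(0, 3600, 300))
```

With these sample values, a record created right after the failed lookup could stay invisible to that resolver for up to 300 seconds.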

Multi-Region Routing

DNS is often used as a first layer of traffic steering.

Examples include:

•geo-based routing
•latency-based routing
•failover routing

Example architecture:

api.example.com → region-specific load balancer

Resolvers may receive different answers depending on geographic location or health checks.

DNS steering is commonly used alongside CDNs or global load balancers.
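A steering policy of this kind can be sketched as a small selection function. This is a toy model, assuming a hypothetical `ENDPOINTS` table, invented region names, and a health set supplied by the caller. Real DNS providers implement geo, latency, and failover policies in their own ways.

```python
# Hypothetical regional endpoints (documentation addresses).
ENDPOINTS = {
    "us-east": "203.0.113.10",
    "eu-west": "203.0.113.20",
}
FAILOVER_ORDER = ["us-east", "eu-west"]


def pick_answer(client_region, healthy):
    """Prefer the client's own region; otherwise use the first healthy one."""
    if client_region in ENDPOINTS and client_region in healthy:
        return ENDPOINTS[client_region]
    for region in FAILOVER_ORDER:
        if region in healthy:
            return ENDPOINTS[region]
    raise RuntimeError("no healthy region")


print(pick_answer("eu-west", {"us-east", "eu-west"}))  # geo-based answer
print(pick_answer("eu-west", {"us-east"}))             # failover answer
```

Because answers are cached downstream, a failover decision made here only reaches clients as their cached records expire.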

Failure Modes

DNS incidents often behave differently from application outages.

Common scenarios include:

Stale Resolver Cache

Clients continue using old IP addresses after a migration.

Misconfigured Records

Incorrect records can break traffic even when infrastructure is healthy.

Resolver Differences

Different recursive resolvers may cache or refresh records differently.

DNS Provider Outage

Even healthy origins become unreachable if DNS resolution fails.

TTL Strategy and Change Windows

Before large migrations or traffic shifts, teams often lower TTL values ahead of time.

Typical migration workflow:

1. Lower the TTL ahead of the change, by at least one full period of the old TTL (often several hours).

2. Wait for caches still holding the old, longer TTL to expire.

3. Perform the infrastructure migration.

4. Monitor traffic and keep the rollback path ready.

5. Restore normal TTL values after stabilization.
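The timing in this workflow can be sketched as simple date arithmetic. The function `migration_plan` and its field names are hypothetical, and the model assumes caches honor TTLs exactly, which not every resolver does.

```python
from datetime import datetime, timedelta


def migration_plan(change_at, old_ttl, low_ttl):
    """Return key timestamps for a lower-TTL-then-migrate workflow."""
    return {
        # A cache filled just before the TTL is lowered keeps the old TTL,
        # so the lowering must happen at least old_ttl before the change.
        "lower_ttl_by": change_at - timedelta(seconds=old_ttl),
        "change_at": change_at,
        # After the change, TTL-honoring caches refresh within low_ttl.
        "converged_by": change_at + timedelta(seconds=low_ttl),
    }


plan = migration_plan(datetime(2024, 1, 1, 12, 0), old_ttl=3600, low_ttl=60)
for step, when in plan.items():
    print(step, when.isoformat())
```

For a 1-hour old TTL and a 60-second lowered TTL, the TTL must be lowered by 11:00 for a 12:00 change, and well-behaved caches converge by 12:01.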

Failure and Recovery

DNS recovery often takes longer than expected because cached responses must expire before users see the correction.

Recovery planning should consider:

•stale cache duration
•resolver behavior differences
•TTL expiration timelines
•negative cache responses

Operational playbooks often include fallback endpoints or alternate domains to mitigate long DNS propagation delays.
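For recovery planning, a rough estimate of how many caches still hold the bad answer can help set expectations. The sketch below assumes cache fill times were spread uniformly over the TTL window before the fix and that caches honor the TTL. Both are simplifications, and `stale_fraction` is a hypothetical helper.

```python
def stale_fraction(seconds_since_fix, ttl_seconds):
    """Estimated share of caches still serving the old record."""
    # Uniform-fill assumption: staleness decays linearly to zero at one TTL.
    return max(0.0, 1.0 - seconds_since_fix / ttl_seconds)


for elapsed in (0, 150, 300, 600):
    share = stale_fraction(elapsed, ttl_seconds=300)
    print(f"{elapsed:>3}s after fix: ~{share:.0%} of caches still stale")
```

Resolvers that ignore or extend TTLs will lag behind this estimate, which is one reason playbooks keep fallback endpoints available.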

Interview Deep Dive

In system design interviews, strong answers mention DNS as a dependency for CDN-backed assets and API requests.

A good answer may include:

•how DNS affects cold-start latency
•TTL tradeoffs during migrations
•resolver cache behavior during incidents
•DNS routing strategies for multi-region systems

Staff-level reasoning often connects DNS configuration to availability, performance, and failover design.

Key Takeaways

1. DNS is a distributed cache hierarchy rather than a simple lookup service.
2. DNS resolution sits on the request path for uncached hosts.
3. TTL determines the balance between propagation speed and cache efficiency.
4. Multiple caching layers shape real DNS behavior.
5. Negative caching can delay recovery after record creation.
6. DNS routing can be used for multi-region traffic steering.
7. Stale resolver caches often prolong DNS incidents.
8. Staff-level system design answers connect DNS policy to latency and resilience.