DNS Deep Dive: Resolution Path, TTL, and Cache Behavior
Resolution path
Recursive lookup chain
- Client -> resolver -> authority
- Latency on cold lookups
Record strategy
A/AAAA/CNAME choices
- Migration flexibility
- Operational clarity
TTL policy
Agility vs cache hit rate
- Short TTL for failover
- Long TTL for efficiency
Incident behavior
Cache propagation reality
- Stale resolver state
- Non-instant rollback effect
Core Lens
DNS is a distributed cache hierarchy; TTL policy controls agility versus stability.
Flow
DNS is best understood as a distributed lookup system with layered caches. A configuration change made at the authoritative server does not instantly reach every client. Instead, recursive resolvers and local caches continue serving cached responses until TTL expiration. Strong engineers understand both the control-plane view (records and TTL settings) and the real data-plane behavior seen by users across different networks and resolvers.
Quick Decision Guide
Senior-Level Decision Guide:
- DNS lookup latency is part of the request critical path for uncached hosts.
- TTL trades off how quickly changes propagate against how efficiently caches work.
- DNS responses are cached at several layers: browser, OS, recursive resolver, and sometimes intermediary networks.
- Incident recovery must account for stale resolver caches and negative caching behavior.
- DNS design strongly influences multi-region routing and CDN availability.
Resolution Path
A DNS lookup usually follows this sequence:
Client → OS DNS cache → Recursive resolver → Root nameserver → TLD nameserver → Authoritative nameserver → Response cached
Simplified Flow
1. The browser checks its own DNS cache and, on a miss, asks the operating system for the domain's IP address.
2. The OS checks its local DNS cache.
3. If not cached, the request goes to a recursive resolver (often provided by the ISP or a public resolver).
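4. On a miss, the resolver walks the DNS hierarchy: a root nameserver refers it to the TLD nameservers, which refer it to the domain's authoritative nameserver. Referrals it has already cached let it skip steps.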
5. The answer is returned and cached along the path.
This layered caching means most requests never reach the authoritative server once records are widely cached.
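The walk above can be modeled in a few lines. The sketch below is a toy recursive resolver over a hard-coded, hypothetical delegation table (`SERVERS`, `RecursiveResolver`, `in_zone`, and every address are illustrative, not a real API). It follows referrals from a root server to a TLD server to the authoritative server, caches both the referrals and the final answer, and counts upstream queries. TTLs are ignored here, and real resolvers exchange NS and glue records rather than dictionary entries.

```python
# Hypothetical delegation data: each server either refers the resolver to
# the next server down the hierarchy or answers authoritatively.
SERVERS = {
    "root": {"kind": "referral", "zone": "com.", "next": "tld-com"},
    "tld-com": {"kind": "referral", "zone": "example.com.", "next": "auth-example"},
    "auth-example": {
        "kind": "answer",
        "records": {
            "api.example.com.": "203.0.113.1",
            "www.example.com.": "203.0.113.2",
        },
    },
}


def in_zone(name, zone):
    """True if `name` equals `zone` or sits beneath it on a label boundary."""
    return name == zone or name.endswith("." + zone)


class RecursiveResolver:
    """Toy recursive resolver; TTLs are deliberately ignored in this sketch."""

    def __init__(self):
        self.answer_cache = {}    # name -> address
        self.referral_cache = {}  # zone -> server to ask for names in that zone
        self.upstream_queries = 0

    def resolve(self, name):
        if name in self.answer_cache:
            return self.answer_cache[name]  # warm: no upstream traffic
        server = self._closest_known_server(name)
        while True:
            self.upstream_queries += 1
            entry = SERVERS[server]
            if entry["kind"] == "answer":
                address = entry["records"][name]
                self.answer_cache[name] = address
                return address
            # Remember the referral so later lookups can skip this step.
            self.referral_cache[entry["zone"]] = entry["next"]
            server = entry["next"]

    def _closest_known_server(self, name):
        # Start from the most specific cached zone, else from the root.
        zones = [z for z in self.referral_cache if in_zone(name, z)]
        if not zones:
            return "root"
        return self.referral_cache[max(zones, key=len)]


resolver = RecursiveResolver()
resolver.resolve("api.example.com.")  # cold: root, TLD, authoritative = 3 queries
resolver.resolve("api.example.com.")  # answer cached: still 3 queries
resolver.resolve("www.example.com.")  # referral cached: 1 more, 4 in total
```

The third lookup shows why a warm resolver is cheap even for a new hostname: it already knows which server is authoritative for `example.com.` and goes straight there.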
Record Types for Frontend Systems
Common record types used in frontend architectures include:
A / AAAA Records
A records map a hostname to IPv4 addresses; AAAA records map it to IPv6 addresses.
Example:
api.example.com → 203.0.113.1
CNAME Records
Alias one domain name to another.
Example:
assets.example.com → d123.cloudfront.net
This pattern is extremely common when pointing frontend assets or APIs to CDN endpoints.
TXT Records
Used for verification, security policies, or configuration metadata.
Examples include domain verification or email security records.
Important Operational Detail
Each hop in a CNAME chain may require an additional lookup, adding latency whenever the target is not already cached.
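To make the cost concrete, the sketch below follows a CNAME chain through a hypothetical record table and counts one lookup per name, which is the worst case when none of the targets are cached. `RECORDS`, `follow_cname_chain`, the intermediate hostnames, and the addresses are all illustrative. A real recursive resolver chases the chain on the client's behalf and typically returns it in a single response, so the extra latency is paid mostly at the resolver on a cold cache.

```python
# Hypothetical record table: name -> (record_type, value).
RECORDS = {
    "assets.example.com.": ("CNAME", "cdn.example.net."),
    "cdn.example.net.": ("CNAME", "edge.example.net."),
    "edge.example.net.": ("A", "198.51.100.7"),
}


def follow_cname_chain(name, records, max_hops=8):
    """Return (address, lookups) after following CNAMEs starting at `name`."""
    lookups = 0
    seen = set()
    while True:
        if name in seen:
            raise ValueError(f"CNAME loop at {name}")
        seen.add(name)
        lookups += 1  # one lookup per name when nothing is cached
        rtype, value = records[name]
        if rtype == "A":
            return value, lookups
        if lookups >= max_hops:
            raise ValueError("CNAME chain too long")
        name = value  # continue with the alias target


address, lookups = follow_cname_chain("assets.example.com.", RECORDS)
print(address, lookups)  # 198.51.100.7 3
```

Keeping chains short (ideally a single CNAME to the CDN hostname) limits how many cold lookups a first request can trigger.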
TTL Strategy
TTL (Time To Live) determines how long resolvers cache a DNS record.
Short TTL
Example: 30–60 seconds
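Advantages:
- Changes, failovers, and rollbacks take effect quickly because caches refresh often.
Disadvantages:
- More queries reach the authoritative nameservers.
- Lower cache hit rates, so more requests pay cold-lookup latency.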
Long TTL
Example: 1–24 hours
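Advantages:
- Higher cache hit rates and fewer queries to the authoritative nameservers.
- Fewer cold lookups on the request critical path.
Disadvantages:
- Changes and rollbacks propagate slowly.
- Stale answers can persist for hours after a migration or incident.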
The correct TTL is usually a balance between operational agility and cache efficiency.
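A back-of-the-envelope model makes the trade-off concrete. Assuming a record is requested continuously and every resolver honors the TTL exactly, each resolver re-fetches the record at most once per TTL, and a changed record can stay stale in a cache for up to one TTL. The helper name `ttl_tradeoff` is illustrative.

```python
SECONDS_PER_DAY = 86_400


def ttl_tradeoff(ttl_seconds):
    """Per-resolver cost and staleness bound for one record (rough model).

    Assumes continuous demand, so a resolver re-fetches once per TTL, and
    assumes resolvers honor the TTL exactly.
    """
    return {
        # Upper bound on authoritative queries from one resolver per day.
        "max_refreshes_per_day": SECONDS_PER_DAY // ttl_seconds,
        # A change can take up to one full TTL to replace a cached answer.
        "worst_case_staleness_s": ttl_seconds,
    }


for ttl in (60, 3_600, 86_400):
    print(f"TTL {ttl:>6}s -> {ttl_tradeoff(ttl)}")
```

Under these assumptions, a 60-second TTL allows up to 1,440 refreshes per resolver per day but bounds staleness at one minute, while a 24-hour TTL needs a single refresh per day but can leave a stale answer in place for a full day.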
Caching Layers
DNS caching happens in multiple places.
Browser Cache
Browsers may cache DNS results for a short period depending on implementation.
Operating System Cache
The OS maintains a DNS cache used by applications.
Recursive Resolver Cache
Public or ISP resolvers cache DNS results according to TTL.
Network Infrastructure
Some enterprise networks add additional DNS caching layers.
Because of these layers, DNS changes propagate gradually rather than instantly.
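Each of these layers behaves roughly like a TTL cache: an answer is reused until its expiry time, regardless of what the authoritative server says in the meantime. The minimal sketch below (`TtlCache`, `lookup`, `RECORD_TTL`, and the addresses are hypothetical) passes time in explicitly and shows a record change that stays invisible to a cached client until the TTL runs out.

```python
class TtlCache:
    """Minimal TTL cache, roughly what each DNS caching layer keeps.

    Time is passed in explicitly (in seconds) to keep the example deterministic.
    """

    def __init__(self):
        self._entries = {}  # name -> (value, expires_at)

    def put(self, name, value, ttl, now):
        self._entries[name] = (value, now + ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._entries[name]  # expired: the caller must fetch again
            return None
        return value


# Hypothetical authoritative data and TTL.
authoritative = {"api.example.com.": "203.0.113.1"}
RECORD_TTL = 300
cache = TtlCache()


def lookup(name, now):
    """Serve from cache while valid, otherwise fetch and cache the answer."""
    cached = cache.get(name, now)
    if cached is not None:
        return cached
    answer = authoritative[name]
    cache.put(name, answer, RECORD_TTL, now)
    return answer


before = lookup("api.example.com.", now=0)          # cached until t=300
authoritative["api.example.com."] = "198.51.100.9"  # record changed (say, at t=10)
during = lookup("api.example.com.", now=200)        # still the old address
after = lookup("api.example.com.", now=300)         # TTL expired: new address
```

Stacking several such caches means an answer may be held at more than one point along the path, which is why propagation is observed as gradual rather than a single switch-over.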
Negative Caching
When a resolver receives a response saying that a name does not exist (NXDOMAIN), or that it exists but has no record of the requested type (NODATA), it may cache that negative result.
This means that if a record is added shortly after a negative response, some resolvers may temporarily continue to treat it as nonexistent.
Negative caching TTLs are derived from the zone's SOA record (per RFC 2308, the smaller of the SOA record's own TTL and its MINIMUM field) and can significantly affect rollout timing.
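The sketch below computes the negative TTL from the SOA values as RFC 2308 specifies and estimates when a resolver that cached a negative answer will first see a record added afterward. Function names and example values are illustrative. It assumes the resolver honors the negative TTL exactly and considers a single cached negative answer; resolvers may also cap negative TTLs with their own limits, which is not modeled.

```python
def negative_cache_ttl(soa_record_ttl, soa_minimum):
    """TTL for caching NXDOMAIN/NODATA: the smaller of the SOA TTL and MINIMUM."""
    return min(soa_record_ttl, soa_minimum)


def first_visible_at(negative_cached_at, record_added_at, soa_record_ttl, soa_minimum):
    """Earliest time (seconds) a resolver holding one cached negative answer
    can see a newly added record.

    Assumes the resolver honors the negative TTL exactly and that a client
    queries again as soon as the negative entry expires.
    """
    expires_at = negative_cached_at + negative_cache_ttl(soa_record_ttl, soa_minimum)
    # The record must exist and the negative entry must have expired.
    return max(record_added_at, expires_at)


# Hypothetical zone: SOA TTL of 1 hour, MINIMUM field of 15 minutes.
print(negative_cache_ttl(3_600, 900))       # 900
print(first_visible_at(0, 60, 3_600, 900))  # 900: added at t=60, hidden until t=900
```

Creating records before anything queries for them avoids the problem entirely, since there is no negative answer to cache.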
Multi-Region Routing
DNS is often used as a first layer of traffic steering.
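Examples include:
- Geographic routing, where resolvers in different locations receive the address of a nearby region.
- Health-check-based failover, where unhealthy endpoints are removed from DNS answers.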
Example architecture:
api.example.com → region-specific load balancer
Resolvers may receive different answers depending on geographic location or health checks.
DNS steering is commonly used alongside CDNs or global load balancers.
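A DNS steering decision can be sketched as "nearest healthy region first". The example below is hypothetical throughout: region names, addresses (from documentation ranges), and the `PREFERENCES` table are illustrative, not any provider's configuration. In practice the authoritative server usually sees the recursive resolver's location (or an EDNS Client Subnet hint) rather than the end user's, and clients that already cached an answer keep using it until its TTL expires, even if that region has since become unhealthy.

```python
# Hypothetical regional load-balancer addresses.
REGION_ADDRESSES = {
    "us-east": "203.0.113.10",
    "eu-west": "198.51.100.20",
    "ap-south": "192.0.2.30",
}

# Hypothetical preference order per client geography, nearest first.
PREFERENCES = {
    "NA": ["us-east", "eu-west", "ap-south"],
    "EU": ["eu-west", "us-east", "ap-south"],
    "AS": ["ap-south", "eu-west", "us-east"],
}


def answer_for(client_geo, healthy):
    """Return the address of the nearest healthy region for a geography.

    Unknown geographies fall back to the "NA" ordering in this sketch.
    """
    for region in PREFERENCES.get(client_geo, PREFERENCES["NA"]):
        if region in healthy:  # health-check result supplied by the caller
            return REGION_ADDRESSES[region]
    raise RuntimeError("no healthy region available")


all_healthy = {"us-east", "eu-west", "ap-south"}
print(answer_for("EU", all_healthy))                # eu-west address
print(answer_for("EU", all_healthy - {"eu-west"}))  # fails over to us-east
```

With all regions healthy an EU client gets `198.51.100.20`; once `eu-west` is marked unhealthy, the same query returns `203.0.113.10` from `us-east`, but only new or expired cache entries pick up the change.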
Failure Modes
DNS incidents often behave differently from application outages.
Common scenarios include:
Stale Resolver Cache
Clients continue using old IP addresses after a migration.
Misconfigured Records
Incorrect records can break traffic even when infrastructure is healthy.
Resolver Differences
Different recursive resolvers may cache or refresh records differently.
DNS Provider Outage
Even healthy origins become unreachable if DNS resolution fails.
TTL Strategy and Change Windows
Before large migrations or traffic shifts, teams often lower TTL values ahead of time.
Typical migration workflow:
1. Lower the TTL well before the change: at least one full period of the old TTL (for example, 24 hours ahead if the old TTL is 24 hours).
2. Wait for old caches to expire.
3. Perform the infrastructure migration.
4. Monitor behavior and confirm that a rollback is still safe.
5. Restore normal TTL values after stabilization.
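The timing constraints in this workflow reduce to simple arithmetic. The sketch assumes resolvers honor TTLs; `migration_plan`, its parameters, and the example dates are hypothetical. Answers cached just before the TTL was lowered can live for the full old TTL, so the cutover should wait at least that long; after that, the short TTL bounds how long stale answers persist, including after a rollback.

```python
from datetime import datetime, timedelta


def migration_plan(ttl_lowered_at, old_ttl, new_ttl, stabilization):
    """Timeline for a TTL-lowering migration (sketch; assumes TTLs are honored)."""
    # Answers cached just before the TTL was lowered may live for the full old TTL.
    earliest_cutover = ttl_lowered_at + old_ttl
    # After cutover, cached answers are at most new_ttl old; a rollback
    # would likewise reach TTL-honoring caches within new_ttl.
    converged_by = earliest_cutover + new_ttl
    # Restore the normal TTL only after the new setup has been stable for a while.
    restore_ttl_after = converged_by + stabilization
    return {
        "earliest_cutover": earliest_cutover,
        "converged_by": converged_by,
        "restore_ttl_after": restore_ttl_after,
    }


# Hypothetical example: 24-hour TTL lowered to 60 seconds, 2 hours of monitoring.
plan = migration_plan(
    ttl_lowered_at=datetime(2024, 1, 1, 9, 0),
    old_ttl=timedelta(hours=24),
    new_ttl=timedelta(seconds=60),
    stabilization=timedelta(hours=2),
)
for step, when in plan.items():
    print(step, when.isoformat())
```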
Failure and Recovery
DNS recovery often takes longer than expected because cached responses must expire before users see the correction.
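Recovery planning should consider:
- The TTL of the affected records, which bounds how long TTL-honoring resolvers keep serving the bad answer after it is fixed.
- Negative caching, if the incident involved a missing or deleted record.
- Resolvers that cache or refresh records differently from the TTL the zone publishes.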
Operational playbooks often include fallback endpoints or alternate domains to mitigate long DNS propagation delays.
Interview Deep Dive
In system design interviews, strong answers mention DNS as a dependency for CDN-backed assets and API requests.
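A good answer may include:
- DNS lookup latency on cold requests as part of the critical path.
- The TTL trade-off between failover agility and cache efficiency.
- Layered caching and why changes, including rollbacks, propagate gradually.
- DNS-based multi-region routing and its reliance on health checks.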
Staff-level reasoning often connects DNS configuration to availability, performance, and failover design.