
DNS Engineering

Master DNS as a critical infrastructure component. Covers DNS architecture, caching, security extensions (DNSSEC), split-horizon DNS, DNS-based service discovery, failover patterns, and the DNS problems that cause the most outages.

DNS is the most critical infrastructure component that most engineers take for granted. Every API call, every page load, every microservice request starts with a DNS lookup. When DNS breaks, everything breaks. Understanding DNS deeply — its caching layers, failure modes, and security implications — is essential for building reliable systems.


DNS Resolution Path

Application → Local Cache → OS Resolver → Recursive Resolver → Root → TLD → Authoritative
                                                    (ISP/8.8.8.8)

1. App calls getaddrinfo("api.example.com")
2. Check local DNS cache (browser, OS)
3. OS resolver sends query to configured recursive resolver
4. Recursive resolver checks its cache
5. If not cached: Root → .com TLD → example.com authoritative
6. Authoritative responds with IP address
7. Each layer caches the result for up to the record's TTL
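
Step 1 can be tried from Python, whose socket.getaddrinfo wraps the same system call. This is a minimal sketch: the helper name resolve_addresses is illustrative, and the answers depend on the local resolver configuration and whatever the caches along the path currently hold.

```python
import socket


def resolve_addresses(hostname: str, port: int = 443) -> list[str]:
    """Return the unique IP addresses the OS resolver gives for hostname."""
    # proto=IPPROTO_TCP keeps one entry per address instead of one per socket type.
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # (ip, port) for IPv4, (ip, port, flow, scope) for IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling resolve_addresses("api.example.com") goes through steps 2 to 6 on a cache miss and raises socket.gaierror if the name cannot be resolved.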

Record Types

Type     Purpose                       Example
─────    ──────────────────────────    ────────────────────────────────────────────
A        IPv4 address                  api.example.com → 192.168.1.100
AAAA     IPv6 address                  api.example.com → 2001:db8::1
CNAME    Alias to another name         www → api.example.com
MX       Mail server                   example.com → mail.example.com
TXT      Arbitrary text                SPF, DKIM, domain verification
NS       Nameserver delegation         example.com → ns1.provider.com
SRV      Service location              _sip._tcp.example.com → sip.example.com:5060
CAA      Certificate authority auth    example.com → letsencrypt.org

TTL Strategy

Record Type          Recommended TTL    Rationale
─────────────────    ───────────────    ──────────
Production A/AAAA    300s (5 min)       Balance between caching and failover speed
CDN CNAME            3600s (1 hr)       CDN handles its own failover
MX records           3600s (1 hr)       Mail routing changes rarely
TXT (SPF/DKIM)       3600s (1 hr)       Email auth changes rarely
NS records           86400s (24 hr)     Nameserver changes are planned
Pre-migration        60s (1 min)        Lower before DNS changes, raise after
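
The trade-off in the table comes from how caches treat TTLs: an answer is reused until it expires, even if the authoritative record has changed in the meantime. The toy cache below is a sketch of that behavior only; the class name TtlCache and the injectable clock are assumptions made for illustration, not part of any real resolver.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Toy resolver cache: answers are reused until their TTL expires."""

    def __init__(self,
                 lookup: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup  # upstream lookup returning (address, ttl_seconds)
        self._clock = clock    # injectable so expiry can be simulated
        self._entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]   # still fresh: upstream is not consulted
        address, ttl = self._lookup(name)
        self._entries[name] = (address, now + ttl)
        return address
```

With a 300s TTL, a record changed one second after it was cached keeps being served from this cache for another 299 seconds, which is the failover delay the table refers to.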

TTL Before Migration

Week before migration:  Lower TTL to 60s
Day of migration:       Change DNS records
Post-migration:         Verify, then raise TTL to normal
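
The timing behind this schedule is simple arithmetic: the lowered TTL only takes full effect once every cache has let the old, longer TTL expire. A small sketch, with hypothetical function names and assuming caches honor TTLs:

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """Time after which no cache should still hold the record with the old TTL."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def worst_case_stale_until(changed_at: datetime, current_ttl_seconds: int) -> datetime:
    """Latest time a well-behaved cache may still serve the pre-change answer."""
    return changed_at + timedelta(seconds=current_ttl_seconds)
```

For a record that had a 3600s TTL, lowering it at 12:00 means the change to the new address should wait until at least 13:00; after that, a change made with the 60s TTL in place is stale for at most one minute.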

DNS-Based Load Balancing

Round-Robin

api.example.com.  300  IN  A  192.168.1.100
api.example.com.  300  IN  A  192.168.1.101
api.example.com.  300  IN  A  192.168.1.102

Simple distribution, no health awareness.
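
A rough simulation of what that means in practice, assuming the server rotates the record order on each response and each client connects to the first address (real servers and resolvers may shuffle instead; the function name is illustrative):

```python
from collections import deque
from typing import List, Sequence


def first_answers(addresses: Sequence[str], queries: int) -> List[str]:
    """Return the first address of each response as the record set rotates."""
    ring = deque(addresses)
    picks = []
    for _ in range(queries):
        picks.append(ring[0])  # most clients connect to the first address returned
        ring.rotate(-1)        # the order is rotated for the next response
    return picks
```

Nothing in this loop knows whether a backend is up, so if 192.168.1.101 is down it still receives roughly a third of new clients until the record is removed and cached copies expire.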

Weighted Routing

# Weighted routing as offered by Route 53 (illustrative layout, not literal provider syntax)
api.example.com:
  - record: 192.168.1.100
    weight: 70    # Primary
  - record: 192.168.1.101
    weight: 20    # Secondary
  - record: 192.168.1.102
    weight: 10    # Canary
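
The effect of these weights can be sketched as a weighted random choice per query. This is an approximation of the policy, not any provider's implementation; WEIGHTED_RECORDS and pick_record are hypothetical names.

```python
import random
from typing import Dict

# Weights copied from the example above; each weight is relative to the total.
WEIGHTED_RECORDS: Dict[str, int] = {
    "192.168.1.100": 70,  # Primary
    "192.168.1.101": 20,  # Secondary
    "192.168.1.102": 10,  # Canary
}


def pick_record(records: Dict[str, int], rng: random.Random) -> str:
    """Choose one address with probability weight / sum(weights)."""
    addresses = list(records)
    weights = list(records.values())
    return rng.choices(addresses, weights=weights, k=1)[0]
```

Because resolvers cache each answer for the TTL, the split seen in actual traffic only approaches 70/20/10 across many independent resolvers.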

Geolocation Routing

api.example.com:
  - region: us-east-1
    target: us-east.api.example.com
  - region: eu-west-1
    target: eu-west.api.example.com
  - region: ap-southeast-1
    target: ap-southeast.api.example.com
  - region: default          # fallback when no region matches
    target: us-east.api.example.com
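
The lookup this policy describes amounts to a table with a fallback. A minimal sketch, assuming the client's region has already been determined (in practice the provider infers it from the resolver or client address); the names GEO_TARGETS and target_for are illustrative.

```python
from typing import Dict

# Mapping copied from the example above.
GEO_TARGETS: Dict[str, str] = {
    "us-east-1": "us-east.api.example.com",
    "eu-west-1": "eu-west.api.example.com",
    "ap-southeast-1": "ap-southeast.api.example.com",
}
DEFAULT_TARGET = "us-east.api.example.com"


def target_for(region: str) -> str:
    """Return the regional target, or the default when the region is unmapped."""
    return GEO_TARGETS.get(region, DEFAULT_TARGET)
```

Without the default entry, clients from an unmapped region would get no answer at all, which is why the fallback matters.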

DNS Security

DNSSEC

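DNSSEC adds cryptographic signatures to DNS records so that validating resolvers can detect forged or altered responses, which defends against cache poisoning and on-path tampering. It authenticates data but does not encrypt queries: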

Without DNSSEC:
  Attacker can forge DNS responses → redirect traffic to malicious server

With DNSSEC:
  Resolver validates signature chain → forged responses are rejected
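
One link in that signature chain can be sketched without any DNS library: the parent zone publishes a DS record containing a digest of the child zone's DNSKEY, and a validator recomputes that digest. The sketch below covers only this DS-to-DNSKEY check with SHA-256 digests; verifying the RRSIG signatures themselves is omitted, the function names are illustrative, and any key bytes passed in are placeholders rather than real keys.

```python
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name as lowercase length-prefixed labels ending in a zero byte."""
    stripped = name.rstrip(".").lower()
    if not stripped:
        return b"\x00"  # the root zone
    wire = b""
    for label in stripped.split("."):
        encoded = label.encode("ascii")
        wire += bytes([len(encoded)]) + encoded
    return wire + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA layout: flags (2 bytes), protocol (1), algorithm (1), public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def ds_digest_sha256(owner: str, rdata: bytes) -> str:
    """Digest carried in a SHA-256 DS record: hash of owner name plus DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest()


def dnskey_matches_ds(owner: str, rdata: bytes, ds_digest_hex: str) -> bool:
    """True when the child's DNSKEY is the one the parent's DS record vouches for."""
    return ds_digest_sha256(owner, rdata) == ds_digest_hex.lower()
```

A key substituted by an attacker produces a different digest, so the chain breaks at this link and the validating resolver rejects the response.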

DNS over HTTPS (DoH) / DNS over TLS (DoT)

Traditional DNS:  Plaintext over UDP (and TCP) on port 53 (anyone on the path can see queries)
DoT:              Encrypted DNS over TLS on port 853
DoH:              Encrypted DNS over HTTPS on port 443
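
With DoH, the payload is the same binary DNS message that would otherwise travel over port 53, carried inside an HTTPS request. The sketch below builds such a query and the GET URL form described in RFC 8484; it sends nothing over the network, and the endpoint passed in is a placeholder to be replaced with a real resolver's URL.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query in wire format (qtype 1 = A, class 1 = IN)."""
    # Header: ID 0 (RFC 8484 suggests 0 for HTTP cache friendliness),
    # flags 0x0100 (recursion desired), one question, no other sections.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(endpoint: str, name: str) -> str:
    """Return the DoH GET URL: the query base64url-encoded without padding."""
    encoded = base64.urlsafe_b64encode(build_query(name)).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"
```

An observer on the network sees only a TLS connection to the resolver on port 443, not the name being queried.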

Common DNS Problems

Problem                          Symptom                                           Fix
─────────────────────────────    ──────────────────────────────────────────────    ────────────────────────────────────────────
TTL too high during migration    Old IP served for hours                           Lower TTL days before change
CNAME at zone apex               Invalid: CNAME cannot coexist with apex SOA/NS    Use ALIAS/ANAME record or A record
Too many CNAME chains            Slow resolution, timeouts                         Flatten CNAME chains
No CAA records                   Any CA can issue certificates                     Add CAA restricting to your CA
NS delegation mismatch           Partial resolution failures                       Ensure NS records match at registrar and zone
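
Long CNAME chains are easy to spot once the records are in hand. The sketch below walks a chain held in a plain dictionary (alias → target), which is an assumption made for illustration; in practice the mapping would come from zone data or query results.

```python
from typing import Dict, List


def follow_cname_chain(name: str, cnames: Dict[str, str], max_hops: int = 8) -> List[str]:
    """Return every name visited while following CNAME records from name."""
    chain = [name]
    while chain[-1] in cnames:
        target = cnames[chain[-1]]
        if target in chain:
            raise ValueError(f"CNAME loop detected at {target}")
        chain.append(target)
        if len(chain) - 1 > max_hops:
            raise ValueError(f"CNAME chain longer than {max_hops} hops")
    return chain
```

Each extra name in the returned list can cost another lookup on a cold cache, so a chain of more than one or two hops is a candidate for flattening.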

Anti-Patterns

Anti-Pattern                       Consequence                                         Fix
───────────────────────────────    ────────────────────────────────────────────────    ─────────────────────────────────────────
Hardcoded IP addresses             Cannot change infrastructure without code change    Use DNS names everywhere
TTL of 86400s on active records    Failover can take up to 24 hours                    300s for production records
Single DNS provider                DNS provider outage = total outage                  Multi-provider DNS
No DNS monitoring                  Resolution failures go undetected                   Monitor query success rate and latency
Ignoring negative caching          NXDOMAIN cached, new records invisible              Check SOA minimum TTL (negative cache TTL)
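
The monitoring fix can start small: resolve a list of important names on a schedule and record how many succeed and how long the slowest takes. A minimal sketch, with a hypothetical probe_resolution helper that accepts any resolver function so it can be exercised without network access:

```python
import time
from typing import Callable, Dict, Iterable


def probe_resolution(names: Iterable[str],
                     resolve: Callable[[str], object],
                     clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Resolve each name once and report the success rate and worst latency."""
    names = list(names)
    if not names:
        raise ValueError("at least one name is required")
    successes = 0
    latencies = []
    for name in names:
        started = clock()
        try:
            resolve(name)
            successes += 1
        except OSError:  # socket.gaierror is a subclass of OSError
            pass
        latencies.append(clock() - started)
    return {"success_rate": successes / len(names), "max_latency_s": max(latencies)}
```

Passing lambda n: socket.getaddrinfo(n, 443) as resolve measures the path applications actually use, local caches included; alert thresholds are left to the operator.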

DNS is deceptively simple on the surface and deeply complex underneath. Many “network issues” turn out to be DNS issues, and many “slow connections” start with slow DNS. Understanding DNS in depth prevents entire categories of outages.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
