System Design Interview Patterns for Senior Engineers

Essential system design patterns for senior+ engineering roles. Covers load balancing, caching, database scaling, message queues, consensus, rate limiting, and how to structure your design process.

System design interviews test whether you can architect systems that scale. This guide covers the patterns that appear in 80% of interviews and real-world architecture decisions. Master these patterns and you can design most distributed systems from first principles. The difference between a good and great answer is depth — not covering more patterns, but knowing when to use each, what trade-offs they bring, and what breaks first at scale.

The #1 mistake in system design interviews: jumping to solutions before clarifying requirements. The best candidates spend 5 minutes asking questions before drawing a single box.


The Design Process

1. Requirements Clarification (5 min)
   - Functional requirements (what does the system DO?)
   - Non-functional requirements (scale, latency, availability)
   - Back-of-envelope calculations

2. High-Level Design (10 min)
   - Core components and their interactions
   - API design
   - Data flow

3. Deep Dive (15 min)
   - Database schema and scaling
   - Caching strategy
   - Handling failure modes

4. Trade-offs & Wrap-up (5 min)
   - What would you change at 10x scale?
   - What are the bottlenecks?
   - Monitoring and alerting

Requirements Questions to Always Ask

| Category | Questions | Why It Matters |
|---|---|---|
| Scale | How many users? Read-heavy or write-heavy? | Determines DB choice, caching strategy |
| Latency | What’s acceptable response time? Real-time or near-real-time? | Determines caching, CDN, async processing |
| Availability | What’s the uptime target? Is eventual consistency OK? | Determines replication, failover design |
| Data | How much data? How long to retain? | Determines storage, partitioning, archival |
| Budget | Self-managed or managed services? Cloud provider? | Determines solution complexity |

Pattern 1: Load Balancing

Clients → Load Balancer
              │
    ┌─────────┼──────────┐
    ▼         ▼          ▼
  Server 1  Server 2  Server 3

| Algorithm | Best For | Trade-off |
|---|---|---|
| Round robin | Stateless services, equal capacity | Doesn’t account for server load |
| Least connections | Variable request duration | Slightly more overhead |
| IP hash | Session affinity (stick to same server) | Uneven distribution if IP distribution is skewed |
| Weighted round robin | Mixed server capacities | Requires manual weight tuning |
| Consistent hashing | Dynamic cluster sizes | Complex implementation |
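To make the first two algorithms concrete, here is a minimal in-process sketch (illustrative class and method names, not a real load balancer API) of round-robin and least-connections selection:

```python
import itertools

class RoundRobinBalancer:
    """Cycle through servers in order; ignores per-server load."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Pick the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # caller must call release() when the request ends
        return server

    def release(self, server):
        self.active[server] -= 1
```

The extra bookkeeping in `LeastConnectionsBalancer` is the "slightly more overhead" the table refers to: every request start and finish mutates shared state.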

Layer 4 vs Layer 7 Load Balancing

| Aspect | Layer 4 (Transport) | Layer 7 (Application) |
|---|---|---|
| Inspects | TCP/UDP packets | HTTP headers, body, URL |
| Speed | Faster (no payload inspection) | Slower (parses HTTP) |
| Routing | IP + port only | Path, header, cookie-based |
| Use case | TCP services, gaming, high throughput | REST APIs, web apps, content-based routing |
| Example | AWS NLB, HAProxy TCP | AWS ALB, Nginx, Envoy |

Pattern 2: Caching

Client → Cache Hit? ──Yes──→ Return cached data
             │
             No
             ▼
        Query Database → Store in Cache → Return data

| Strategy | Write Path | Read Path | Consistency | Best For |
|---|---|---|---|---|
| Cache-aside | App writes to DB, invalidates cache | App checks cache first | Eventually consistent | General purpose, read-heavy |
| Write-through | App writes to cache + DB together | App reads from cache | Strong | Critical data, small datasets |
| Write-behind | App writes to cache, async to DB | App reads from cache | Eventually consistent | Write-heavy, can tolerate lag |
| Read-through | Cache fetches from DB on miss (transparent) | App reads from cache | Eventually consistent | Simplify application code |
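A minimal cache-aside sketch, with plain dicts standing in for the database client and for Redis/Memcached:

```python
class CacheAside:
    """Cache-aside: the application checks the cache first, falls back to
    the database on a miss, and invalidates (not updates) on writes."""
    def __init__(self, db):
        self.db = db        # stand-in for a real database client
        self.cache = {}     # stand-in for Redis/Memcached

    def read(self, key):
        if key in self.cache:           # cache hit
            return self.cache[key]
        value = self.db[key]            # miss: query the database
        self.cache[key] = value         # populate for the next reader
        return value

    def write(self, key, value):
        self.db[key] = value            # write goes to the DB...
        self.cache.pop(key, None)       # ...and the cache entry is invalidated
```

Invalidating rather than updating on write is the key design choice: it avoids the race where two concurrent writers leave the cache holding the loser's value.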

Cache Invalidation Strategies

| Strategy | How It Works | Consistency | Complexity |
|---|---|---|---|
| TTL-based | Set expiry (60s, 5m, 1h) | Eventually consistent | Low |
| Event-based | Invalidate on write events | Near-real-time | Medium |
| Versioning | Include version in cache key | Strong (new key = fresh data) | Low |
| Write-through | Update cache on every write | Strong | Medium |

Rule of thumb: If data changes rarely (< 1x per minute), use TTL. If data changes frequently and staleness matters, use event-based invalidation.
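The two low-complexity strategies above can be sketched in a few lines (hypothetical names, purely illustrative; a real deployment would rely on Redis `EXPIRE` and key conventions):

```python
import time

def versioned_key(entity, entity_id, version):
    # Versioning: a new version makes a new key, so stale entries are
    # simply never read again — no explicit invalidation needed.
    return f"{entity}:{entity_id}:v{version}"

class TTLCache:
    """TTL-based invalidation: entries silently expire after ttl seconds."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now + self.ttl)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None or now >= entry[1]:
            return None  # miss or expired
        return entry[0]
```

Note the trade-off versioning leaves open: old versions linger until evicted, so it is usually paired with a TTL or an LRU policy.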


Pattern 3: Database Scaling

Read Replicas

Writes → Primary DB → Replication → Read Replica 1
                                  → Read Replica 2
                                  → Read Replica 3
Reads  → Read Replicas (load balanced)
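The read/write split above can be sketched as a tiny router (illustrative only; in practice a driver feature or a proxy layer does this, and replica reads may lag the primary):

```python
import itertools

class ReplicaRouter:
    """Route writes to the primary, load-balance reads across replicas.
    Caveat: replication lag means replica reads can be slightly stale."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, query):
        # Crude classification by SQL verb — enough for a sketch.
        if query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return self.primary
        return next(self._replicas)
```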

Sharding

User ID → Shard Key → Shard Assignment

Range-based example:
Shard 0: Users 0-999,999      (Server A)
Shard 1: Users 1M-1.99M       (Server B)
Shard 2: Users 2M-2.99M       (Server C)

Hash-based alternative: shard = hash(user_id) % num_shards — evens out hot spots, but range scans must then hit every shard.
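Both shard-assignment styles fit in a few lines (illustrative sketch; shard counts and sizes are made up):

```python
import hashlib

NUM_SHARDS = 3

def hash_shard(user_id):
    """Hash-based sharding: even distribution, but range scans hit every shard.
    md5 (not Python's built-in hash) keeps assignment stable across processes."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def range_shard(user_id, shard_size=1_000_000):
    """Range-based sharding: range scans stay on one shard, but a hot ID
    range (e.g. the newest users) creates a hot shard."""
    return user_id // shard_size
```

Note that `% NUM_SHARDS` has the resharding problem discussed under consistent hashing later: changing the shard count remaps almost every key.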

Database Scaling Decision Tree

| Scenario | Solution | When |
|---|---|---|
| Read-heavy, single region | Read replicas | Read:write ratio > 10:1 |
| Write-heavy, single table bottleneck | Vertical scaling first, then sharding | Single table > 1 billion rows |
| Multi-region, low latency | Multi-region replicas + write routing | Users in 3+ regions |
| Time-series data | Partitioning by time range | Logs, events, metrics |
| Mixed workloads | CQRS (separate read/write models) | Reads and writes have different shapes |

Pattern 4: Message Queues

Producer → Queue → Consumer(s)

Benefits:
- Decoupling (producer doesn't wait for consumer)
- Buffer (handle traffic spikes)
- Retry (failed messages go back to queue)
- Fan-out (one message, many consumers)

| Queue | Best For | Throughput | Ordering | Delivery Guarantee |
|---|---|---|---|---|
| Kafka | High-throughput streaming, event sourcing | Very high (millions/sec) | Per-partition | At-least-once |
| RabbitMQ | Task queues, complex routing | High (100K/sec) | Per-queue | At-least-once or at-most-once |
| SQS | AWS-native, simple queue | High | Best-effort (FIFO available) | At-least-once |
| Redis Streams | Lightweight, already using Redis | High | Per-stream | At-least-once |
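The producer/consumer decoupling above can be demonstrated in-process with Python's standard `queue` module standing in for Kafka/RabbitMQ/SQS (same shape, none of the durability):

```python
import queue
import threading

def run_pipeline(messages):
    """Producer enqueues and moves on; a worker thread drains the queue."""
    q = queue.Queue(maxsize=100)   # bounded: acts as a spike buffer
    results = []

    def consumer():
        while True:
            msg = q.get()
            if msg is None:                  # sentinel: shut down
                break
            results.append(msg.upper())      # "process" the message
            q.task_done()

    worker = threading.Thread(target=consumer)
    worker.start()
    for m in messages:                       # producer side
        q.put(m)
    q.put(None)
    worker.join()
    return results
```

The `maxsize` bound is the "buffer" benefit made explicit: when the consumer falls behind, `put()` blocks the producer instead of exhausting memory (real brokers spill to disk instead).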

Queue vs Stream vs Pub/Sub

| Pattern | Use When | Example |
|---|---|---|
| Queue | One consumer per message (task distribution) | Order processing, email sending |
| Stream | Replay, multiple consumers, event history | Event sourcing, audit trail |
| Pub/Sub | Real-time fan-out, fire-and-forget | Notifications, real-time updates |

Pattern 5: Rate Limiting

Request → Rate Limiter ──Under limit──→ Process
               │
           Over limit
               ▼
     429 Too Many Requests

| Algorithm | How It Works | Pros | Cons |
|---|---|---|---|
| Token bucket | Tokens refill at fixed rate, consumed per request | Handles bursts well | Slightly complex |
| Sliding window | Count requests in rolling time window | Accurate, no edge cases | Memory-intensive |
| Fixed window | Count requests per fixed time window | Simple | Boundary burst problem |
| Leaky bucket | Requests queue and drain at fixed rate | Smooth output | No burst handling |
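A single-node token bucket is only a few lines (sketch; a distributed limiter would keep this state in Redis, typically via a Lua script for atomicity):

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`; each request
    consumes one token, so bursts up to `capacity` are allowed."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Lazy refill: add tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond 429 Too Many Requests
```

Lazy refill (computing tokens on demand) avoids a background timer per client, which matters when you hold one bucket per API key.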

Pattern 6: Consistent Hashing

Used for distributing data across a cluster (cache servers, database shards).

Hash Ring:
     Server A
    /        \
Server D    Server B
    \        /
     Server C

Key "user:123" → hash → lands on Server B
Key "user:456" → hash → lands on Server D

When Server C is removed:
- Only keys that were on C need to move
- Other keys stay put (minimal disruption)

Why not simple modulo hashing? With modulo (key % N), adding or removing a server changes N and remaps almost every key. Consistent hashing only remaps ~1/N of the keys.
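A minimal hash ring with virtual nodes can be built on `bisect` (illustrative sketch; production systems add replication and weighted nodes):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: each server owns many points ("virtual nodes")
    on the ring; a key maps to the first server point clockwise from it."""
    def __init__(self, servers, vnodes=100):
        self._ring = []  # sorted list of (point, server)
        for server in servers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()

    @staticmethod
    def _hash(value):
        # md5 rather than built-in hash(): stable across processes.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def lookup(self, key):
        point = self._hash(key)
        idx = bisect.bisect(self._ring, (point, ""))
        return self._ring[idx % len(self._ring)][1]  # wrap past the end
```

Virtual nodes are what make the distribution even: with one point per server, a few servers can own most of the ring by bad luck.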


CAP Theorem Quick Reference

| System Type | Guarantees | Sacrifices | Example |
|---|---|---|---|
| CP (Consistent + Partition-tolerant) | Strong consistency | Availability during partitions | MongoDB, HBase, ZooKeeper |
| AP (Available + Partition-tolerant) | Always available | Consistency (eventual) | Cassandra, DynamoDB, CouchDB |
| CA (Consistent + Available) | Both C and A | Cannot handle network partitions | Single-node RDBMS (PostgreSQL) |

In practice, network partitions happen, so you’re choosing between CP and AP.


Back-of-Envelope Calculations

| Metric | Value |
|---|---|
| Read from memory | 100 ns |
| Read from SSD | 100 μs |
| Read from disk | 10 ms |
| Network round trip (same datacenter) | 0.5 ms |
| Network round trip (cross-country) | 50 ms |
| 1 KB over 1 Gbps network | 10 μs |
Quick Math Template

Daily active users: 10M
Requests per user per day: 10
Total requests/day: 100M
Requests/second: 100M / 86,400 ≈ 1,200 RPS
Peak (3x average): ~3,600 RPS

Storage per user: 1 KB
Total storage: 10M × 1 KB = 10 GB
With 3 years growth: 30 GB (fits on one machine)

Bandwidth: 1,200 RPS × 1 KB = 1.2 MB/s (trivial)
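The template above packages neatly into a helper you can rerun with different assumptions (function name and peak factor are illustrative; the defaults mirror the worked numbers):

```python
def capacity_estimate(dau, req_per_user, bytes_per_user, peak_factor=3):
    """Back-of-envelope: turn product numbers into RPS, storage, bandwidth."""
    daily_requests = dau * req_per_user
    avg_rps = daily_requests / 86_400            # seconds per day
    peak_rps = avg_rps * peak_factor             # rule of thumb: peak = 3x avg
    storage_gb = dau * bytes_per_user / 1e9
    bandwidth_mb_s = avg_rps * bytes_per_user / 1e6
    return round(avg_rps), round(peak_rps), storage_gb, bandwidth_mb_s
```

With the template's inputs (10M DAU, 10 requests/user, 1 KB/user) it yields ~1,157 average RPS, ~3,472 peak RPS, 10 GB of storage, and ~1.2 MB/s of bandwidth — matching the rounded figures above.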

Common Systems to Practice

| System | Key Patterns | Focus Areas |
|---|---|---|
| URL shortener | Hashing, database design, caching, analytics | Base62 encoding, read-heavy optimization |
| Twitter/feed | Fan-out, timeline generation, caching, sharding | Push vs pull fan-out, celebrity problem |
| Chat system | WebSockets, message queues, presence, storage | Delivery guarantees, group messaging |
| Rate limiter | Token bucket, Redis, distributed counting | Distributed rate limiting across nodes |
| Notification system | Queue, fan-out, delivery guarantees, templating | Priority queues, retry with backoff |
| Search engine | Inverted index, ranking, crawling, caching | Full-text search, relevance scoring |

Checklist

  • Requirements clarification framework memorized (scale, latency, availability, data, budget)
  • Core patterns practiced (load balancing, caching, queues, sharding, consistent hashing)
  • CAP theorem understood with real examples (CP vs AP trade-offs)
  • Back-of-envelope calculations comfortable (RPS, storage, bandwidth)
  • 5+ systems designed end-to-end with deep dives
  • Trade-off discussions polished (consistency vs latency, memory vs compute)
  • Failure mode analysis practiced per design (what breaks at 10x, 100x scale?)
  • Database scaling decision tree internalized (replicas → partitioning → sharding)

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For engineering leadership consulting, visit garnetgrid.com.
:::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
