# System Design Interview Patterns for Senior Engineers
Essential system design patterns for senior+ engineering roles. Covers load balancing, caching, database scaling, message queues, consensus, rate limiting, and how to structure your design process.
System design interviews test whether you can architect systems that scale. This guide covers the patterns that appear in 80% of interviews and real-world architecture decisions. Master these patterns and you can design most distributed systems from first principles. The difference between a good and great answer is depth — not covering more patterns, but knowing when to use each, what trade-offs they bring, and what breaks first at scale.
The #1 mistake in system design interviews: jumping to solutions before clarifying requirements. The best candidates spend 5 minutes asking questions before drawing a single box.
## The Design Process

1. **Requirements Clarification (5 min)**
   - Functional requirements (what does the system DO?)
   - Non-functional requirements (scale, latency, availability)
   - Back-of-envelope calculations
2. **High-Level Design (10 min)**
   - Core components and their interactions
   - API design
   - Data flow
3. **Deep Dive (15 min)**
   - Database schema and scaling
   - Caching strategy
   - Handling failure modes
4. **Trade-offs & Wrap-up (5 min)**
   - What would you change at 10x scale?
   - What are the bottlenecks?
   - Monitoring and alerting
## Requirements Questions to Always Ask

| Category | Questions | Why It Matters |
| --- | --- | --- |
| Scale | How many users? Read-heavy or write-heavy? | Determines DB choice, caching strategy |
| Latency | What's acceptable response time? Real-time or near-real-time? | Determines caching, CDN, async processing |
| Availability | What's the uptime target? Is eventual consistency OK? | Determines replication, failover design |
| Data | How much data? How long to retain? | Determines storage, partitioning, archival |
| Budget | Self-managed or managed services? Cloud provider? | Determines solution complexity |
## Pattern 1: Load Balancing

```text
Clients → Load Balancer → Server Pool
                 │
       ┌─────────┼──────────┐
       ▼         ▼          ▼
   Server 1   Server 2   Server 3
```

| Algorithm | Best For | Trade-off |
| --- | --- | --- |
| Round robin | Stateless services, equal capacity | Doesn't account for server load |
| Least connections | Variable request duration | Slightly more overhead |
| IP hash | Session affinity (stick to same server) | Uneven distribution if IP distribution is skewed |
| Weighted round robin | Mixed server capacities | Requires manual weight tuning |
| Consistent hashing | Dynamic cluster sizes | Complex implementation |
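The first two algorithms in the table can be sketched in a few lines of Python. This is a toy illustration, not any library's API; the class and method names are invented for this example.

```python
import itertools

class RoundRobinBalancer:
    """Round robin: cycle through servers in order; ignores per-server load."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Least connections: pick the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}   # server -> open connection count

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Caller signals the request finished so the count stays accurate.
        self.active[server] -= 1
```

Note the extra bookkeeping in the second class: that is the "slightly more overhead" the table refers to, and it is the price of accounting for variable request duration.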
### Layer 4 vs Layer 7 Load Balancing

| Aspect | Layer 4 (Transport) | Layer 7 (Application) |
| --- | --- | --- |
| Inspects | TCP/UDP packets | HTTP headers, body, URL |
| Speed | Faster (no payload inspection) | Slower (parses HTTP) |
| Routing | IP + port only | Path, header, cookie-based |
| Use case | TCP services, gaming, high throughput | REST APIs, web apps, content-based routing |
| Example | AWS NLB, HAProxy TCP | AWS ALB, Nginx, Envoy |
## Pattern 2: Caching

```text
Client → Cache Hit? → Yes → Return cached data
             │ No
             ↓
       Query Database → Store in Cache → Return data
```
| Strategy | Write Path | Read Path | Consistency | Best For |
| --- | --- | --- | --- | --- |
| Cache-aside | App writes to DB, invalidates cache | App checks cache first | Eventually consistent | General purpose, read-heavy |
| Write-through | App writes to cache + DB together | App reads from cache | Strong | Critical data, small datasets |
| Write-behind | App writes to cache, async to DB | App reads from cache | Eventually consistent | Write-heavy, can tolerate lag |
| Read-through | Cache fetches from DB on miss (transparent) | App reads from cache | Eventually consistent | Simplify application code |
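The cache-aside row maps to very little code. Below is a minimal sketch assuming a dict-like `db` and an in-process dict as the cache; a real deployment would use Redis or Memcached, but the read/write paths are the same.

```python
import time

class CacheAside:
    """Cache-aside: the app checks the cache first, falls back to the DB
    on a miss, and invalidates (not updates) the cache entry on writes."""
    def __init__(self, db, ttl_seconds=60):
        self.db = db                    # any dict-like store (assumption)
        self.ttl = ttl_seconds
        self._cache = {}                # key -> (value, expires_at)

    def get(self, key):
        hit = self._cache.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0]               # cache hit
        value = self.db.get(key)        # miss: query the database
        self._cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def put(self, key, value):
        self.db[key] = value            # write to the DB first...
        self._cache.pop(key, None)      # ...then invalidate, not update
```

Invalidating rather than updating on write avoids a race where two concurrent writes leave the cache holding the older value forever.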
### Cache Invalidation Strategies

| Strategy | How It Works | Consistency | Complexity |
| --- | --- | --- | --- |
| TTL-based | Set expiry (60s, 5m, 1h) | Eventually consistent | Low |
| Event-based | Invalidate on write events | Near-real-time | Medium |
| Versioning | Include version in cache key | Strong (new key = fresh data) | Low |
| Write-through | Update cache on every write | Strong | Medium |
Rule of thumb: If data changes rarely (< 1x per minute), use TTL. If data changes frequently and staleness matters, use event-based invalidation.
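The versioning row deserves a sketch, since it is the least obvious: bumping a per-entity version on write makes every stale key unreachable, so nothing needs to be deleted explicitly (old entries age out via TTL). All names below are illustrative.

```python
class VersionedCache:
    """Version-in-key invalidation: readers build the key from the current
    version, so a version bump instantly 'invalidates' stale entries."""
    def __init__(self):
        self.versions = {}   # entity id -> current version
        self.store = {}      # versioned key -> value (stands in for Redis)

    def _key(self, entity_id):
        return f"user:{entity_id}:v{self.versions.get(entity_id, 0)}"

    def get(self, entity_id):
        return self.store.get(self._key(entity_id))

    def set(self, entity_id, value):
        self.store[self._key(entity_id)] = value

    def invalidate(self, entity_id):
        # Bump the version: the old key is never read again and is
        # left for TTL-based cleanup in a real cache.
        self.versions[entity_id] = self.versions.get(entity_id, 0) + 1
```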
## Pattern 3: Database Sharding

```text
User ID → Hash Function → Shard Assignment

Shard 0: Users 0-999,999  (Server A)
Shard 1: Users 1M-1.99M   (Server B)
Shard 2: Users 2M-2.99M   (Server C)
```
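Both range-based assignment (as in the diagram) and hash-based assignment fit in a few lines. `NUM_SHARDS` and the function names here are assumptions for illustration.

```python
import hashlib

NUM_SHARDS = 3  # illustrative cluster size

def range_shard(user_id: int) -> int:
    """Range-based sharding: 1M users per shard, as in the diagram.
    Simple and range-scan friendly, but hot ID ranges create hot shards."""
    return user_id // 1_000_000

def hash_shard(user_id: int) -> int:
    """Hash-based sharding: spreads load evenly, but changing NUM_SHARDS
    remaps almost every key (the problem consistent hashing solves)."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```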
### Database Scaling Decision Tree

| Scenario | Solution | When |
| --- | --- | --- |
| Read-heavy, single region | Read replicas | Read:write ratio > 10:1 |
| Write-heavy, single table bottleneck | Vertical scaling first, then sharding | Single table > 1 billion rows |
| Multi-region, low latency | Multi-region replicas + write routing | Users in 3+ regions |
| Time-series data | Partitioning by time range | Logs, events, metrics |
| Mixed workloads | CQRS (separate read/write models) | Reads and writes have different shapes |
## Pattern 4: Message Queues

```text
Producer → Queue → Consumer(s)
```

Benefits:

- Decoupling (producer doesn't wait for consumer)
- Buffer (handle traffic spikes)
- Retry (failed messages go back to the queue)
- Fan-out (one message, many consumers)
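The retry benefit can be illustrated with Python's in-process `queue.Queue` standing in for a real broker; `consume`, `handler`, and the dead-letter list are illustrative names, not any broker's API.

```python
import queue

def consume(q, handler, max_retries=3):
    """Drain the queue; a message whose handler raises is re-enqueued
    with an incremented attempt count, then dead-lettered after
    max_retries failures (mirroring a broker's redelivery + DLQ)."""
    dead_letter = []
    while not q.empty():
        msg, attempts = q.get()
        try:
            handler(msg)
        except Exception:
            if attempts + 1 < max_retries:
                q.put((msg, attempts + 1))   # retry: back on the queue
            else:
                dead_letter.append(msg)      # give up after max_retries
    return dead_letter
```

Real brokers add delays between redeliveries (often exponential backoff) so a failing downstream service isn't hammered in a tight loop.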
| Queue | Best For | Throughput | Ordering | Delivery Guarantee |
| --- | --- | --- | --- | --- |
| Kafka | High-throughput streaming, event sourcing | Very high (millions/sec) | Per-partition | At-least-once |
| RabbitMQ | Task queues, complex routing | High (100K/sec) | Per-queue | At-least-once (at-most-once with auto-ack) |
| SQS | AWS-native, simple queue | High | Best-effort (FIFO available) | At-least-once |
| Redis Streams | Lightweight, already using Redis | High | Per-stream | At-least-once |
### Queue vs Stream vs Pub/Sub

| Pattern | Use When | Example |
| --- | --- | --- |
| Queue | One consumer per message (task distribution) | Order processing, email sending |
| Stream | Replay, multiple consumers, event history | Event sourcing, audit trail |
| Pub/Sub | Real-time fan-out, fire-and-forget | Notifications, real-time updates |
## Pattern 5: Rate Limiting

```text
Request → Rate Limiter → Under limit? → Process
               │
          Over limit → 429 Too Many Requests
```
| Algorithm | How It Works | Pros | Cons |
| --- | --- | --- | --- |
| Token bucket | Tokens refill at fixed rate, consumed per request | Handles bursts well | Slightly complex |
| Sliding window | Count requests in rolling time window | Accurate, no edge cases | Memory-intensive |
| Fixed window | Count requests per fixed time window | Simple | Boundary burst problem |
| Leaky bucket | Requests queue and drain at fixed rate | Smooth output | No burst handling |
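A minimal token-bucket sketch in Python (names are illustrative; a production limiter usually keeps this state in Redis so every app server draws from one shared budget):

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at `rate` per second up to `capacity`;
    each request consumes one token, so bursts up to `capacity` pass."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Lazily refill based on elapsed time since the last check.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should respond 429 Too Many Requests
```

The lazy refill is the "slightly complex" part the table mentions: no background timer is needed, because elapsed time is converted to tokens on each call.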
## Pattern 6: Consistent Hashing
Used for distributing data across a cluster (cache servers, database shards).
```text
Hash Ring:
          Server A
         /        \
   Server D      Server B
         \        /
          Server C

Key "user:123" → hash → lands on Server B
Key "user:456" → hash → lands on Server D
```

When Server C is removed:

- Only keys that were on C need to move
- Other keys stay put (minimal disruption)
Why not simple modulo hashing? With modulo (key % N), adding or removing a server changes N and remaps almost every key. Consistent hashing only remaps ~1/N of the keys.
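Here is a compact hash-ring sketch with virtual nodes ("replicas"), which smooth out the uneven distribution that a single point per server would cause. Class and method names are invented for this example.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring: each server owns `replicas` points (virtual
    nodes); a key belongs to the next point clockwise from its hash."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self.ring = {}      # point on ring -> server name
        self.points = []    # sorted list of ring points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            self.ring[point] = server
            bisect.insort(self.points, point)

    def remove(self, server: str) -> None:
        # Only keys that hashed to this server's points move elsewhere.
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            del self.ring[point]
            self.points.remove(point)

    def get(self, key: str) -> str:
        # First point clockwise from the key's hash (wrap past the end).
        idx = bisect.bisect(self.points, self._hash(key)) % len(self.points)
        return self.ring[self.points[idx]]
```

Removing a server deletes only its own points, so a key whose successor point belongs to another server keeps its mapping, which is exactly the "~1/N of the keys" property described above.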
## CAP Theorem Quick Reference

| System Type | Guarantees | Sacrifices | Example |
| --- | --- | --- | --- |
| CP (Consistent + Partition-tolerant) | Strong consistency | Availability during partitions | MongoDB, HBase, ZooKeeper |
| AP (Available + Partition-tolerant) | Always available | Consistency (eventual) | Cassandra, DynamoDB, CouchDB |
| CA (Consistent + Available) | Both C and A | Cannot handle network partitions | Single-node RDBMS (PostgreSQL) |
In practice, network partitions happen, so you’re choosing between CP and AP.
## Back-of-Envelope Calculations

| Metric | Value |
| --- | --- |
| Read from memory | 100 ns |
| Read from SSD | 100 μs |
| Read from disk | 10 ms |
| Network round trip (same datacenter) | 0.5 ms |
| Network round trip (cross-country) | 50 ms |
| 1 KB over 1 Gbps network | 10 μs |
### Quick Math Template

```text
Daily active users:         10M
Requests per user per day:  10
Total requests/day:         100M
Requests/second:            100M / 86,400 ≈ 1,200 RPS
Peak (3x average):          ~3,600 RPS

Storage per user:           1 KB
Total storage:              10M × 1 KB = 10 GB
With 3 years growth:        30 GB (fits on one machine)
Bandwidth:                  1,200 RPS × 1 KB = 1.2 MB/s (trivial)
```
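The same arithmetic, checked in Python (the constants mirror the template above):

```python
dau = 10_000_000                  # daily active users
req_per_user = 10

total_per_day = dau * req_per_user         # 100M requests/day
avg_rps = total_per_day / 86_400           # ≈ 1,157 RPS (round to ~1,200)
peak_rps = avg_rps * 3                     # ≈ 3,500 RPS

storage_bytes = dau * 1024                 # 1 KB per user → ~10 GB
bandwidth_bytes_per_sec = avg_rps * 1024   # ≈ 1.2 MB/s
```

Interviewers care about the order of magnitude, not the third digit, so round aggressively and say so out loud.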
## Final Checklist

- Trade-off discussions polished (consistency vs latency, memory vs compute)
- Failure mode analysis practiced per design (what breaks at 10x, 100x scale?)
- Database scaling decision tree internalized (replicas → partitioning → sharding)
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For engineering leadership consulting, visit garnetgrid.com.
:::
Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.