System Design Interview Patterns for Senior Engineers

Essential system design patterns for senior+ engineering roles. Covers load balancing, caching, database scaling, message queues, consensus, rate limiting, and how to structure your design process.

System design interviews test whether you can architect systems that scale. This guide covers the patterns that appear in 80% of interviews and real-world architecture decisions. Master these patterns and you can design most distributed systems from first principles. The difference between a good and great answer is depth — not covering more patterns, but knowing when to use each, what trade-offs they bring, and what breaks first at scale.

The #1 mistake in system design interviews: jumping to solutions before clarifying requirements. The best candidates spend 5 minutes asking questions before drawing a single box.


The Design Process

1. Requirements Clarification (5 min)
   - Functional requirements (what does the system DO?)
   - Non-functional requirements (scale, latency, availability)
   - Back-of-envelope calculations

2. High-Level Design (10 min)
   - Core components and their interactions
   - API design
   - Data flow

3. Deep Dive (15 min)
   - Database schema and scaling
   - Caching strategy
   - Handling failure modes

4. Trade-offs & Wrap-up (5 min)
   - What would you change at 10x scale?
   - What are the bottlenecks?
   - Monitoring and alerting

Requirements Questions to Always Ask

| Category | Questions | Why It Matters |
|---|---|---|
| Scale | How many users? Read-heavy or write-heavy? | Determines DB choice, caching strategy |
| Latency | What’s acceptable response time? Real-time or near-real-time? | Determines caching, CDN, async processing |
| Availability | What’s the uptime target? Is eventual consistency OK? | Determines replication, failover design |
| Data | How much data? How long to retain? | Determines storage, partitioning, archival |
| Budget | Self-managed or managed services? Cloud provider? | Determines solution complexity |

Pattern 1: Load Balancing

Clients → Load Balancer
              │
    ┌─────────┼──────────┐
    ▼         ▼          ▼
  Server 1  Server 2  Server 3

| Algorithm | Best For | Trade-off |
|---|---|---|
| Round robin | Stateless services, equal capacity | Doesn’t account for server load |
| Least connections | Variable request duration | Slightly more overhead |
| IP hash | Session affinity (stick to same server) | Uneven distribution if IP distribution is skewed |
| Weighted round robin | Mixed server capacities | Requires manual weight tuning |
| Consistent hashing | Dynamic cluster sizes | Complex implementation |
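To make the first two algorithms concrete, here is a minimal in-process sketch (illustrative class and method names, not a real load balancer API) of round-robin and least-connections selection:

```python
import itertools

class RoundRobinBalancer:
    """Cycle through servers in order; ignores per-server load."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Pick the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # caller must call release() when the request ends
        return server

    def release(self, server):
        self.active[server] -= 1
```

The extra bookkeeping in `LeastConnectionsBalancer` is the "slightly more overhead" the table refers to: every request start and finish mutates shared state.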

Layer 4 vs Layer 7 Load Balancing

| Aspect | Layer 4 (Transport) | Layer 7 (Application) |
|---|---|---|
| Inspects | TCP/UDP packets | HTTP headers, body, URL |
| Speed | Faster (no payload inspection) | Slower (parses HTTP) |
| Routing | IP + port only | Path, header, cookie-based |
| Use case | TCP services, gaming, high throughput | REST APIs, web apps, content-based routing |
| Example | AWS NLB, HAProxy TCP | AWS ALB, Nginx, Envoy |

Pattern 2: Caching

Client → Cache Hit? ──Yes──→ Return cached data
             │
             No
             ▼
        Query Database → Store in Cache → Return data

| Strategy | Write Path | Read Path | Consistency | Best For |
|---|---|---|---|---|
| Cache-aside | App writes to DB, invalidates cache | App checks cache first | Eventually consistent | General purpose, read-heavy |
| Write-through | App writes to cache + DB together | App reads from cache | Strong | Critical data, small datasets |
| Write-behind | App writes to cache, async to DB | App reads from cache | Eventually consistent | Write-heavy, can tolerate lag |
| Read-through | Cache fetches from DB on miss (transparent) | App reads from cache | Eventually consistent | Simplify application code |
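A minimal cache-aside sketch, with plain dicts standing in for the database client and for Redis/Memcached:

```python
class CacheAside:
    """Cache-aside: the application checks the cache first, falls back to
    the database on a miss, and invalidates (not updates) on writes."""
    def __init__(self, db):
        self.db = db        # stand-in for a real database client
        self.cache = {}     # stand-in for Redis/Memcached

    def read(self, key):
        if key in self.cache:           # cache hit
            return self.cache[key]
        value = self.db[key]            # miss: query the database
        self.cache[key] = value         # populate for the next reader
        return value

    def write(self, key, value):
        self.db[key] = value            # write goes to the DB...
        self.cache.pop(key, None)       # ...and the cache entry is invalidated
```

Invalidating rather than updating on write is the key design choice: it avoids the race where two concurrent writers leave the cache holding the loser's value.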

Cache Invalidation Strategies

| Strategy | How It Works | Consistency | Complexity |
|---|---|---|---|
| TTL-based | Set expiry (60s, 5m, 1h) | Eventually consistent | Low |
| Event-based | Invalidate on write events | Near-real-time | Medium |
| Versioning | Include version in cache key | Strong (new key = fresh data) | Low |
| Write-through | Update cache on every write | Strong | Medium |

Rule of thumb: If data changes rarely (< 1x per minute), use TTL. If data changes frequently and staleness matters, use event-based invalidation.
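The two low-complexity strategies above can be sketched in a few lines (hypothetical names, purely illustrative; a real deployment would rely on Redis `EXPIRE` and key conventions):

```python
import time

def versioned_key(entity, entity_id, version):
    # Versioning: a new version makes a new key, so stale entries are
    # simply never read again — no explicit invalidation needed.
    return f"{entity}:{entity_id}:v{version}"

class TTLCache:
    """TTL-based invalidation: entries silently expire after ttl seconds."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now + self.ttl)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None or now >= entry[1]:
            return None  # miss or expired
        return entry[0]
```

Note the trade-off versioning leaves open: old versions linger until evicted, so it is usually paired with a TTL or an LRU policy.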


Pattern 3: Database Scaling

Read Replicas

Writes → Primary DB → Replication → Read Replica 1
                                  → Read Replica 2
                                  → Read Replica 3
Reads  → Read Replicas (load balanced)
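The read/write split above can be sketched as a tiny router (illustrative only; in practice a driver feature or a proxy layer does this, and replica reads may lag the primary):

```python
import itertools

class ReplicaRouter:
    """Route writes to the primary, load-balance reads across replicas.
    Caveat: replication lag means replica reads can be slightly stale."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, query):
        # Crude classification by SQL verb — enough for a sketch.
        if query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return self.primary
        return next(self._replicas)
```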

Sharding

User ID → Shard Key → Shard Assignment

Range-based example:
Shard 0: Users 0-999,999      (Server A)
Shard 1: Users 1M-1.99M       (Server B)
Shard 2: Users 2M-2.99M       (Server C)

Hash-based alternative: shard = hash(user_id) % num_shards — evens out hot spots, but range scans must then hit every shard.
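Both shard-assignment styles fit in a few lines (illustrative sketch; shard counts and sizes are made up):

```python
import hashlib

NUM_SHARDS = 3

def hash_shard(user_id):
    """Hash-based sharding: even distribution, but range scans hit every shard.
    md5 (not Python's built-in hash) keeps assignment stable across processes."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def range_shard(user_id, shard_size=1_000_000):
    """Range-based sharding: range scans stay on one shard, but a hot ID
    range (e.g. the newest users) creates a hot shard."""
    return user_id // shard_size
```

Note that `% NUM_SHARDS` has the resharding problem discussed under consistent hashing later: changing the shard count remaps almost every key.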

Database Scaling Decision Tree

| Scenario | Solution | When |
|---|---|---|
| Read-heavy, single region | Read replicas | Read:write ratio > 10:1 |
| Write-heavy, single table bottleneck | Vertical scaling first, then sharding | Single table > 1 billion rows |
| Multi-region, low latency | Multi-region replicas + write routing | Users in 3+ regions |
| Time-series data | Partitioning by time range | Logs, events, metrics |
| Mixed workloads | CQRS (separate read/write models) | Reads and writes have different shapes |

Pattern 4: Message Queues

Producer → Queue → Consumer(s)

Benefits:
- Decoupling (producer doesn't wait for consumer)
- Buffer (handle traffic spikes)
- Retry (failed messages go back to queue)
- Fan-out (one message, many consumers)

| Queue | Best For | Throughput | Ordering | Delivery Guarantee |
|---|---|---|---|---|
| Kafka | High-throughput streaming, event sourcing | Very high (millions/sec) | Per-partition | At-least-once |
| RabbitMQ | Task queues, complex routing | High (100K/sec) | Per-queue | At-least-once or at-most-once |
| SQS | AWS-native, simple queue | High | Best-effort (FIFO available) | At-least-once |
| Redis Streams | Lightweight, already using Redis | High | Per-stream | At-least-once |
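The producer/consumer decoupling above can be demonstrated in-process with Python's standard `queue` module standing in for Kafka/RabbitMQ/SQS (same shape, none of the durability):

```python
import queue
import threading

def run_pipeline(messages):
    """Producer enqueues and moves on; a worker thread drains the queue."""
    q = queue.Queue(maxsize=100)   # bounded: acts as a spike buffer
    results = []

    def consumer():
        while True:
            msg = q.get()
            if msg is None:                  # sentinel: shut down
                break
            results.append(msg.upper())      # "process" the message
            q.task_done()

    worker = threading.Thread(target=consumer)
    worker.start()
    for m in messages:                       # producer side
        q.put(m)
    q.put(None)
    worker.join()
    return results
```

The `maxsize` bound is the "buffer" benefit made explicit: when the consumer falls behind, `put()` blocks the producer instead of exhausting memory (real brokers spill to disk instead).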

Queue vs Stream vs Pub/Sub

| Pattern | Use When | Example |
|---|---|---|
| Queue | One consumer per message (task distribution) | Order processing, email sending |
| Stream | Replay, multiple consumers, event history | Event sourcing, audit trail |
| Pub/Sub | Real-time fan-out, fire-and-forget | Notifications, real-time updates |

Pattern 5: Rate Limiting

Request → Rate Limiter ──Under limit──→ Process
               │
           Over limit
               ▼
     429 Too Many Requests

| Algorithm | How It Works | Pros | Cons |
|---|---|---|---|
| Token bucket | Tokens refill at fixed rate, consumed per request | Handles bursts well | Slightly complex |
| Sliding window | Count requests in rolling time window | Accurate, no edge cases | Memory-intensive |
| Fixed window | Count requests per fixed time window | Simple | Boundary burst problem |
| Leaky bucket | Requests queue and drain at fixed rate | Smooth output | No burst handling |
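A single-node token bucket is only a few lines (sketch; a distributed limiter would keep this state in Redis, typically via a Lua script for atomicity):

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`; each request
    consumes one token, so bursts up to `capacity` are allowed."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Lazy refill: add tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond 429 Too Many Requests
```

Lazy refill (computing tokens on demand) avoids a background timer per client, which matters when you hold one bucket per API key.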

Pattern 6: Consistent Hashing

Used for distributing data across a cluster (cache servers, database shards).

Hash Ring:
     Server A
    /        \
Server D    Server B
    \        /
     Server C

Key "user:123" → hash → lands on Server B
Key "user:456" → hash → lands on Server D

When Server C is removed:
- Only keys that were on C need to move
- Other keys stay put (minimal disruption)

Why not simple modulo hashing? With modulo (key % N), adding or removing a server changes N and remaps almost every key. Consistent hashing only remaps ~1/N of the keys.
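A minimal hash ring with virtual nodes can be built on `bisect` (illustrative sketch; production systems add replication and weighted nodes):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: each server owns many points ("virtual nodes")
    on the ring; a key maps to the first server point clockwise from it."""
    def __init__(self, servers, vnodes=100):
        self._ring = []  # sorted list of (point, server)
        for server in servers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()

    @staticmethod
    def _hash(value):
        # md5 rather than built-in hash(): stable across processes.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def lookup(self, key):
        point = self._hash(key)
        idx = bisect.bisect(self._ring, (point, ""))
        return self._ring[idx % len(self._ring)][1]  # wrap past the end
```

Virtual nodes are what make the distribution even: with one point per server, a few servers can own most of the ring by bad luck.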


CAP Theorem Quick Reference

| System Type | Guarantees | Sacrifices | Example |
|---|---|---|---|
| CP (Consistent + Partition-tolerant) | Strong consistency | Availability during partitions | MongoDB, HBase, ZooKeeper |
| AP (Available + Partition-tolerant) | Always available | Consistency (eventual) | Cassandra, DynamoDB, CouchDB |
| CA (Consistent + Available) | Both C and A | Cannot handle network partitions | Single-node RDBMS (PostgreSQL) |

In practice, network partitions happen, so you’re choosing between CP and AP.


Back-of-Envelope Calculations

| Metric | Value |
|---|---|
| Read from memory | 100 ns |
| Read from SSD | 100 μs |
| Read from disk | 10 ms |
| Network round trip (same datacenter) | 0.5 ms |
| Network round trip (cross-country) | 50 ms |
| 1 KB over 1 Gbps network | 10 μs |
Quick Math Template

Daily active users: 10M
Requests per user per day: 10
Total requests/day: 100M
Requests/second: 100M / 86,400 ≈ 1,200 RPS
Peak (3x average): ~3,600 RPS

Storage per user: 1 KB
Total storage: 10M × 1 KB = 10 GB
With 3 years growth: 30 GB (fits on one machine)

Bandwidth: 1,200 RPS × 1 KB = 1.2 MB/s (trivial)
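The template above packages neatly into a helper you can rerun with different assumptions (function name and peak factor are illustrative; the defaults mirror the worked numbers):

```python
def capacity_estimate(dau, req_per_user, bytes_per_user, peak_factor=3):
    """Back-of-envelope: turn product numbers into RPS, storage, bandwidth."""
    daily_requests = dau * req_per_user
    avg_rps = daily_requests / 86_400            # seconds per day
    peak_rps = avg_rps * peak_factor             # rule of thumb: peak = 3x avg
    storage_gb = dau * bytes_per_user / 1e9
    bandwidth_mb_s = avg_rps * bytes_per_user / 1e6
    return round(avg_rps), round(peak_rps), storage_gb, bandwidth_mb_s
```

With the template's inputs (10M DAU, 10 requests/user, 1 KB/user) it yields ~1,157 average RPS, ~3,472 peak RPS, 10 GB of storage, and ~1.2 MB/s of bandwidth — matching the rounded figures above.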

Common Systems to Practice

| System | Key Patterns | Focus Areas |
|---|---|---|
| URL shortener | Hashing, database design, caching, analytics | Base62 encoding, read-heavy optimization |
| Twitter/feed | Fan-out, timeline generation, caching, sharding | Push vs pull fan-out, celebrity problem |
| Chat system | WebSockets, message queues, presence, storage | Delivery guarantees, group messaging |
| Rate limiter | Token bucket, Redis, distributed counting | Distributed rate limiting across nodes |
| Notification system | Queue, fan-out, delivery guarantees, templating | Priority queues, retry with backoff |
| Search engine | Inverted index, ranking, crawling, caching | Full-text search, relevance scoring |

Checklist

  • Requirements clarification framework memorized (scale, latency, availability, data, budget)
  • Core patterns practiced (load balancing, caching, queues, sharding, consistent hashing)
  • CAP theorem understood with real examples (CP vs AP trade-offs)
  • Back-of-envelope calculations comfortable (RPS, storage, bandwidth)
  • 5+ systems designed end-to-end with deep dives
  • Trade-off discussions polished (consistency vs latency, memory vs compute)
  • Failure mode analysis practiced per design (what breaks at 10x, 100x scale?)
  • Database scaling decision tree internalized (replicas → partitioning → sharding)

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For engineering leadership consulting, visit garnetgrid.com.
:::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
