
Capacity Planning: Scaling Infrastructure Before You Need To

Predict and provision infrastructure capacity before demand outpaces supply. Covers load modeling, bottleneck identification, scaling strategies, cost-capacity tradeoffs, and the planning process that prevents both outages and over-provisioning.

Capacity planning is the discipline of having enough infrastructure to handle your traffic — tomorrow, next month, and during the annual spike — without paying for resources you do not need today. It sits at the intersection of engineering and finance: too little capacity causes outages, too much wastes money.

Most teams do capacity planning reactively: they add more servers after an outage. This guide covers how to plan proactively so you never have the “we ran out of capacity” conversation with your CEO.


The Capacity Planning Process

1. Understand current usage
   "We serve 10,000 requests/second with 40% CPU utilization
    on 20 servers. Database handles 5,000 queries/second."

2. Model growth
   "Traffic grows 15% month-over-month. Black Friday is 3× normal.
    Marketing campaign in October expected to add 25%."

3. Identify bottlenecks
   "At 15,000 req/s, the database connection pool saturates.
    At 20,000 req/s, we run out of API server CPU."

4. Plan capacity additions
   "Need 10 more API servers by October.
    Need database upgrade (or read replicas) by November."

5. Budget and approve
   "Additional infrastructure costs $X/month.
    Prevents an outage that costs $Y/hour."

6. Execute and verify
   "Deploy additional capacity. Load test to verify."

Resource Utilization Tracking

Resource              Measure                  Healthy Range            Danger Zone
CPU                   Average and p99 util.    40-60% average           > 80% sustained
Memory                Used / available         50-70%                   > 85%
Disk I/O              IOPS and throughput      < 60% of provisioned     > 80%
Network               Bandwidth utilization    < 50% of link capacity   > 70%
Database connections  Active / max             < 60% of pool            > 80% of pool
Queue depth           Messages waiting         < 100 messages           Growing consistently

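The danger-zone thresholds above can be checked mechanically in monitoring glue code. A minimal sketch, where the resource names and limits simply mirror the table and are illustrative, not a real monitoring API:

```python
# Danger-zone thresholds, taken from the table above (illustrative values).
DANGER_THRESHOLDS = {
    "cpu": 0.80,             # > 80% sustained
    "memory": 0.85,          # > 85% used / available
    "disk_io": 0.80,         # > 80% of provisioned IOPS
    "network": 0.70,         # > 70% of link capacity
    "db_connections": 0.80,  # > 80% of pool
}

def in_danger_zone(resource, utilization):
    """Return True when a resource's utilization crosses its danger threshold."""
    return utilization > DANGER_THRESHOLDS[resource]

print(in_danger_zone("cpu", 0.85))     # sustained 85% CPU -> True
print(in_danger_zone("memory", 0.60))  # 60% memory -> False
```

In practice this check would run against sustained (e.g. 15-minute average) values, not instantaneous samples, so a single spike does not trigger it.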
Utilization over time — identify the trend:

100% ────────────────────────── Capacity limit
 90% ─────────────────────╱──── Danger zone
 80% ──────────────────╱──────
 70% ───────────────╱─────────
 60% ────────────╱────────────  ← Current utilization trend
 50% ─────────╱───────────────
 40% ──────╱──────────────────
 30% ───╱─────────────────────
  0% ╱────────────────────────
     Jan  Feb  Mar  Apr  May  Jun  Jul  Aug

At 15% monthly growth (compounding):
  Jan: 40% → Apr: ~61% → Jul: ~92% → Aug: ~106% (outage)

  Action required by April (~61%) so capacity is
  in place before June (~80%, danger zone).

  Lead time for new infrastructure: 2-4 weeks.
  Therefore: start procurement in April.

Load Modeling

Calculating Capacity Requirements

Current state:
  10,000 requests/second
  20 servers
  40% CPU utilization per server
  → Each server handles 500 req/s at 40% CPU
  → Each server max capacity ≈ 1,250 req/s (at 100% CPU)
  → Comfortable capacity (at 70% CPU): 875 req/s per server

Target state (Black Friday, 3× traffic):
  30,000 requests/second
  At 875 req/s per server (70% target utilization):
  → Need 30,000 / 875 = 35 servers
  → Currently have 20
  → Need 15 additional servers

  Plus buffer (20% for unexpected spikes):
  → 35 × 1.2 = 42 servers total
  → Need 22 additional servers
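The server-count calculation above generalizes to any target load. A minimal sketch of the same arithmetic, with a 70% utilization target and 20% spike buffer as defaults (function and parameter names are mine):

```python
import math

def servers_needed(target_rps, per_server_max_rps, target_util=0.70, buffer=0.20):
    """Servers required to serve target_rps at target_util, plus a spike buffer."""
    comfortable = per_server_max_rps * target_util       # e.g. 1,250 * 0.70 = 875 req/s
    base = math.ceil(target_rps / comfortable)           # servers at target utilization
    return math.ceil(base * (1 + buffer))                # add headroom for spikes

# Black Friday example from above: 30,000 req/s, 1,250 req/s max per server.
print(servers_needed(30_000, 1_250))  # 42
```

Round up at each step: a fractional server is a whole server, and a fractional buffer is a whole buffer server.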

The Bottleneck Hierarchy

Typical bottleneck order (what breaks first):

  1. Database connections (pool exhaustion)
  2. Database query latency (CPU or I/O bound)
  3. Application server memory (GC pressure, memory leaks)
  4. Application server CPU (compute-bound workloads)
  5. Network bandwidth (large payloads, media serving)
  6. External API rate limits (third-party dependencies)
  7. DNS resolution (often overlooked, cache TTL issues)

Start investigation at #1 and work down.
Each bottleneck has a different scaling strategy.
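The "start at #1 and work down" rule can be encoded directly: given current utilization per resource, report the highest-priority resource that is saturated. A minimal sketch, with hypothetical resource names mirroring the hierarchy above and an assumed 80% saturation threshold:

```python
# Hierarchy from above: what typically breaks first, in order.
BOTTLENECK_ORDER = [
    "db_connections",
    "db_query_latency",
    "app_memory",
    "app_cpu",
    "network_bandwidth",
    "external_rate_limits",
    "dns",
]

def first_bottleneck(utilization, threshold=0.80):
    """Return the highest-priority resource over `threshold`, or None if healthy."""
    for resource in BOTTLENECK_ORDER:
        if utilization.get(resource, 0.0) > threshold:
            return resource
    return None

print(first_bottleneck({"db_connections": 0.85, "app_cpu": 0.90}))  # db_connections
```

Even when the CPU looks worse, the connection pool is checked first because it is the resource that typically saturates first and it masks symptoms further down the stack.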

Scaling Strategies

Strategy                             Best For                           Limitation
Vertical scaling (bigger instance)   Databases, single-threaded work    Physical limits; expensive at top tier
Horizontal scaling (more instances)  Stateless services, web servers    Requires stateless design
Caching                              Read-heavy workloads               Cache invalidation complexity
Read replicas                        Read-heavy database workloads      Replication lag
CDN                                  Static assets, media               Dynamic content not cacheable
Async processing                     Background jobs, batch operations  Increased system complexity
Sharding                             Very large datasets                Application complexity, cross-shard queries

Auto-Scaling Configuration

# Kubernetes HPA (Horizontal Pod Autoscaler)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api
  minReplicas: 5        # Never go below 5 (night time traffic)
  maxReplicas: 50       # Never go above 50 (budget control)
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60  # Scale up when CPU > 60%
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "500"      # Scale when > 500 req/s per pod
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # Wait 1 min before scaling up
      policies:
        - type: Percent
          value: 50                     # Add up to 50% more pods
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 10                     # Remove max 10% at a time
          periodSeconds: 120

Implementation Checklist

  • Track utilization for all critical resources: CPU, memory, disk, network, connections
  • Model traffic growth: monthly growth rate + known events (campaigns, launches)
  • Identify your bottleneck hierarchy: what breaks first as traffic increases?
  • Set utilization thresholds: alert at 70%, plan at 60% sustained
  • Calculate capacity runway: “At current growth, we hit limits in N weeks”
  • Configure auto-scaling for stateless services with sensible min/max limits
  • Slow down scale-down (5 min stabilization) to prevent oscillation
  • Load test quarterly at 2× current peak traffic
  • Budget capacity additions quarterly with 20% buffer
  • Document capacity assumptions and review them when architecture changes
Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
