# Kubernetes Cost Optimization
Reduce Kubernetes cluster costs without sacrificing reliability. Covers right-sizing pods, cluster autoscaler tuning, multi-tenancy, spot node pools, resource quotas, and the cost visibility tools purpose-built for Kubernetes.
Kubernetes makes it easy to deploy. It also makes it easy to waste money. Default resource requests are either too high (paying for idle capacity) or too low (risking CPU throttling and OOM kills). Without active optimization, clusters commonly run at 30-40% utilization while you pay for 100% of the provisioned capacity.
## Resource Right-Sizing

### The Problem

```yaml
# Typical over-provisioned deployment
resources:
  requests:
    cpu: "1000m"    # Requests 1 full CPU
    memory: "2Gi"   # Requests 2 GB RAM
  limits:
    cpu: "2000m"
    memory: "4Gi"

# Actual usage (from metrics):
#   CPU:    avg 50m,   p99 200m
#   Memory: avg 256Mi, peak 512Mi
# Waste: 80% CPU, 75% memory
```
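
The "actual usage" numbers have to come from somewhere: `kubectl top pod --containers` gives a point-in-time reading, but percentiles need a metrics store. A minimal sketch of Prometheus recording rules for these statistics, assuming cAdvisor metrics are being scraped (the rule names are made up for illustration):

```yaml
# Hypothetical recording rules for per-container usage statistics
groups:
  - name: container-usage
    rules:
      - record: container:cpu_usage:rate5m
        expr: rate(container_cpu_usage_seconds_total{container!=""}[5m])
      - record: container:cpu_usage:p99_7d          # p99 CPU over the last 7 days
        expr: quantile_over_time(0.99, container:cpu_usage:rate5m[7d])
      - record: container:memory_working_set:avg_7d # Average working-set memory
        expr: avg_over_time(container_memory_working_set_bytes{container!=""}[7d])
```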
### Right-Sized Configuration

```yaml
resources:
  requests:
    cpu: "200m"     # 2x p95 usage
    memory: "512Mi" # 2x average usage
  limits:
    cpu: "500m"     # 2.5x p99 usage
    memory: "1Gi"   # 2x peak usage
```
### Automated Right-Sizing

```yaml
# Vertical Pod Autoscaler (VPA)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: order-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  updatePolicy:
    updateMode: "Auto"  # Automatically applies recommendations
  resourcePolicy:
    containerPolicies:
      - containerName: order-service
        minAllowed:
          cpu: 50m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
```
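
"Auto" mode applies new requests by evicting and recreating pods, which is disruptive. A common first step is to run the VPA in recommendation-only mode and inspect the suggestions with `kubectl describe vpa order-service-vpa` before letting it act; only the update policy changes:

```yaml
# Recommendation-only variant: the VPA computes target requests but never evicts
updatePolicy:
  updateMode: "Off"
```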
## Cluster Autoscaler Tuning

```yaml
# Cluster autoscaler configuration
cluster-autoscaler:
  scale-down-delay-after-add: 10m         # Wait after a scale-up before considering scale-down
  scale-down-unneeded-time: 10m           # Node must be unneeded for 10 min
  scale-down-utilization-threshold: 0.5   # Candidate for removal if < 50% utilized
  max-graceful-termination-sec: 600       # Allow up to 10 min for pod eviction
  skip-nodes-with-system-pods: true       # Don't remove nodes running kube-system pods

  # Node group priorities (priority expander: the highest number wins)
  expander: priority
  priority-config: |
    50:
      - spot-pool        # Prefer spot instances
    10:
      - on-demand-pool   # Fall back to on-demand
```
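
With the priority expander, the priority-to-node-group mapping lives in a ConfigMap that must be named cluster-autoscaler-priority-expander in the autoscaler's namespace. Entries are regular expressions matched against node group names, and the highest matching priority wins; a sketch (the pool name patterns are assumptions):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander  # This exact name is required
  namespace: kube-system
data:
  priorities: |-
    50:
      - .*spot.*        # Expand spot pools first
    10:
      - .*on-demand.*   # Fall back when spot capacity is unavailable
```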
## Spot/Preemptible Node Pools

```yaml
# Mixed node pool strategy
node-pools:
  - name: system
    instance-type: m5.large
    type: on-demand
    count: 3
    taints: [CriticalAddonsOnly=true:NoSchedule]
    purpose: "Control plane, system workloads"

  - name: workload-spot
    instance-types: [m5.xlarge, m5a.xlarge, m5d.xlarge, m6i.xlarge]  # Diversify to reduce interruptions
    type: spot
    min: 2
    max: 20
    purpose: "Stateless workloads, workers"

  - name: workload-ondemand
    instance-type: m5.xlarge
    type: on-demand
    min: 2
    max: 10
    purpose: "Stateful workloads, databases"
```
## Cost Visibility

### Namespace-Level Cost Tracking

```
Namespace: order-team
  CPU Requests:    8 cores  ($350/month)
  Memory Requests: 32 Gi    ($200/month)
  Storage:         500 Gi   ($50/month)
  Network Egress:  100 GB   ($9/month)
  Total:                     $609/month

  Utilization: 45% CPU, 60% Memory
  Potential savings: $250/month with right-sizing
```
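
Kubecost and OpenCost allocate spend by namespace and by label, so reports like the one above depend on consistent metadata. A sketch of the kind of labeling that makes per-team allocation possible (the label keys are assumptions about your own conventions):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: order-team
  labels:
    team: order-team        # Assumed allocation label consumed by the cost tool
    cost-center: "cc-1234"  # Hypothetical cost-center tag
```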
### Tools

| Tool | Type | Features |
|---|---|---|
| Kubecost | Commercial (free tier) | Real-time cost allocation, recommendations |
| OpenCost | CNCF project | Kubernetes cost monitoring standard |
| Spot.io | Commercial | Automated spot management |
| CAST AI | Commercial | Automated right-sizing and node optimization |
## Anti-Patterns

| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No resource requests | Scheduler cannot bin-pack, nodes underutilized | Set requests on every container (a LimitRange can supply defaults, see the sketch below) |
| Requests == limits for CPU | No burst capacity, over-provisioned | CPU limits > requests (or no CPU limits) |
| One large node pool | Cannot mix spot and on-demand | Multiple pools by workload type |
| No namespace resource quotas | One team consumes all capacity | Enforce quotas per namespace (see the ResourceQuota sketch below) |
| No cost visibility | Teams do not know their spend | Deploy Kubecost / OpenCost |
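
Two of the fixes above, default requests and per-namespace quotas, are enforceable with stock Kubernetes objects. A minimal sketch for a single namespace, with all values illustrative:

```yaml
# ResourceQuota: caps the total requests/limits a namespace can claim
apiVersion: v1
kind: ResourceQuota
metadata:
  name: order-team-quota
  namespace: order-team
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    limits.cpu: "32"
    limits.memory: 128Gi
---
# LimitRange: default requests/limits for containers that set none
apiVersion: v1
kind: LimitRange
metadata:
  name: order-team-defaults
  namespace: order-team
spec:
  limits:
    - type: Container
      defaultRequest:   # Applied as the request when none is specified
        cpu: 100m
        memory: 128Mi
      default:          # Applied as the limit when none is specified
        cpu: 500m
        memory: 512Mi
```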
Kubernetes cost optimization is continuous. Set up visibility, right-size resources, use spot instances for stateless workloads, and review regularly.