# Kubernetes Cost Optimization
Reduce Kubernetes cluster costs without sacrificing reliability. Covers right-sizing pods, cluster autoscaler tuning, multi-tenancy, spot node pools, resource quotas, and the cost visibility tools purpose-built for Kubernetes.
Kubernetes makes it easy to deploy. It also makes it easy to waste money. Default resource requests are either too high (paying for idle capacity) or too low (risking CPU throttling and OOM kills). Without active optimization, clusters commonly run at 30-40% utilization while you pay for 100% of the provisioned capacity.
## Resource Right-Sizing

### The Problem

```yaml
# Typical over-provisioned deployment
resources:
  requests:
    cpu: "1000m"    # Requests 1 full CPU
    memory: "2Gi"   # Requests 2 GB RAM
  limits:
    cpu: "2000m"
    memory: "4Gi"

# Actual usage (from metrics):
#   CPU:    avg 50m,   p99 200m
#   Memory: avg 256Mi, peak 512Mi
# Waste: 80% CPU, 75% memory
```
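
The "actual usage" numbers have to come from somewhere: `kubectl top pod --containers` gives a point-in-time reading, but percentiles need a metrics store. A minimal sketch of Prometheus recording rules for these statistics, assuming cAdvisor metrics are being scraped (the rule names are made up for illustration):

```yaml
# Hypothetical recording rules for per-container usage statistics
groups:
  - name: container-usage
    rules:
      - record: container:cpu_usage:rate5m
        expr: rate(container_cpu_usage_seconds_total{container!=""}[5m])
      - record: container:cpu_usage:p99_7d          # p99 CPU over the last 7 days
        expr: quantile_over_time(0.99, container:cpu_usage:rate5m[7d])
      - record: container:memory_working_set:avg_7d # Average working-set memory
        expr: avg_over_time(container_memory_working_set_bytes{container!=""}[7d])
```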
### Right-Sized Configuration

```yaml
resources:
  requests:
    cpu: "200m"     # 2x p95 usage
    memory: "512Mi" # 2x average usage
  limits:
    cpu: "500m"     # 2.5x p99 usage
    memory: "1Gi"   # 2x peak usage
```
### Automated Right-Sizing

```yaml
# Vertical Pod Autoscaler (VPA)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: order-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  updatePolicy:
    updateMode: "Auto"  # Automatically applies recommendations
  resourcePolicy:
    containerPolicies:
      - containerName: order-service
        minAllowed:
          cpu: 50m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
```
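
"Auto" mode applies new requests by evicting and recreating pods, which is disruptive. A common first step is to run the VPA in recommendation-only mode and inspect the suggestions with `kubectl describe vpa order-service-vpa` before letting it act; only the update policy changes:

```yaml
# Recommendation-only variant: the VPA computes target requests but never evicts
updatePolicy:
  updateMode: "Off"
```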
## Cluster Autoscaler Tuning

```yaml
# Cluster autoscaler configuration
cluster-autoscaler:
  scale-down-delay-after-add: 10m         # Wait after a scale-up before considering scale-down
  scale-down-unneeded-time: 10m           # Node must be unneeded for 10 min
  scale-down-utilization-threshold: 0.5   # Candidate for removal if < 50% utilized
  max-graceful-termination-sec: 600       # Allow up to 10 min for pod eviction
  skip-nodes-with-system-pods: true       # Don't remove nodes running kube-system pods

  # Node group priorities (priority expander: the highest number wins)
  expander: priority
  priority-config: |
    50:
      - spot-pool        # Prefer spot instances
    10:
      - on-demand-pool   # Fall back to on-demand
```
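
With the priority expander, the priority-to-node-group mapping lives in a ConfigMap that must be named cluster-autoscaler-priority-expander in the autoscaler's namespace. Entries are regular expressions matched against node group names, and the highest matching priority wins; a sketch (the pool name patterns are assumptions):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander  # This exact name is required
  namespace: kube-system
data:
  priorities: |-
    50:
      - .*spot.*        # Expand spot pools first
    10:
      - .*on-demand.*   # Fall back when spot capacity is unavailable
```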
## Spot/Preemptible Node Pools

```yaml
# Mixed node pool strategy
node-pools:
  - name: system
    instance-type: m5.large
    type: on-demand
    count: 3
    taints: [CriticalAddonsOnly=true:NoSchedule]
    purpose: "Control plane, system workloads"

  - name: workload-spot
    instance-types: [m5.xlarge, m5a.xlarge, m5d.xlarge, m6i.xlarge]  # Diversify to reduce interruptions
    type: spot
    min: 2
    max: 20
    purpose: "Stateless workloads, workers"

  - name: workload-ondemand
    instance-type: m5.xlarge
    type: on-demand
    min: 2
    max: 10
    purpose: "Stateful workloads, databases"
```
## Cost Visibility

### Namespace-Level Cost Tracking

```
Namespace: order-team
  CPU Requests:    8 cores  ($350/month)
  Memory Requests: 32 Gi    ($200/month)
  Storage:         500 Gi   ($50/month)
  Network Egress:  100 GB   ($9/month)
  Total:                     $609/month

  Utilization: 45% CPU, 60% Memory
  Potential savings: $250/month with right-sizing
```
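
Kubecost and OpenCost allocate spend by namespace and by label, so reports like the one above depend on consistent metadata. A sketch of the kind of labeling that makes per-team allocation possible (the label keys are assumptions about your own conventions):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: order-team
  labels:
    team: order-team        # Assumed allocation label consumed by the cost tool
    cost-center: "cc-1234"  # Hypothetical cost-center tag
```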
### Tools

| Tool | Type | Features |
|---|---|---|
| Kubecost | Commercial (free tier) | Real-time cost allocation, recommendations |
| OpenCost | CNCF project | Kubernetes cost monitoring standard |
| Spot.io | Commercial | Automated spot management |
| CAST AI | Commercial | Automated right-sizing and node optimization |
## Anti-Patterns

| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No resource requests | Scheduler cannot bin-pack, nodes underutilized | Set requests on every container (a LimitRange can supply defaults, see the sketch below) |
| Requests == limits for CPU | No burst capacity, over-provisioned | CPU limits > requests (or no CPU limits) |
| One large node pool | Cannot mix spot and on-demand | Multiple pools by workload type |
| No namespace resource quotas | One team consumes all capacity | Enforce quotas per namespace (see the ResourceQuota sketch below) |
| No cost visibility | Teams do not know their spend | Deploy Kubecost / OpenCost |
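
Two of the fixes above, default requests and per-namespace quotas, are enforceable with stock Kubernetes objects. A minimal sketch for a single namespace, with all values illustrative:

```yaml
# ResourceQuota: caps the total requests/limits a namespace can claim
apiVersion: v1
kind: ResourceQuota
metadata:
  name: order-team-quota
  namespace: order-team
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    limits.cpu: "32"
    limits.memory: 128Gi
---
# LimitRange: default requests/limits for containers that set none
apiVersion: v1
kind: LimitRange
metadata:
  name: order-team-defaults
  namespace: order-team
spec:
  limits:
    - type: Container
      defaultRequest:   # Applied as the request when none is specified
        cpu: 100m
        memory: 128Mi
      default:          # Applied as the limit when none is specified
        cpu: 500m
        memory: 512Mi
```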
Kubernetes cost optimization is continuous. Set up visibility, right-size resources, use spot instances for stateless workloads, and review regularly.