Kubernetes Cost Optimization
Reduce Kubernetes costs without sacrificing reliability. Covers resource right-sizing, spot instances, cluster autoscaling, namespace quotas, cost allocation, and workload scheduling.
Kubernetes makes it easy to over-provision. Every team requests 4 CPUs and 8 GB RAM “just in case,” but actual usage averages 0.3 CPUs and 500 MB. Across 200 services, that waste adds up to hundreds of thousands of dollars per year. Kubernetes cost optimization is about matching resources to actual usage without causing outages.
Where Kubernetes Money Goes
| Cost Component | Typical % | Optimization Lever |
|---|---|---|
| Compute (nodes) | 60-70% | Right-size pods, use spot/preemptible, autoscale |
| Storage (PVCs) | 10-15% | Right-size volumes, lifecycle policies |
| Networking (LB, NAT) | 10-15% | Reduce cross-AZ traffic, internal LBs |
| Control plane | 5-10% | Managed K8s pricing varies by provider |
Resource Right-Sizing
```yaml
# BEFORE: Over-provisioned (common default)
resources:
  requests:
    cpu: "2000m"    # Requesting 2 full CPUs
    memory: "4Gi"   # Requesting 4 GB
  limits:
    cpu: "4000m"
    memory: "8Gi"
```

```yaml
# AFTER: Right-sized based on actual usage data
resources:
  requests:
    cpu: "250m"     # Actual p95 usage: 200m
    memory: "512Mi" # Actual p95 usage: 400Mi
  limits:
    cpu: "500m"     # 2x headroom for bursts
    memory: "1Gi"   # 2x headroom for spikes
```
Finding Right Sizes
```bash
# Check actual CPU/memory usage vs requests
kubectl top pods -n production --sort-by=cpu

# Use VPA recommendations
kubectl get vpa -n production -o yaml
# Look for: status.recommendation.containerRecommendations
```
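The sizing rule used throughout this guide (request at observed p95 usage, limit at roughly 2x for burst headroom) can be sketched as a small calculation. `recommend` and its parameters are illustrative helpers, not part of kubectl or the VPA API:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of usage samples."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

def recommend(cpu_samples_millicores, headroom=2.0, floor_m=50):
    """Suggest a CPU request (p95 of observed usage, with a minimum
    floor) and a limit (request * headroom), both in millicores."""
    request = max(floor_m, percentile(cpu_samples_millicores, 95))
    limit = int(request * headroom)
    return request, limit

# Example: samples scraped from `kubectl top` over a day (illustrative)
print(recommend([180, 200, 150, 210, 190]))  # → (210, 420)
```

The same shape applies to memory; in practice you would feed in Prometheus or metrics-server data rather than a hand-typed list.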
Spot/Preemptible Instances
| Workload Type | Spot Safe? | Strategy |
|---|---|---|
| Stateless web services | ✅ Yes | Multiple replicas, PodDisruptionBudget |
| Batch/ML training | ✅ Yes | Checkpointing, retry on preemption |
| Stateful databases | ❌ No | On-demand instances only |
| CI/CD runners | ✅ Yes | Re-run jobs on preemption |
| Cron jobs | ✅ Yes | Retry policy |
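For the spot-safe stateless services in the table above, a PodDisruptionBudget caps how many replicas a graceful node drain can take down at once. The names `web-pdb` and `app: web` are placeholders:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # Keep at least 2 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: web
```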
```yaml
# Spot node pool (GKE, declared via Config Connector; trimmed for illustration)
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
spec:
  nodeConfig:
    spot: true
    machineType: e2-standard-4
  autoscaling:
    minNodeCount: 0
    maxNodeCount: 20
  management:
    autoRepair: true
```
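To actually land stateless workloads on that pool, pin them with a node selector. GKE labels spot nodes with `cloud.google.com/gke-spot: "true"`; on other providers the label key differs:

```yaml
# Deployment pod template fragment: schedule onto spot nodes only
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
```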
Cluster Autoscaler Settings
```yaml
# Aggressive scale-down for cost savings
cluster-autoscaler:
  scale-down-enabled: true
  scale-down-delay-after-add: 5m        # Wait 5 min after scaling up
  scale-down-unneeded-time: 5m          # Node unused for 5 min → remove
  scale-down-utilization-threshold: 0.5 # < 50% utilized → candidate

  # Prevent thrashing
  max-node-provision-time: 15m
  skip-nodes-with-local-storage: false
  skip-nodes-with-system-pods: true
```
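Scale-down also respects per-pod signals: the autoscaler will not drain a node running a pod annotated as unsafe to evict, and conversely marking pods safe-to-evict lets otherwise-blocked nodes drain. A pod-level fragment:

```yaml
metadata:
  annotations:
    # Prevent the autoscaler from evicting this pod (pins its node)
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```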
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| No resource requests | Scheduler can’t bin-pack, nodes underutilized | Set requests on every container |
| Requests = limits (always) | No burst capacity, need more nodes | Requests at p95 usage, limits at 2x |
| All on-demand nodes | Paying full price for interruptible workloads | Spot nodes for stateless workloads |
| No namespace quotas | One team consumes all resources | ResourceQuotas per namespace |
| No cost visibility | Nobody knows which team spends what | Cost allocation labels on all resources |
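The namespace-quota fix from the table looks like this in practice; the namespace and numbers are placeholders to be sized per team:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"      # Sum of CPU requests across the namespace
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
```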
Checklist
- Resource requests set on every pod (based on actual usage)
- Limits set at 2x requests for burst headroom
- VPA deployed for right-sizing recommendations
- Spot/preemptible nodes for stateless workloads (40-60% savings)
- Cluster autoscaler configured with aggressive scale-down
- Namespace quotas for resource governance
- Cost allocation labels: team, service, environment
- Cost monitoring dashboard (Kubecost, OpenCost)
- Monthly cost review with team owners
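The cost-allocation labels from the checklist might look like this on a workload's metadata; tools such as Kubecost and OpenCost aggregate spend by label keys like these (the values are placeholders):

```yaml
metadata:
  labels:
    team: payments
    service: checkout-api
    environment: production
```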
:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For Kubernetes cost optimization, visit garnetgrid.com. :::