How to Control Kubernetes Costs: Resource Limits, Autoscaling, and Spot Nodes
Reduce Kubernetes infrastructure costs by 40-65% with proper resource requests, cluster autoscaling, spot node pools, and namespace quotas.
Kubernetes makes it trivially easy to waste money. Every pod without resource limits is an open checkbook. Every idle node is a bill you’re paying for nothing. The average Kubernetes cluster runs at 20-35% utilization — meaning 65-80% of your compute spend is pure waste. This guide shows you exactly how to find that waste and eliminate it, typically saving 40-65% of cluster costs within the first month.
The cost problem isn’t Kubernetes itself — it’s the gap between what you provision and what your workloads actually use. Close that gap and you’ll save thousands per month without degrading performance.
Step 1: Set Resource Requests and Limits on Every Pod
The single most important cost control mechanism in Kubernetes. Without resource requests, the scheduler can’t pack pods efficiently. Without limits, a single pod can consume an entire node. Without both, you can’t autoscale intelligently.
1.1 Determine Actual Usage
```bash
# Get CPU/memory usage for all pods in a namespace
kubectl top pods -n production --sort-by=cpu

# Get node-level utilization
kubectl top nodes

# For historical data, use Prometheus queries:
#   container_cpu_usage_seconds_total
#   container_memory_working_set_bytes

# Find pods WITHOUT resource limits (these are the cost leaks)
kubectl get pods -A -o json | jq -r \
  '.items[] | select(.spec.containers[].resources.limits == null) |
   [.metadata.namespace, .metadata.name] | @tsv'
```
1.2 Apply Resource Specs
```yaml
# Deployment with proper resource management
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
      - name: api
        image: api:v2.1
        resources:
          requests:            # Used for scheduling: the pod lands on a node with this much available
            cpu: "250m"        # 0.25 CPU cores
            memory: "512Mi"    # 512 MiB
          limits:              # Hard ceiling: OOMKilled (memory) or CPU-throttled if exceeded
            cpu: "1000m"       # 1 CPU core
            memory: "1Gi"      # 1 GiB
```
:::tip[Sizing Strategy]
Set requests to the P50 (median) usage and limits to the P99 (peak) usage. This ensures efficient packing while preventing OOMKills. Review and adjust monthly; workload patterns change.
:::
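The tip above can be sketched as a small sizing helper. This is a minimal Python example using nearest-rank percentiles; the function name and units are illustrative, and in practice the samples would come from the Prometheus metrics listed in 1.1:

```python
import math

def suggest_resources(cpu_samples_m, mem_samples_mi):
    """Suggest requests (P50) and limits (P99) from usage samples
    (CPU in millicores, memory in MiB)."""
    def pct(samples, p):
        s = sorted(samples)
        k = max(0, math.ceil(p * len(s) / 100) - 1)  # nearest-rank percentile
        return s[k]
    return {
        "requests": {"cpu": f"{pct(cpu_samples_m, 50)}m",
                     "memory": f"{pct(mem_samples_mi, 50)}Mi"},
        "limits":   {"cpu": f"{pct(cpu_samples_m, 99)}m",
                     "memory": f"{pct(mem_samples_mi, 99)}Mi"},
    }
```

Feed it a few days of per-container samples and paste the result into the Deployment's resources block.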
1.3 Request vs Limit Ratio Guidelines
| Workload Type | CPU Request:Limit Ratio | Memory Request:Limit Ratio | Why |
|---|---|---|---|
| API server (steady load) | 1:2 | 1:1.5 | Predictable, needs guaranteed memory |
| Batch job (bursty) | 1:4 | 1:2 | Bursts briefly, then idles |
| Worker (queue processor) | 1:2 | 1:1.5 | Scales horizontally, steady per-pod |
| Database (stateful) | 1:1 | 1:1 | Needs guaranteed resources always |
| ML inference | 1:1 (GPU) | 1:1 | GPU must be fully dedicated |
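Applied mechanically, the table reduces to a lookup: pick the multipliers for the workload type and scale the measured request. A sketch, with the dictionary keys and helper name as illustrative choices:

```python
# (cpu request->limit multiplier, memory request->limit multiplier),
# taken from the ratio table above
RATIOS = {
    "api":      (2.0, 1.5),   # API server, steady load
    "batch":    (4.0, 2.0),   # bursty batch jobs
    "worker":   (2.0, 1.5),   # queue processors
    "database": (1.0, 1.0),   # stateful, guaranteed resources
    "ml":       (1.0, 1.0),   # GPU inference, fully dedicated
}

def derive_limits(workload, cpu_request_m, mem_request_mi):
    """Derive limits from requests using the workload's ratio."""
    cpu_ratio, mem_ratio = RATIOS[workload]
    return int(cpu_request_m * cpu_ratio), int(mem_request_mi * mem_ratio)
```

For example, a batch job requesting 250m CPU and 512Mi memory would get limits of 1000m and 1024Mi.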
Step 2: Implement Namespace Resource Quotas
Prevent any single team from consuming the entire cluster. Without quotas, one runaway deployment can starve all others.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "16"
    requests.memory: "32Gi"
    limits.cpu: "32"
    limits.memory: "64Gi"
    pods: "50"
    persistentvolumeclaims: "20"
    services.loadbalancers: "2"   # Each load balancer is billed separately
```
```yaml
# LimitRange sets defaults for pods that don't specify resources.
# This is your safety net: no pod deploys without limits.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
  - default:          # Default limits (applied when a container specifies none)
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:   # Default requests
      cpu: "100m"
      memory: "128Mi"
    max:              # Maximum any single container can request
      cpu: "4"
      memory: "8Gi"
    min:              # Minimum (prevents tiny pods that waste scheduling overhead)
      cpu: "50m"
      memory: "64Mi"
    type: Container
```
Quota Planning by Team Size
| Team Size | CPU Requests | Memory Requests | Pods | Monthly Cost (approx) |
|---|---|---|---|---|
| Small (2-4 devs, 5 services) | 8 CPU | 16Gi | 30 | ~$600 |
| Medium (5-10 devs, 15 services) | 16 CPU | 32Gi | 50 | ~$1,200 |
| Large (10-20 devs, 30 services) | 32 CPU | 64Gi | 100 | ~$2,400 |
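The dollar figures in the table follow a simple linear model: roughly $75 per requested CPU per month, with about 2 GiB of memory bundled per CPU. That rate is an illustrative blended on-demand assumption, not a provider quote; actual pricing varies by cloud, region, and instance family:

```python
def quota_monthly_cost(cpu_requests, usd_per_cpu_month=75):
    """Rough linear cost model behind the quota table above.
    The $75/CPU-month default is an illustrative blended
    on-demand rate (assumes ~2 GiB memory per CPU)."""
    return cpu_requests * usd_per_cpu_month
```

Swap in your own blended rate from last month's bill to size quotas against a real budget.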
Step 3: Enable Cluster Autoscaler
The Cluster Autoscaler automatically adds nodes when pods are stuck Pending because no node can fit them, and removes nodes whose utilization stays below a configured threshold.
3.1 Azure AKS
```bash
az aks update \
  --resource-group myRG \
  --name myCluster \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 20
```
3.2 AWS EKS
```yaml
# Cluster Autoscaler deployment (abridged; the official manifest also
# includes the RBAC objects and service account)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=2:20:eks-nodegroup
        - --scale-down-delay-after-add=5m          # Wait 5m after adding a node before removing any
        - --scale-down-unneeded-time=5m            # Node must be idle 5m before removal
        - --scale-down-utilization-threshold=0.5   # Nodes below 50% utilization are removal candidates
        - --skip-nodes-with-local-storage=false
        - --balance-similar-node-groups=true       # Keep similar node groups evenly sized
```
3.3 Horizontal Pod Autoscaler (HPA)
HPA scales pods within a deployment; Cluster Autoscaler scales nodes to accommodate pods.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down (prevents flapping)
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60               # Remove at most 25% of pods per minute
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60               # Can double pods per minute when needed
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
```
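Under the hood, the HPA's core rule is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A sketch of that rule; the real controller additionally averages across pods, applies a tolerance band, and honors the behavior policies shown above:

```python
import math

def desired_replicas(current, utilization, target,
                     min_replicas=2, max_replicas=10):
    """HPA core scaling rule: desired = ceil(current * utilization / target),
    clamped to [min_replicas, max_replicas]."""
    desired = math.ceil(current * utilization / target)
    return max(min_replicas, min(max_replicas, desired))
```

With 4 replicas running at 130% of a 65% CPU target, the HPA asks for 8 replicas; the scaleUp policy then controls how fast it gets there.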
Karpenter (Alternative to Cluster Autoscaler)
For AWS EKS, Karpenter is often faster and more cost-effective than Cluster Autoscaler:
| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Scale-up speed | 2-5 minutes | 30-90 seconds |
| Instance selection | Fixed node group types | Dynamic — picks cheapest available |
| Spot integration | Basic | Advanced (diversified, fallback) |
| Bin packing | Good | Better (consolidation) |
| Cloud support | AWS, GCP, Azure | AWS (Azure provider emerging) |
Step 4: Use Spot/Preemptible Node Pools
Spot nodes provide 60-90% savings for fault-tolerant workloads. The trade-off: the cloud provider can reclaim a spot node with little warning, two minutes on AWS and as little as 30 seconds on Azure and GCP.
4.1 Create a Spot Node Pool
```bash
# AKS Spot pool (autoscaler must be enabled to use --min-count/--max-count)
az aks nodepool add \
  --resource-group myRG \
  --cluster-name myCluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --enable-cluster-autoscaler \
  --min-count 0 \
  --max-count 15 \
  --node-vm-size Standard_D4s_v5
```

```bash
# EKS Spot instances via managed node group (diversify instance types!)
eksctl create nodegroup \
  --cluster myCluster \
  --name spot-workers \
  --instance-types m5.xlarge,m5a.xlarge,m5d.xlarge,m5n.xlarge \
  --spot \
  --min-size 0 \
  --max-size 15
```
4.2 Schedule Tolerant Workloads on Spot
```yaml
spec:
  tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "kubernetes.azure.com/scalesetpriority"
            operator: In
            values: ["spot"]
```
4.3 What to Run on Spot vs On-Demand
| Run on Spot ✅ | Keep on On-Demand ❌ |
|---|---|
| CI/CD build runners | Databases (stateful) |
| Batch processing jobs | Authentication services |
| Queue workers (retry on eviction) | Payment processing |
| Dev/staging environments | Single-replica critical services |
| ML training jobs | Kafka/Zookeeper (stateful) |
| Web frontends (multi-replica, LB) | Services without PodDisruptionBudgets |
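What makes a queue worker "retry on eviction" safe is mostly clean SIGTERM handling: Kubernetes sends SIGTERM when a node drains, including spot reclaims. A minimal Python sketch, where the in-memory list stands in for a real queue client:

```python
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    # Kubernetes sends SIGTERM on node drain (including spot eviction).
    # Finish the in-flight item, then stop pulling new work.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def drain_queue(items, process):
    """Process items until the queue is empty or shutdown is requested.
    Unprocessed items remain queued for another worker to pick up."""
    processed = []
    while items and not shutting_down:
        item = items.pop(0)
        process(item)   # should be idempotent: eviction can cause redelivery
        processed.append(item)
    return processed
```

Set terminationGracePeriodSeconds long enough to cover your slowest single item, since the spot warning window is short.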
Step 5: Implement Pod Disruption Budgets
Protect critical services during node scale-down, spot evictions, and maintenance.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2   # Always keep at least 2 pods running
  selector:
    matchLabels:
      app: api-service
```
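minAvailable translates directly into how many voluntary disruptions (node drains, spot consolidations) the eviction API will permit at any moment:

```python
def allowed_disruptions(healthy_pods, min_available):
    """Voluntary disruptions a minAvailable-style PDB permits right now."""
    return max(0, healthy_pods - min_available)
```

With 3 healthy replicas and minAvailable: 2, exactly one pod may be evicted at a time; at 2 healthy replicas, drains block until a replacement becomes ready.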
Step 6: Schedule Non-Production Shutdown
Dev/staging clusters don't need to run 24/7. Shutting them down on nights and weekends cuts their cost by roughly 65%.
```bash
#!/bin/bash
# Run from two cron jobs: scale down dev/staging at 8 PM, scale back up at 7 AM.

# Scale down (8 PM)
kubectl scale deployment --all --replicas=0 -n dev
kubectl scale deployment --all --replicas=0 -n staging

# Scale up (separate cron job at 7 AM)
# Note: this resets every deployment to 1 replica; record original replica
# counts (e.g. in an annotation) if any service needs more than one.
kubectl scale deployment --all --replicas=1 -n dev
kubectl scale deployment --all --replicas=1 -n staging
```
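To run this in-cluster instead of from an external cron host, each half can be a Kubernetes CronJob. Here is a sketch of the 8 PM scale-down; the `scaler` ServiceAccount (which needs RBAC permission to scale deployments in the namespace) and the image tag are assumptions:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: dev
spec:
  schedule: "0 20 * * 1-5"    # 8 PM on weekdays
  timeZone: "Etc/UTC"         # .spec.timeZone is stable from Kubernetes 1.27
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler      # assumed: bound to a Role allowing deployment scaling
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:1.29   # illustrative image tag
            command:
            - kubectl
            - scale
            - deployment
            - --all
            - --replicas=0
            - -n
            - dev
```

A mirror-image CronJob with schedule "0 7 * * 1-5" and --replicas=1 handles the morning scale-up.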
Savings Calculation
- Dev cluster: $1,200/month running 24/7
- Active hours: 12 h/day × 5 days/week = 60 hours out of 168
- Idle hours: 108 hours/week, about 64% of the time
- Monthly savings: $1,200 × 0.64 = $768
- Annual savings: $9,216 per environment
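The same arithmetic as a quick sanity check, rounding the idle fraction to 64% as the text does:

```python
monthly_cost = 1200
active_hours = 12 * 5                                  # 60 h/week
idle_fraction = round((168 - active_hours) / 168, 2)   # 108/168, rounded to 0.64
monthly_savings = monthly_cost * idle_fraction
annual_savings = monthly_savings * 12
```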
Cost Monitoring Tools
| Tool | Free Tier? | Best For |
|---|---|---|
| Kubecost | Yes (basic) | Real-time cost allocation per namespace/label |
| OpenCost | Yes (open source) | CNCF standard cost monitoring |
| AWS Cost Explorer | Yes | EKS cluster costs (node-level) |
| Datadog | Trial | Kubernetes + cost correlation |
| CAST AI | Freemium | Automated rightsizing recommendations |
Cost Optimization Checklist
- Resource requests/limits set on every container (no exceptions)
- `kubectl get pods` audit run for pods without resource specs
- Namespace ResourceQuotas configured for each team
- LimitRanges set with defaults, min, and max for all namespaces
- Cluster Autoscaler enabled with appropriate min/max bounds
- HPA configured for all variable-traffic deployments
- HPA scale-down stabilization configured (prevents flapping)
- Spot node pools created for fault-tolerant workloads
- PodDisruptionBudgets on all critical services
- Dev/staging environments on a scheduled shutdown (nights and weekends)
- Cost monitoring tool installed (Kubecost or OpenCost)
- Monthly review of `kubectl top` data and the rightsizing report
- Load balancer count minimized (shared ingress controllers)
- PersistentVolumes reviewed (unused volumes still cost money)
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For enterprise Kubernetes cost audits, visit garnetgrid.com.
:::