
How to Control Kubernetes Costs: Resource Limits, Autoscaling, and Spot Nodes

Reduce Kubernetes infrastructure costs by 40-65% with proper resource requests, cluster autoscaling, spot node pools, and namespace quotas.

Kubernetes makes it trivially easy to waste money. Every pod without resource limits is an open checkbook. Every idle node is a bill you’re paying for nothing. The average Kubernetes cluster runs at 20-35% utilization — meaning 65-80% of your compute spend is pure waste. This guide shows you exactly how to find that waste and eliminate it, typically saving 40-65% of cluster costs within the first month.

The cost problem isn’t Kubernetes itself — it’s the gap between what you provision and what your workloads actually use. Close that gap and you’ll save thousands per month without degrading performance.


Step 1: Set Resource Requests and Limits on Every Pod

Resource requests and limits are the single most important cost-control mechanism in Kubernetes. Without resource requests, the scheduler can't pack pods efficiently. Without limits, a single pod can consume an entire node. Without both, you can't autoscale intelligently.

1.1 Determine Actual Usage

# Get CPU/memory usage for all pods in a namespace
kubectl top pods -n production --sort-by=cpu

# Get node-level utilization
kubectl top nodes

# For historical data, use Prometheus queries
# container_cpu_usage_seconds_total
# container_memory_working_set_bytes

# Find pods WITHOUT resource limits (these are the cost leaks)
kubectl get pods -A -o json | jq -r \
  '.items[] | select(any(.spec.containers[]; .resources.limits == null)) |
   [.metadata.namespace, .metadata.name] | @tsv'
# (any() avoids printing the same pod once per limit-less container)

1.2 Apply Resource Specs

# Deployment with proper resource management
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: api
          image: api:v2.1
          resources:
            requests:           # Used for scheduling — scheduler places pod on node with this much available
              cpu: "250m"       # 0.25 CPU cores
              memory: "512Mi"   # 512 MB
            limits:             # Hard ceiling — pod gets OOMKilled or CPU-throttled if exceeded
              cpu: "1000m"      # 1 CPU core
              memory: "1Gi"     # 1 GB

:::tip[Sizing Strategy]
Set requests to the P50 (median) usage and limits to the P99 (peak) usage. This ensures efficient packing while preventing OOMKilled situations. Review and adjust monthly — workload patterns change.
:::
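The P50/P99 rule above can be sketched as a small calculator (Python here for illustration; the sample values are hypothetical, and real numbers would come from Prometheus or `kubectl top` history):

```python
import statistics

def suggest_cpu(samples_m):
    """Suggest (request, limit) in millicores: request = P50, limit = P99."""
    ordered = sorted(samples_m)
    p50 = statistics.median(ordered)
    # Index of the 99th percentile, clamped to the last sample
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return p50, p99

# Hypothetical one-minute CPU samples over a day (millicores)
samples = [120, 140, 150, 160, 180, 200, 210, 220, 240, 900]
req, lim = suggest_cpu(samples)  # request 190m, limit 900m
```

The single 900m spike lands in the limit, not the request, so the scheduler packs against typical usage while the pod can still burst.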

1.3 Request vs Limit Ratio Guidelines

| Workload Type | CPU Request:Limit Ratio | Memory Request:Limit Ratio | Why |
|---|---|---|---|
| API server (steady load) | 1:2 | 1:1.5 | Predictable, needs guaranteed memory |
| Batch job (bursty) | 1:4 | 1:2 | Bursts briefly, then idles |
| Worker (queue processor) | 1:2 | 1:1.5 | Scales horizontally, steady per-pod |
| Database (stateful) | 1:1 | 1:1 | Needs guaranteed resources always |
| ML inference | 1:1 (GPU) | 1:1 | GPU must be fully dedicated |
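Applying a ratio from the table is simple arithmetic; a minimal sketch (the helper name and example values are illustrative):

```python
def limit_from_request(request_amount, ratio):
    """Derive a limit from a request and a request:limit ratio.

    ratio is (request_part, limit_part) as integers,
    e.g. (1, 4) for 1:4, or (2, 3) for 1:1.5.
    """
    req_part, lim_part = ratio
    return request_amount * lim_part // req_part

# Bursty batch job, CPU at 1:4 -> 250m request becomes 1000m limit
limit_from_request(250, (1, 4))   # 1000 (millicores)
# Memory at 1:1.5, expressed as 2:3 -> 512Mi request becomes 768Mi limit
limit_from_request(512, (2, 3))   # 768 (MiB)
```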

Step 2: Implement Namespace Resource Quotas

Prevent any single team from consuming the entire cluster. Without quotas, one runaway deployment can starve all others.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "16"
    requests.memory: "32Gi"
    limits.cpu: "32"
    limits.memory: "64Gi"
    pods: "50"
    persistentvolumeclaims: "20"
    services.loadbalancers: "2"   # Load balancers cost $$$
---
# LimitRange — sets defaults for pods that don't specify resources
# This is your safety net: no pod deploys without limits
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
    - default:          # Default limits (what pods get if they don't specify)
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:   # Default requests
        cpu: "100m"
        memory: "128Mi"
      max:              # Maximum any single container can request
        cpu: "4"
        memory: "8Gi"
      min:              # Minimum (prevents tiny pods that waste scheduling overhead)
        cpu: "50m"
        memory: "64Mi"
      type: Container

Quota Planning by Team Size

| Team Size | CPU Requests | Memory Requests | Pods | Monthly Cost (approx) |
|---|---|---|---|---|
| Small (2-4 devs, 5 services) | 8 CPU | 16Gi | 30 | ~$600 |
| Medium (5-10 devs, 15 services) | 16 CPU | 32Gi | 50 | ~$1,200 |
| Large (10-20 devs, 30 services) | 32 CPU | 64Gi | 100 | ~$2,400 |
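The approximate monthly costs scale linearly with the quota. A rough sketch, using illustrative unit prices chosen to roughly match the table (actual cloud pricing varies by provider, region, and instance family):

```python
# Assumed unit prices, for illustration only -- not any provider's rates
CPU_PER_MONTH = 55.0   # USD per vCPU-month
MEM_PER_MONTH = 9.5    # USD per GiB-month

def quota_cost(cpu_requests, memory_gib):
    """Estimate the monthly cost of a fully-used namespace quota."""
    return cpu_requests * CPU_PER_MONTH + memory_gib * MEM_PER_MONTH

quota_cost(8, 16)    # small team:  ~$592/month
quota_cost(16, 32)   # medium team: ~$1,184/month
quota_cost(32, 64)   # large team:  ~$2,368/month
```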

Step 3: Enable Cluster Autoscaler

The Cluster Autoscaler automatically adds nodes when pods are pending (unschedulable) and removes nodes when utilization drops below a configurable threshold.

3.1 Azure AKS

az aks update \
  --resource-group myRG \
  --name myCluster \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 20

3.2 AWS EKS

# Cluster Autoscaler deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --nodes=2:20:eks-nodegroup
            - --scale-down-delay-after-add=5m     # Wait 5m after adding node before removing any
            - --scale-down-unneeded-time=5m        # Node must be idle 5m before removal
            - --scale-down-utilization-threshold=0.5  # Remove nodes below 50% utilization
            - --skip-nodes-with-local-storage=false
            - --balance-similar-node-groups=true   # Keep node groups evenly sized
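Taken together, the scale-down flags reduce to a simple rule: a node becomes a removal candidate once its requested-resource utilization has stayed below the threshold for the unneeded-time window. A simplified sketch (the real autoscaler also checks pod evictability, PodDisruptionBudgets, and local storage):

```python
def scale_down_candidate(requested_cpu_m, allocatable_cpu_m,
                         idle_minutes, threshold=0.5, unneeded_minutes=5):
    """True if a node could be removed under the flags shown above."""
    utilization = requested_cpu_m / allocatable_cpu_m
    return utilization < threshold and idle_minutes >= unneeded_minutes

scale_down_candidate(1200, 4000, idle_minutes=7)    # 30% utilized -> True
scale_down_candidate(2600, 4000, idle_minutes=10)   # 65% utilized -> False
```

Note that utilization here is based on *requests*, not actual usage, which is another reason Step 1 matters: over-requested pods keep half-empty nodes alive.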

3.3 Horizontal Pod Autoscaler (HPA)

HPA scales pods within a deployment; Cluster Autoscaler scales nodes to accommodate pods.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down (prevents flapping)
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60             # Remove max 25% of pods per minute
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60             # Can double pods per minute when needed
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
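Behind these targets, the HPA computes replicas with its documented formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds before the behavior policies apply. A sketch:

```python
import math

def hpa_desired(current_replicas, current_util, target_util,
                min_replicas=2, max_replicas=10):
    """Core HPA formula: scale replicas by the metric ratio, then clamp."""
    desired = math.ceil(current_replicas * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

hpa_desired(3, current_util=90, target_util=65)   # ceil(4.15) -> 5
hpa_desired(3, current_util=30, target_util=65)   # ceil(1.38) -> 2 (min floor)
hpa_desired(8, current_util=200, target_util=65)  # capped at maxReplicas = 10
```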

Karpenter (Alternative to Cluster Autoscaler)

For AWS EKS, Karpenter is often faster and more cost-effective than Cluster Autoscaler:

| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Scale-up speed | 2-5 minutes | 30-90 seconds |
| Instance selection | Fixed node group types | Dynamic — picks cheapest available |
| Spot integration | Basic | Advanced (diversified, fallback) |
| Bin packing | Good | Better (consolidation) |
| Cloud support | AWS, GCP, Azure | AWS (Azure provider in preview) |

Step 4: Use Spot/Preemptible Node Pools

Spot nodes provide 60-90% savings for fault-tolerant workloads. The trade-off: cloud providers can reclaim spot nodes with as little as 30 seconds' (Azure, GCP) to 2 minutes' (AWS) notice.

4.1 Create a Spot Node Pool

# AKS Spot Pool
az aks nodepool add \
  --resource-group myRG \
  --cluster-name myCluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --min-count 0 \
  --max-count 15 \
  --node-vm-size Standard_D4s_v5

# EKS Spot Instances via managed node group (diversify instance types!)
eksctl create nodegroup \
  --cluster myCluster \
  --name spot-workers \
  --instance-types m5.xlarge,m5a.xlarge,m5d.xlarge,m5n.xlarge \
  --spot \
  --min-size 0 \
  --max-size 15

4.2 Schedule Tolerant Workloads on Spot

spec:
  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "kubernetes.azure.com/scalesetpriority"
                operator: In
                values: ["spot"]

4.3 What to Run on Spot vs On-Demand

| Run on Spot ✅ | Keep on On-Demand ❌ |
|---|---|
| CI/CD build runners | Databases (stateful) |
| Batch processing jobs | Authentication services |
| Queue workers (retry on eviction) | Payment processing |
| Dev/staging environments | Single-replica critical services |
| ML training jobs | Kafka/Zookeeper (stateful) |
| Web frontends (multi-replica, LB) | Services without PodDisruptionBudgets |

Step 5: Implement Pod Disruption Budgets

Protect critical services during node scale-down, spot evictions, and maintenance.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2          # Always keep at least 2 pods running
  selector:
    matchLabels:
      app: api-service
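The eviction arithmetic behind minAvailable is straightforward: allowed voluntary disruptions equal the currently healthy pods minus minAvailable, never below zero. A sketch:

```python
def allowed_disruptions(healthy_pods, min_available):
    """Voluntary evictions a PDB with minAvailable permits right now."""
    return max(0, healthy_pods - min_available)

allowed_disruptions(3, 2)  # 1: one pod may be evicted
allowed_disruptions(2, 2)  # 0: a node drain blocks until a pod reschedules
```

This is why a minAvailable equal to the replica count is an anti-pattern: allowed disruptions are permanently zero, and scale-downs or node upgrades stall.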

Step 6: Schedule Non-Production Shutdown

Dev/staging clusters don’t need to run 24/7. Shutting them down nights and weekends saves roughly 64% of their cost.

#!/bin/bash
# CronJob: scale down dev at 8 PM, scale up at 7 AM

# Scale down (run at 8 PM)
kubectl scale deployment --all --replicas=0 -n dev
kubectl scale deployment --all --replicas=0 -n staging

# Scale up (separate cron job at 7 AM)
# NOTE: --replicas=1 resets every deployment to a single replica; record the
# original counts (e.g. in an annotation) first if any workload needs more.
kubectl scale deployment --all --replicas=1 -n dev
kubectl scale deployment --all --replicas=1 -n staging

Savings Calculation

  • Dev cluster: $1,200/month running 24/7
  • Active hours: 12h/day × 5 days/week = 60 hours (out of 168)
  • Idle hours: 108 hours/week ≈ 64% of the time
  • Savings: $1,200 × 0.64 ≈ $768/month
  • Annual savings: ~$9,216 per environment
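The same arithmetic generalizes to any schedule. A small sketch (the guide rounds the idle fraction 108/168 ≈ 64.3% down to 64%, hence $768 rather than the exact ~$771):

```python
def shutdown_savings(monthly_cost, active_hours_per_week):
    """Monthly savings from running an environment only during active hours."""
    idle_fraction = 1 - active_hours_per_week / 168  # 168 hours in a week
    return monthly_cost * idle_fraction

round(shutdown_savings(1200, 60))   # ~771 USD/month for the example above
```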

Cost Monitoring Tools

| Tool | Free Tier? | Best For |
|---|---|---|
| Kubecost | Yes (basic) | Real-time cost allocation per namespace/label |
| OpenCost | Yes (open source) | CNCF standard cost monitoring |
| AWS Cost Explorer | Yes | EKS cluster costs (node-level) |
| Datadog | Trial | Kubernetes + cost correlation |
| CAST AI | Freemium | Automated rightsizing recommendations |

Cost Optimization Checklist

  • Resource requests/limits set on every container (no exceptions)
  • kubectl get pods audit for pods without resource specs
  • Namespace ResourceQuotas configured for each team
  • LimitRanges set with defaults, min, and max for all namespaces
  • Cluster Autoscaler enabled with appropriate min/max bounds
  • HPA configured for all variable-traffic deployments
  • HPA scale-down stabilization configured (prevent flapping)
  • Spot node pools created for fault-tolerant workloads
  • PodDisruptionBudgets on all critical services
  • Dev/staging environments scheduled shutdown (nights + weekends)
  • Cost monitoring tool installed (Kubecost or OpenCost)
  • Monthly review of kubectl top data and rightsizing report
  • Load balancer count minimized (shared ingress controllers)
  • PersistentVolumes reviewed (unused volumes still cost money)

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For enterprise Kubernetes cost audits, visit garnetgrid.com.
:::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
