
How to Control Kubernetes Costs: Resource Limits, Autoscaling, and Spot Nodes

Reduce Kubernetes infrastructure costs by 40-65% with proper resource requests, cluster autoscaling, spot node pools, and namespace quotas.

Kubernetes makes it trivially easy to waste money. Every pod without resource limits is an open checkbook. Every idle node is a bill you’re paying for nothing. The average Kubernetes cluster runs at 20-35% utilization — meaning 65-80% of your compute spend is pure waste. This guide shows you exactly how to find that waste and eliminate it, typically saving 40-65% of cluster costs within the first month.

The cost problem isn’t Kubernetes itself — it’s the gap between what you provision and what your workloads actually use. Close that gap and you’ll save thousands per month without degrading performance.


Step 1: Set Resource Requests and Limits on Every Pod

Resource requests and limits are the single most important cost-control mechanism in Kubernetes. Without resource requests, the scheduler can't pack pods efficiently. Without limits, a single pod can consume an entire node. Without both, you can't autoscale intelligently.

1.1 Determine Actual Usage

# Get CPU/memory usage for all pods in a namespace
kubectl top pods -n production --sort-by=cpu

# Get node-level utilization
kubectl top nodes

# For historical data, use Prometheus queries
# container_cpu_usage_seconds_total
# container_memory_working_set_bytes

# Find pods WITHOUT resource limits (these are the cost leaks)
kubectl get pods -A -o json | jq -r \
  '.items[] | select(any(.spec.containers[]; .resources.limits == null)) |
   [.metadata.namespace, .metadata.name] | @tsv'
# (any() avoids printing the same pod once per limit-less container)

1.2 Apply Resource Specs

# Deployment with proper resource management
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: api
          image: api:v2.1
          resources:
            requests:           # Used for scheduling — scheduler places pod on node with this much available
              cpu: "250m"       # 0.25 CPU cores
              memory: "512Mi"   # 512 MB
            limits:             # Hard ceiling — pod gets OOMKilled or CPU-throttled if exceeded
              cpu: "1000m"      # 1 CPU core
              memory: "1Gi"     # 1 GB

:::tip[Sizing Strategy]
Set requests to the P50 (median) usage and limits to the P99 (peak) usage. This ensures efficient packing while preventing OOMKilled situations. Review and adjust monthly — workload patterns change.
:::
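The P50/P99 rule above can be sketched as a small calculator (Python here for illustration; the sample values are hypothetical, and real numbers would come from Prometheus or `kubectl top` history):

```python
import statistics

def suggest_cpu(samples_m):
    """Suggest (request, limit) in millicores: request = P50, limit = P99."""
    ordered = sorted(samples_m)
    p50 = statistics.median(ordered)
    # Index of the 99th percentile, clamped to the last sample
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return p50, p99

# Hypothetical one-minute CPU samples over a day (millicores)
samples = [120, 140, 150, 160, 180, 200, 210, 220, 240, 900]
req, lim = suggest_cpu(samples)  # request 190m, limit 900m
```

The single 900m spike lands in the limit, not the request, so the scheduler packs against typical usage while the pod can still burst.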

1.3 Request vs Limit Ratio Guidelines

| Workload Type | CPU Request:Limit Ratio | Memory Request:Limit Ratio | Why |
|---|---|---|---|
| API server (steady load) | 1:2 | 1:1.5 | Predictable, needs guaranteed memory |
| Batch job (bursty) | 1:4 | 1:2 | Bursts briefly, then idles |
| Worker (queue processor) | 1:2 | 1:1.5 | Scales horizontally, steady per-pod |
| Database (stateful) | 1:1 | 1:1 | Needs guaranteed resources always |
| ML inference | 1:1 (GPU) | 1:1 | GPU must be fully dedicated |
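Applying a ratio from the table is simple arithmetic; a minimal sketch (the helper name and example values are illustrative):

```python
def limit_from_request(request_amount, ratio):
    """Derive a limit from a request and a request:limit ratio.

    ratio is (request_part, limit_part) as integers,
    e.g. (1, 4) for 1:4, or (2, 3) for 1:1.5.
    """
    req_part, lim_part = ratio
    return request_amount * lim_part // req_part

# Bursty batch job, CPU at 1:4 -> 250m request becomes 1000m limit
limit_from_request(250, (1, 4))   # 1000 (millicores)
# Memory at 1:1.5, expressed as 2:3 -> 512Mi request becomes 768Mi limit
limit_from_request(512, (2, 3))   # 768 (MiB)
```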

Step 2: Implement Namespace Resource Quotas

Prevent any single team from consuming the entire cluster. Without quotas, one runaway deployment can starve all others.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "16"
    requests.memory: "32Gi"
    limits.cpu: "32"
    limits.memory: "64Gi"
    pods: "50"
    persistentvolumeclaims: "20"
    services.loadbalancers: "2"   # Load balancers cost $$$
---
# LimitRange — sets defaults for pods that don't specify resources
# This is your safety net: no pod deploys without limits
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
    - default:          # Default limits (what pods get if they don't specify)
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:   # Default requests
        cpu: "100m"
        memory: "128Mi"
      max:              # Maximum any single container can request
        cpu: "4"
        memory: "8Gi"
      min:              # Minimum (prevents tiny pods that waste scheduling overhead)
        cpu: "50m"
        memory: "64Mi"
      type: Container

Quota Planning by Team Size

| Team Size | CPU Requests | Memory Requests | Pods | Monthly Cost (approx) |
|---|---|---|---|---|
| Small (2-4 devs, 5 services) | 8 CPU | 16Gi | 30 | ~$600 |
| Medium (5-10 devs, 15 services) | 16 CPU | 32Gi | 50 | ~$1,200 |
| Large (10-20 devs, 30 services) | 32 CPU | 64Gi | 100 | ~$2,400 |
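The approximate monthly costs scale linearly with the quota. A rough sketch, using illustrative unit prices chosen to roughly match the table (actual cloud pricing varies by provider, region, and instance family):

```python
# Assumed unit prices, for illustration only -- not any provider's rates
CPU_PER_MONTH = 55.0   # USD per vCPU-month
MEM_PER_MONTH = 9.5    # USD per GiB-month

def quota_cost(cpu_requests, memory_gib):
    """Estimate the monthly cost of a fully-used namespace quota."""
    return cpu_requests * CPU_PER_MONTH + memory_gib * MEM_PER_MONTH

quota_cost(8, 16)    # small team:  ~$592/month
quota_cost(16, 32)   # medium team: ~$1,184/month
quota_cost(32, 64)   # large team:  ~$2,368/month
```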

Step 3: Enable Cluster Autoscaler

The Cluster Autoscaler automatically adds nodes when pods are pending (unschedulable) and removes nodes when utilization drops below a configurable threshold.

3.1 Azure AKS

az aks update \
  --resource-group myRG \
  --name myCluster \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 20

3.2 AWS EKS

# Cluster Autoscaler deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --nodes=2:20:eks-nodegroup
            - --scale-down-delay-after-add=5m     # Wait 5m after adding node before removing any
            - --scale-down-unneeded-time=5m        # Node must be idle 5m before removal
            - --scale-down-utilization-threshold=0.5  # Remove nodes below 50% utilization
            - --skip-nodes-with-local-storage=false
            - --balance-similar-node-groups=true   # Keep node groups evenly sized
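Taken together, the scale-down flags reduce to a simple rule: a node becomes a removal candidate once its requested-resource utilization has stayed below the threshold for the unneeded-time window. A simplified sketch (the real autoscaler also checks pod evictability, PodDisruptionBudgets, and local storage):

```python
def scale_down_candidate(requested_cpu_m, allocatable_cpu_m,
                         idle_minutes, threshold=0.5, unneeded_minutes=5):
    """True if a node could be removed under the flags shown above."""
    utilization = requested_cpu_m / allocatable_cpu_m
    return utilization < threshold and idle_minutes >= unneeded_minutes

scale_down_candidate(1200, 4000, idle_minutes=7)    # 30% utilized -> True
scale_down_candidate(2600, 4000, idle_minutes=10)   # 65% utilized -> False
```

Note that utilization here is based on *requests*, not actual usage, which is another reason Step 1 matters: over-requested pods keep half-empty nodes alive.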

3.3 Horizontal Pod Autoscaler (HPA)

HPA scales pods within a deployment; Cluster Autoscaler scales nodes to accommodate pods.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down (prevents flapping)
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60             # Remove max 25% of pods per minute
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60             # Can double pods per minute when needed
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
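Behind these targets, the HPA computes replicas with its documented formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds before the behavior policies apply. A sketch:

```python
import math

def hpa_desired(current_replicas, current_util, target_util,
                min_replicas=2, max_replicas=10):
    """Core HPA formula: scale replicas by the metric ratio, then clamp."""
    desired = math.ceil(current_replicas * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

hpa_desired(3, current_util=90, target_util=65)   # ceil(4.15) -> 5
hpa_desired(3, current_util=30, target_util=65)   # ceil(1.38) -> 2 (min floor)
hpa_desired(8, current_util=200, target_util=65)  # capped at maxReplicas = 10
```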

Karpenter (Alternative to Cluster Autoscaler)

For AWS EKS, Karpenter is often faster and more cost-effective than Cluster Autoscaler:

| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Scale-up speed | 2-5 minutes | 30-90 seconds |
| Instance selection | Fixed node group types | Dynamic — picks cheapest available |
| Spot integration | Basic | Advanced (diversified, fallback) |
| Bin packing | Good | Better (consolidation) |
| Cloud support | AWS, GCP, Azure | AWS (Azure provider in preview) |

Step 4: Use Spot/Preemptible Node Pools

Spot nodes provide 60-90% savings for fault-tolerant workloads. The trade-off: cloud providers can reclaim spot nodes with as little as 30 seconds' (Azure, GCP) to 2 minutes' (AWS) notice.

4.1 Create a Spot Node Pool

# AKS Spot Pool
az aks nodepool add \
  --resource-group myRG \
  --cluster-name myCluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --min-count 0 \
  --max-count 15 \
  --node-vm-size Standard_D4s_v5

# EKS Spot Instances via managed node group (diversify instance types!)
eksctl create nodegroup \
  --cluster myCluster \
  --name spot-workers \
  --instance-types m5.xlarge,m5a.xlarge,m5d.xlarge,m5n.xlarge \
  --spot \
  --min-size 0 \
  --max-size 15

4.2 Schedule Tolerant Workloads on Spot

spec:
  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "kubernetes.azure.com/scalesetpriority"
                operator: In
                values: ["spot"]

4.3 What to Run on Spot vs On-Demand

| Run on Spot ✅ | Keep on On-Demand ❌ |
|---|---|
| CI/CD build runners | Databases (stateful) |
| Batch processing jobs | Authentication services |
| Queue workers (retry on eviction) | Payment processing |
| Dev/staging environments | Single-replica critical services |
| ML training jobs | Kafka/Zookeeper (stateful) |
| Web frontends (multi-replica, LB) | Services without PodDisruptionBudgets |

Step 5: Implement Pod Disruption Budgets

Protect critical services during node scale-down, spot evictions, and maintenance.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2          # Always keep at least 2 pods running
  selector:
    matchLabels:
      app: api-service
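The eviction arithmetic behind minAvailable is straightforward: allowed voluntary disruptions equal the currently healthy pods minus minAvailable, never below zero. A sketch:

```python
def allowed_disruptions(healthy_pods, min_available):
    """Voluntary evictions a PDB with minAvailable permits right now."""
    return max(0, healthy_pods - min_available)

allowed_disruptions(3, 2)  # 1: one pod may be evicted
allowed_disruptions(2, 2)  # 0: a node drain blocks until a pod reschedules
```

This is why a minAvailable equal to the replica count is an anti-pattern: allowed disruptions are permanently zero, and scale-downs or node upgrades stall.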

Step 6: Schedule Non-Production Shutdown

Dev/staging clusters don’t need to run 24/7. Shutting them down nights and weekends saves roughly 64% of their cost.

#!/bin/bash
# CronJob: scale down dev at 8 PM, scale up at 7 AM

# Scale down (run at 8 PM)
kubectl scale deployment --all --replicas=0 -n dev
kubectl scale deployment --all --replicas=0 -n staging

# Scale up (separate cron job at 7 AM)
# NOTE: --replicas=1 resets every deployment to a single replica; record the
# original counts (e.g. in an annotation) first if any workload needs more.
kubectl scale deployment --all --replicas=1 -n dev
kubectl scale deployment --all --replicas=1 -n staging

Savings Calculation

  • Dev cluster: $1,200/month running 24/7
  • Active hours: 12h/day × 5 days/week = 60 hours (out of 168)
  • Idle hours: 108 hours/week ≈ 64% of the time
  • Savings: $1,200 × 0.64 ≈ $768/month
  • Annual savings: ~$9,216 per environment
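The same arithmetic generalizes to any schedule. A small sketch (the guide rounds the idle fraction 108/168 ≈ 64.3% down to 64%, hence $768 rather than the exact ~$771):

```python
def shutdown_savings(monthly_cost, active_hours_per_week):
    """Monthly savings from running an environment only during active hours."""
    idle_fraction = 1 - active_hours_per_week / 168  # 168 hours in a week
    return monthly_cost * idle_fraction

round(shutdown_savings(1200, 60))   # ~771 USD/month for the example above
```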

Cost Monitoring Tools

| Tool | Free Tier? | Best For |
|---|---|---|
| Kubecost | Yes (basic) | Real-time cost allocation per namespace/label |
| OpenCost | Yes (open source) | CNCF standard cost monitoring |
| AWS Cost Explorer | Yes | EKS cluster costs (node-level) |
| Datadog | Trial | Kubernetes + cost correlation |
| CAST AI | Freemium | Automated rightsizing recommendations |

Cost Optimization Checklist

  • Resource requests/limits set on every container (no exceptions)
  • kubectl get pods audit for pods without resource specs
  • Namespace ResourceQuotas configured for each team
  • LimitRanges set with defaults, min, and max for all namespaces
  • Cluster Autoscaler enabled with appropriate min/max bounds
  • HPA configured for all variable-traffic deployments
  • HPA scale-down stabilization configured (prevent flapping)
  • Spot node pools created for fault-tolerant workloads
  • PodDisruptionBudgets on all critical services
  • Dev/staging environments scheduled shutdown (nights + weekends)
  • Cost monitoring tool installed (Kubecost or OpenCost)
  • Monthly review of kubectl top data and rightsizing report
  • Load balancer count minimized (shared ingress controllers)
  • PersistentVolumes reviewed (unused volumes still cost money)

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For enterprise Kubernetes cost audits, visit garnetgrid.com.
:::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
