Verified by Garnet Grid

Kubernetes Cost Optimization

Reduce Kubernetes cluster costs without sacrificing reliability. Covers right-sizing pods, cluster autoscaler tuning, multi-tenancy, spot node pools, resource quotas, and the cost visibility tools purpose-built for Kubernetes.

Kubernetes makes it easy to deploy. It also makes it easy to waste money. Resource requests are often set either too high (reserving capacity that sits idle) or too low (leaving no headroom once nodes become overcommitted). Without active optimization, clusters commonly run at 30-40% utilization while billing for 100% of provisioned capacity.


Resource Right-Sizing

The Problem

# Typical over-provisioned deployment
resources:
  requests:
    cpu: "1000m"     # Requests 1 full CPU
    memory: "2Gi"    # Requests 2 GB RAM
  limits:
    cpu: "2000m"
    memory: "4Gi"

# Actual usage (from metrics):
# CPU: avg 50m, p99 200m
# Memory: avg 256Mi, peak 512Mi
# Waste relative to peak usage: 80% CPU, 75% memory
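
The waste figures above compare each request against peak usage (p99 CPU, peak memory), not the average. A minimal Python sketch of that arithmetic follows; `waste_pct` is a hypothetical helper name, and the inputs are the numbers from the example.

```python
def waste_pct(requested: float, used: float) -> float:
    """Percentage of a resource request that is never used."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return max(0.0, (requested - used) * 100 / requested)

# Values from the example: CPU in millicores, memory in MiB.
cpu_request, cpu_avg, cpu_p99 = 1000, 50, 200
mem_request, mem_avg, mem_peak = 2048, 256, 512

# Waste measured against peak usage (the figures quoted in the example).
cpu_waste_vs_peak = waste_pct(cpu_request, cpu_p99)   # 80.0
mem_waste_vs_peak = waste_pct(mem_request, mem_peak)  # 75.0

# Waste measured against average usage is higher still.
cpu_waste_vs_avg = waste_pct(cpu_request, cpu_avg)    # 95.0
mem_waste_vs_avg = waste_pct(mem_request, mem_avg)    # 87.5
```

Measuring against peak is the more conservative view: it counts only capacity that is idle even at the busiest moment.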

Right-Sized Configuration

resources:
  requests:
    cpu: "200m"       # Matches P99 usage (4x average)
    memory: "512Mi"   # 2x average usage
  limits:
    cpu: "500m"       # 2.5x P99 usage
    memory: "1Gi"     # 2x peak usage
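
The right-sized values can be derived mechanically from the observed metrics. The sketch below is one way to do it, assuming the multipliers implied by the example (CPU request at observed p99, memory request at 2x average, CPU limit at 2.5x p99, memory limit at 2x peak). `right_size` is a hypothetical helper, and the multipliers are a starting point rather than a rule.

```python
import math

def right_size(cpu_p99_m: int, mem_avg_mi: int, mem_peak_mi: int) -> dict:
    """Derive requests and limits from observed usage.

    CPU values are in millicores, memory values in MiB.
    """
    return {
        "requests": {
            "cpu": f"{cpu_p99_m}m",                   # cover p99 CPU usage
            "memory": f"{2 * mem_avg_mi}Mi",          # 2x average memory
        },
        "limits": {
            "cpu": f"{math.ceil(2.5 * cpu_p99_m)}m",  # 2.5x p99 CPU
            "memory": f"{2 * mem_peak_mi}Mi",         # 2x peak memory
        },
    }

# Metrics from the over-provisioned example: p99 200m, avg 256Mi, peak 512Mi.
recommended = right_size(cpu_p99_m=200, mem_avg_mi=256, mem_peak_mi=512)
print(recommended)
```

For the example metrics this reproduces the configuration shown above (1024Mi is the same quantity as 1Gi).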

Automated Right-Sizing

# Vertical Pod Autoscaler (VPA)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: order-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  updatePolicy:
    updateMode: "Auto"      # Applies recommendations automatically (may restart pods)
  resourcePolicy:
    containerPolicies:
      - containerName: order-service
        minAllowed:
          cpu: 50m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
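
The `minAllowed` and `maxAllowed` fields bound whatever the VPA recommends. The Python sketch below illustrates that clamping for the policy above. It is not the VPA's implementation: the quantity parsers are deliberately simplified (only the `m`, `Mi`, and `Gi` suffixes used in this guide), and `bounded_recommendation` is a hypothetical name.

```python
def cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as '50m' or '2' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)

def mem_mib(quantity: str) -> int:
    """Parse a memory quantity such as '128Mi' or '4Gi' into MiB."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")

def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(value, high))

# Bounds copied from the containerPolicies entry above.
POLICY = {
    "minAllowed": {"cpu": "50m", "memory": "128Mi"},
    "maxAllowed": {"cpu": "2000m", "memory": "4Gi"},
}

def bounded_recommendation(cpu: str, memory: str, policy: dict = POLICY) -> dict:
    """Clamp a raw recommendation into the policy's allowed range."""
    lo, hi = policy["minAllowed"], policy["maxAllowed"]
    cpu_m = clamp(cpu_millicores(cpu), cpu_millicores(lo["cpu"]), cpu_millicores(hi["cpu"]))
    mem = clamp(mem_mib(memory), mem_mib(lo["memory"]), mem_mib(hi["memory"]))
    return {"cpu": f"{cpu_m}m", "memory": f"{mem}Mi"}

# A raw recommendation of 20m CPU is raised to the 50m floor.
print(bounded_recommendation("20m", "300Mi"))
```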

Cluster Autoscaler Tuning

# Cluster autoscaler configuration (command-line flags, shown here as key: value)
cluster-autoscaler:
  scale-down-delay-after-add: 10m     # Wait after scale-up before scaling down
  scale-down-unneeded-time: 10m       # Node must be unneeded for 10 min
  scale-down-utilization-threshold: 0.5  # Scale down if requests < 50% of allocatable
  max-graceful-termination-sec: 600   # Allow 10 min for pod eviction
  skip-nodes-with-system-pods: true   # Don't remove nodes running kube-system pods
  
  # Node group priorities (the highest value wins)
  expander: priority                  # Choose node groups by configured priority
  priority-config: |
    50:
      - spot-pool         # Prefer spot instances
    10:
      - on-demand-pool    # Fall back to on-demand
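
To show how the scale-down settings interact, here is a simplified Python sketch of the eligibility check. It assumes utilization is the larger of the CPU and memory request-to-allocatable ratios, and it ignores everything else the real autoscaler checks (whether pods can be rescheduled, PodDisruptionBudgets, system pods). `NodeState` and `is_scale_down_candidate` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class NodeState:
    cpu_requested_m: int           # sum of pod CPU requests, millicores
    cpu_allocatable_m: int
    mem_requested_mi: int          # sum of pod memory requests, MiB
    mem_allocatable_mi: int
    unneeded_minutes: float        # how long the node has been under the threshold
    minutes_since_scale_up: float  # time since the last scale-up

def utilization(node: NodeState) -> float:
    """Requests divided by allocatable, taking the larger of CPU and memory."""
    cpu = node.cpu_requested_m / node.cpu_allocatable_m
    mem = node.mem_requested_mi / node.mem_allocatable_mi
    return max(cpu, mem)

def is_scale_down_candidate(
    node: NodeState,
    utilization_threshold: float = 0.5,  # scale-down-utilization-threshold
    unneeded_time_min: float = 10,       # scale-down-unneeded-time
    delay_after_add_min: float = 10,     # scale-down-delay-after-add
) -> bool:
    return (
        utilization(node) < utilization_threshold
        and node.unneeded_minutes >= unneeded_time_min
        and node.minutes_since_scale_up >= delay_after_add_min
    )

idle_node = NodeState(1200, 4000, 4096, 16384, unneeded_minutes=12, minutes_since_scale_up=15)
print(is_scale_down_candidate(idle_node))  # 30% CPU, 25% memory, idle long enough
```

Note that the check is driven by requests, not live usage, which is why inflated requests keep nodes alive that look idle in metrics.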

Spot/Preemptible Node Pools

# Mixed node pool strategy
node-pools:
  - name: system
    instance-type: m5.large
    type: on-demand
    count: 3
    taints: [CriticalAddonsOnly=true:NoSchedule]
    purpose: "Cluster add-ons, system workloads"
    
  - name: workload-spot
    instance-types: [m5.xlarge, m5a.xlarge, m5d.xlarge, m6i.xlarge]
    type: spot
    min: 2
    max: 20
    purpose: "Stateless workloads, workers"
    
  - name: workload-ondemand
    instance-type: m5.xlarge
    type: on-demand
    min: 2
    max: 10
    purpose: "Stateful workloads, databases"
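
The placement rule behind this layout fits in a few lines. The following Python sketch only restates the policy above (system workloads on the tainted system pool, stateful workloads on on-demand nodes, everything else on spot). `choose_pool` is a hypothetical helper, not part of any Kubernetes API; in a real cluster the same decision is expressed with node selectors, affinities, and tolerations.

```python
def choose_pool(stateful: bool, system: bool = False) -> str:
    """Return the node pool name a workload should target."""
    if system:
        return "system"             # tainted pool reserved for critical add-ons
    if stateful:
        return "workload-ondemand"  # databases should not be preempted
    return "workload-spot"          # stateless work tolerates spot interruptions

workloads = {
    "coredns": choose_pool(stateful=False, system=True),
    "postgres": choose_pool(stateful=True),
    "order-worker": choose_pool(stateful=False),
}
print(workloads)
```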

Cost Visibility

Namespace-Level Cost Tracking

Namespace: order-team
  CPU Requests: 8 cores ($350/month)
  Memory Requests: 32 Gi ($200/month)
  Storage: 500 Gi ($50/month)
  Network Egress: 100 GB ($9/month)
  Total: $609/month
  
  Utilization: 45% CPU, 60% Memory
  Potential savings: $250/month with right-sizing
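
The report above is simple arithmetic over per-resource line items. The Python sketch below reproduces the total and derives the implied unit rates and savings percentage. The dollar amounts are the ones in the example report, not real provider prices, and the variable names are illustrative.

```python
# Monthly line items for the order-team namespace: (quantity, unit, cost in USD).
line_items = {
    "cpu_requests": (8, "core", 350),
    "memory_requests": (32, "Gi", 200),
    "storage": (500, "Gi", 50),
    "network_egress": (100, "GB", 9),
}

total = sum(cost for _, _, cost in line_items.values())

# Unit rates implied by the report, for example dollars per core per month.
unit_rates = {name: cost / qty for name, (qty, _, cost) in line_items.items()}

potential_savings = 250
savings_pct = round(potential_savings / total * 100, 1)

print(f"Total: ${total}/month")                  # Total: $609/month
print(f"Savings: {savings_pct}% of current spend")
```

Breaking the total into unit rates makes it easier to compare namespaces of different sizes.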

Tools

| Tool | Type | Features |
|---|---|---|
| Kubecost | Open core (free tier) | Real-time cost allocation, recommendations |
| OpenCost | CNCF project | Kubernetes cost monitoring standard |
| Spot.io | Commercial | Automated spot management |
| CAST AI | Commercial | Automated right-sizing and node optimization |

Anti-Patterns

| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No resource requests | Scheduler cannot bin-pack reliably; nodes end up overcommitted or underutilized | Set requests on every container |
| Requests == limits for CPU | No burst capacity, over-provisioned | CPU limits > requests (or no CPU limits) |
| One large node pool | Cannot mix spot and on-demand | Multiple pools by workload type |
| No namespace resource quotas | One team consumes all capacity | Enforce quotas per namespace |
| No cost visibility | Teams do not know their spend | Deploy Kubecost / OpenCost |
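
The first two anti-patterns are easy to detect automatically. Below is a minimal Python sketch of such a check over a container spec represented as a plain dict (for instance, parsed from a manifest). `find_anti_patterns` is a hypothetical helper, and it compares quantities as strings, so `"1"` and `"1000m"` would not be treated as equal.

```python
from typing import List

def find_anti_patterns(container: dict) -> List[str]:
    """Flag missing requests and CPU requests that equal CPU limits."""
    resources = container.get("resources") or {}
    requests = resources.get("requests") or {}
    limits = resources.get("limits") or {}

    findings = []
    for resource in ("cpu", "memory"):
        if resource not in requests:
            findings.append(f"missing {resource} request")
    if "cpu" in requests and requests["cpu"] == limits.get("cpu"):
        findings.append("cpu request equals cpu limit (no burst capacity)")
    return findings

container = {
    "name": "order-service",
    "resources": {
        "requests": {"cpu": "500m"},
        "limits": {"cpu": "500m", "memory": "1Gi"},
    },
}
for finding in find_anti_patterns(container):
    print(f"{container['name']}: {finding}")
```

A check like this could run in CI against rendered manifests so that the problems are caught before deployment.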

Kubernetes cost optimization is continuous. Set up visibility, right-size resources, use spot instances for stateless workloads, and review regularly.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
