Kubernetes Cost Optimization
Reduce Kubernetes costs without sacrificing reliability. Covers resource right-sizing, spot instances, cluster autoscaling, namespace quotas, cost allocation, and workload scheduling.
Kubernetes makes it easy to over-provision. Every team requests 4 CPUs and 8 GB RAM “just in case,” but actual usage averages 0.3 CPUs and 500 MB. Across 200 services, that waste adds up to hundreds of thousands of dollars per year. Kubernetes cost optimization is about matching resources to actual usage without causing outages.
Where Kubernetes Money Goes
| Cost Component | Typical % | Optimization Lever |
|---|---|---|
| Compute (nodes) | 60-70% | Right-size pods, use spot/preemptible, autoscale |
| Storage (PVCs) | 10-15% | Right-size volumes, lifecycle policies |
| Networking (LB, NAT) | 10-15% | Reduce cross-AZ traffic, internal LBs |
| Control plane | 5-10% | Managed K8s pricing varies by provider |
Resource Right-Sizing
```yaml
# BEFORE: Over-provisioned (common default)
resources:
  requests:
    cpu: "2000m"    # Requesting 2 full CPUs
    memory: "4Gi"   # Requesting 4 GB
  limits:
    cpu: "4000m"
    memory: "8Gi"
```

```yaml
# AFTER: Right-sized based on actual usage data
resources:
  requests:
    cpu: "250m"     # Actual p95 usage: 200m
    memory: "512Mi" # Actual p95 usage: 400Mi
  limits:
    cpu: "500m"     # 2x headroom for bursts
    memory: "1Gi"   # 2x headroom for spikes
```
Finding Right Sizes
```bash
# Check actual CPU/memory usage vs requests
kubectl top pods -n production --sort-by=cpu

# Use VPA recommendations
kubectl get vpa -n production -o yaml
# Look for: status.recommendation.containerRecommendations
```
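The sizing rule used throughout this guide (request at observed p95 usage, limit at roughly 2x for burst headroom) can be sketched as a small calculation. `recommend` and its parameters are illustrative helpers, not part of kubectl or the VPA API:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of usage samples."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

def recommend(cpu_samples_millicores, headroom=2.0, floor_m=50):
    """Suggest a CPU request (p95 of observed usage, with a minimum
    floor) and a limit (request * headroom), both in millicores."""
    request = max(floor_m, percentile(cpu_samples_millicores, 95))
    limit = int(request * headroom)
    return request, limit

# Example: samples scraped from `kubectl top` over a day (illustrative)
print(recommend([180, 200, 150, 210, 190]))  # → (210, 420)
```

The same shape applies to memory; in practice you would feed in Prometheus or metrics-server data rather than a hand-typed list.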
Spot/Preemptible Instances
| Workload Type | Spot Safe? | Strategy |
|---|---|---|
| Stateless web services | ✅ Yes | Multiple replicas, PodDisruptionBudget |
| Batch/ML training | ✅ Yes | Checkpointing, retry on preemption |
| Stateful databases | ❌ No | On-demand instances only |
| CI/CD runners | ✅ Yes | Re-run jobs on preemption |
| Cron jobs | ✅ Yes | Retry policy |
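For the spot-safe stateless services in the table above, a PodDisruptionBudget caps how many replicas a graceful node drain can take down at once. The names `web-pdb` and `app: web` are placeholders:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # Keep at least 2 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: web
```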
```yaml
# Spot node pool (GKE, declared via Config Connector; trimmed for illustration)
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
spec:
  nodeConfig:
    spot: true
    machineType: e2-standard-4
  autoscaling:
    minNodeCount: 0
    maxNodeCount: 20
  management:
    autoRepair: true
```
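To actually land stateless workloads on that pool, pin them with a node selector. GKE labels spot nodes with `cloud.google.com/gke-spot: "true"`; on other providers the label key differs:

```yaml
# Deployment pod template fragment: schedule onto spot nodes only
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
```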
Cluster Autoscaler Settings
```yaml
# Aggressive scale-down for cost savings
cluster-autoscaler:
  scale-down-enabled: true
  scale-down-delay-after-add: 5m        # Wait 5 min after scaling up
  scale-down-unneeded-time: 5m          # Node unused for 5 min → remove
  scale-down-utilization-threshold: 0.5 # < 50% utilized → candidate

  # Prevent thrashing
  max-node-provision-time: 15m
  skip-nodes-with-local-storage: false
  skip-nodes-with-system-pods: true
```
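Scale-down also respects per-pod signals: the autoscaler will not drain a node running a pod annotated as unsafe to evict, and conversely marking pods safe-to-evict lets otherwise-blocked nodes drain. A pod-level fragment:

```yaml
metadata:
  annotations:
    # Prevent the autoscaler from evicting this pod (pins its node)
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```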
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| No resource requests | Scheduler can’t bin-pack, nodes underutilized | Set requests on every container |
| Requests = limits (always) | No burst capacity, need more nodes | Requests at p95 usage, limits at 2x |
| All on-demand nodes | Paying full price for interruptible workloads | Spot nodes for stateless workloads |
| No namespace quotas | One team consumes all resources | ResourceQuotas per namespace |
| No cost visibility | Nobody knows which team spends what | Cost allocation labels on all resources |
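The namespace-quota fix from the table looks like this in practice; the namespace and numbers are placeholders to be sized per team:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"      # Sum of CPU requests across the namespace
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
```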
Checklist
- Resource requests set on every pod (based on actual usage)
- Limits set at 2x requests for burst headroom
- VPA deployed for right-sizing recommendations
- Spot/preemptible nodes for stateless workloads (40-60% savings)
- Cluster autoscaler configured with aggressive scale-down
- Namespace quotas for resource governance
- Cost allocation labels: team, service, environment
- Cost monitoring dashboard (Kubecost, OpenCost)
- Monthly cost review with team owners
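The cost-allocation labels from the checklist might look like this on a workload's metadata; tools such as Kubecost and OpenCost aggregate spend by label keys like these (the values are placeholders):

```yaml
metadata:
  labels:
    team: payments
    service: checkout-api
    environment: production
```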
:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For Kubernetes cost optimization, visit garnetgrid.com. :::