Kubernetes Cost Control
Reduce Kubernetes infrastructure costs without sacrificing reliability. Covers resource requests and limits, autoscaling, spot/preemptible nodes, namespace quotas, cost attribution, and the FinOps patterns that keep Kubernetes spending predictable.
Kubernetes makes it easy to overspend: autoscaling creates nodes nobody remembers to reclaim, resource requests padded with comfortable margins can leave well over half of reserved capacity sitting idle, and without attribution nobody knows which team owns which cost. Cost control comes down to right-sizing requests and limits, scaling to actual demand, and attributing spend to teams.
Resource Requests vs Limits
# Resource requests: what the pod NEEDS (the scheduler uses this)
# Resource limits: the MAXIMUM the pod may use (enforced at runtime)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: api
          resources:
            requests:
              cpu: "250m"        # Guaranteed 0.25 CPU cores
              memory: "256Mi"    # Guaranteed 256MiB RAM
            limits:
              cpu: "500m"        # Throttled above 0.5 CPU cores
              memory: "512Mi"    # OOM-killed if usage exceeds 512MiB
# Common mistake: Requests too high
# Result: Cluster is "full" but CPUs are idle
# Fix: Set requests based on actual P95 usage, not theoretical max
# How to find actual usage:
# kubectl top pods --containers
# Prometheus: container_cpu_usage_seconds_total
# Right-size: set request to P95 usage + 20% buffer
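If the Vertical Pod Autoscaler is installed in the cluster, running it in recommendation-only mode (updateMode: "Off") surfaces per-container request suggestions without evicting anything. A minimal sketch, targeting the example Deployment above (the api-vpa name is illustrative):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"   # Recommend only; never evict or resize pods

kubectl describe vpa api-vpa then shows lower-bound, target, and upper-bound recommendations you can compare against your current requests.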
Autoscaling
# HPA: scale pods based on observed metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scale up when average CPU > 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 25                     # Remove at most 25% of replicas at a time
          periodSeconds: 60
# Cluster Autoscaler: Scale nodes
# Adds nodes when pods can't be scheduled
# Removes nodes when utilization is low (< 50%)
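One knob worth knowing: the Cluster Autoscaler will not drain a node running pods annotated as not safe to evict, so scale-down silently stalls if this annotation is overused. A sketch of the standard annotation, reserved for genuinely non-movable workloads (the batch-worker name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
  annotations:
    # Tells the Cluster Autoscaler it must not evict this pod;
    # its node cannot be scaled down until the pod finishes.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
    - name: worker
      image: batch-worker:latest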
Spot/Preemptible Nodes
# Node pool with spot instances (60-90% cheaper)
nodePool:
  name: spot-pool
  machineType: n2-standard-4
  spot: true
  autoscaling:
    minNodes: 0
    maxNodes: 20
# Tolerations for spot-tolerant workloads
spec:
  tolerations:
    - key: "cloud.google.com/gke-spot"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: "cloud.google.com/gke-spot"
                operator: "In"
                values: ["true"]
# Safe for spot:
# ✓ Stateless API servers (can restart)
# ✓ Batch processing (can retry)
# ✓ CI/CD runners (ephemeral)
#
# NOT safe for spot:
# ✗ Databases (data loss risk)
# ✗ Stateful services (disruption)
# ✗ Single-replica critical services
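Even spot-friendly services deserve a guard rail: a PodDisruptionBudget caps how many replicas a node drain or autoscaler scale-down may evict at once (a hard preemption can still bypass it). A minimal sketch for the api-service Deployment above, assuming its pods carry an app: api-service label:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  minAvailable: 1            # Keep at least one replica up during voluntary disruptions
  selector:
    matchLabels:
      app: api-service       # Assumed pod label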
Cost Attribution
# Label everything for cost tracking
metadata:
  labels:
    team: "payments"
    environment: "production"
    service: "payment-api"
    cost-center: "CC-1234"
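Labels only enable cost attribution if they are actually present, so enforce them at admission. A sketch using a Kyverno ClusterPolicy (assuming Kyverno is installed; an OPA/Gatekeeper constraint works just as well):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce    # Reject non-compliant resources instead of only auditing
  rules:
    - name: check-cost-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
      validate:
        message: "team and cost-center labels are required for cost attribution."
        pattern:
          metadata:
            labels:
              team: "?*"              # Any non-empty value
              cost-center: "?*"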
# Namespace quotas per team
apiVersion: v1
kind: ResourceQuota
metadata:
  name: payments-team-quota
  namespace: payments
spec:
  hard:
    requests.cpu: "10"        # Team gets 10 CPU cores of requests
    requests.memory: "20Gi"   # Team gets 20GiB of requested RAM
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"                # Max 50 pods
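One wrinkle: once a quota covers requests.cpu or requests.memory, the namespace rejects any pod that omits those requests. A LimitRange supplies defaults so such pods still schedule (values below are illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: payments-defaults
  namespace: payments
spec:
  limits:
    - type: Container
      defaultRequest:         # Applied when a container omits requests
        cpu: "100m"
        memory: "128Mi"
      default:                # Applied when a container omits limits
        cpu: "250m"
        memory: "256Mi"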
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No resource requests | Overloaded nodes, OOM kills | Set requests based on actual usage |
| Requests sized for worst case (= limits) | Wasted capacity, over-provisioned cluster | Requests at P95 usage, limits set higher |
| No HPA configured | Manual scaling, over-provisioned | HPA on CPU/memory/custom metrics |
| All on-demand nodes | 2-3x more expensive than needed | Spot nodes for stateless workloads |
| No cost labels | Cannot attribute cost to teams | Mandatory labels + quota enforcement |
You cannot optimize what you cannot measure. Label everything, set requests based on actual usage, leverage spot instances for tolerant workloads, and enforce team quotas.