Kubernetes Cost Control
Reduce Kubernetes infrastructure costs without sacrificing reliability. Covers resource requests and limits, autoscaling, spot/preemptible nodes, namespace quotas, cost attribution, and the FinOps patterns that keep Kubernetes spending predictable.
Kubernetes makes it easy to overspend: autoscaling creates nodes nobody remembers to reclaim, resource requests padded with comfortable margins can leave well over half of reserved capacity sitting idle, and without attribution nobody knows which team owns which cost. Cost control comes down to right-sizing requests and limits, scaling to actual demand, and attributing spend to teams.
Resource Requests vs Limits
# Resource requests: what the pod NEEDS (the scheduler uses this)
# Resource limits: the MAXIMUM the pod may use (enforced at runtime)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: api
          resources:
            requests:
              cpu: "250m"        # Guaranteed 0.25 CPU cores
              memory: "256Mi"    # Guaranteed 256MiB RAM
            limits:
              cpu: "500m"        # Throttled above 0.5 CPU cores
              memory: "512Mi"    # OOM-killed if usage exceeds 512MiB
# Common mistake: Requests too high
# Result: Cluster is "full" but CPUs are idle
# Fix: Set requests based on actual P95 usage, not theoretical max
# How to find actual usage:
# kubectl top pods --containers
# Prometheus: container_cpu_usage_seconds_total
# Right-size: set request to P95 usage + 20% buffer
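If the Vertical Pod Autoscaler is installed in the cluster, running it in recommendation-only mode (updateMode: "Off") surfaces per-container request suggestions without evicting anything. A minimal sketch, targeting the example Deployment above (the api-vpa name is illustrative):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"   # Recommend only; never evict or resize pods

kubectl describe vpa api-vpa then shows lower-bound, target, and upper-bound recommendations you can compare against your current requests.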
Autoscaling
# HPA: scale pods based on observed metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scale up when average CPU > 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 25                     # Remove at most 25% of replicas at a time
          periodSeconds: 60
# Cluster Autoscaler: Scale nodes
# Adds nodes when pods can't be scheduled
# Removes nodes when utilization is low (< 50%)
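One knob worth knowing: the Cluster Autoscaler will not drain a node running pods annotated as not safe to evict, so scale-down silently stalls if this annotation is overused. A sketch of the standard annotation, reserved for genuinely non-movable workloads (the batch-worker name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
  annotations:
    # Tells the Cluster Autoscaler it must not evict this pod;
    # its node cannot be scaled down until the pod finishes.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
    - name: worker
      image: batch-worker:latest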
Spot/Preemptible Nodes
# Node pool with spot instances (60-90% cheaper)
nodePool:
  name: spot-pool
  machineType: n2-standard-4
  spot: true
  autoscaling:
    minNodes: 0
    maxNodes: 20
# Tolerations for spot-tolerant workloads
spec:
  tolerations:
    - key: "cloud.google.com/gke-spot"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: "cloud.google.com/gke-spot"
                operator: "In"
                values: ["true"]
# Safe for spot:
# ✓ Stateless API servers (can restart)
# ✓ Batch processing (can retry)
# ✓ CI/CD runners (ephemeral)
#
# NOT safe for spot:
# ✗ Databases (data loss risk)
# ✗ Stateful services (disruption)
# ✗ Single-replica critical services
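Even spot-friendly services deserve a guard rail: a PodDisruptionBudget caps how many replicas a node drain or autoscaler scale-down may evict at once (a hard preemption can still bypass it). A minimal sketch for the api-service Deployment above, assuming its pods carry an app: api-service label:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  minAvailable: 1            # Keep at least one replica up during voluntary disruptions
  selector:
    matchLabels:
      app: api-service       # Assumed pod label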
Cost Attribution
# Label everything for cost tracking
metadata:
  labels:
    team: "payments"
    environment: "production"
    service: "payment-api"
    cost-center: "CC-1234"
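Labels only enable cost attribution if they are actually present, so enforce them at admission. A sketch using a Kyverno ClusterPolicy (assuming Kyverno is installed; an OPA/Gatekeeper constraint works just as well):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce    # Reject non-compliant resources instead of only auditing
  rules:
    - name: check-cost-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
      validate:
        message: "team and cost-center labels are required for cost attribution."
        pattern:
          metadata:
            labels:
              team: "?*"              # Any non-empty value
              cost-center: "?*"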
# Namespace quotas per team
apiVersion: v1
kind: ResourceQuota
metadata:
  name: payments-team-quota
  namespace: payments
spec:
  hard:
    requests.cpu: "10"        # Team gets 10 CPU cores of requests
    requests.memory: "20Gi"   # Team gets 20GiB of requested RAM
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"                # Max 50 pods
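One wrinkle: once a quota covers requests.cpu or requests.memory, the namespace rejects any pod that omits those requests. A LimitRange supplies defaults so such pods still schedule (values below are illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: payments-defaults
  namespace: payments
spec:
  limits:
    - type: Container
      defaultRequest:         # Applied when a container omits requests
        cpu: "100m"
        memory: "128Mi"
      default:                # Applied when a container omits limits
        cpu: "250m"
        memory: "256Mi"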
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No resource requests | Overloaded nodes, OOM kills | Set requests based on actual usage |
| Requests sized for worst case (= limits) | Wasted capacity, over-provisioned cluster | Requests at P95 usage, limits set higher |
| No HPA configured | Manual scaling, over-provisioned | HPA on CPU/memory/custom metrics |
| All on-demand nodes | 2-3x more expensive than needed | Spot nodes for stateless workloads |
| No cost labels | Cannot attribute cost to teams | Mandatory labels + quota enforcement |
You cannot optimize what you cannot measure. Label everything, set requests based on actual usage, leverage spot instances for tolerant workloads, and enforce team quotas.