
Kubernetes Cost Control

Reduce Kubernetes infrastructure costs without sacrificing reliability. Covers resource requests and limits, autoscaling, spot/preemptible nodes, namespace quotas, cost attribution, and the FinOps patterns that keep Kubernetes spending predictable.

Kubernetes makes it easy to overspend. Autoscaling creates nodes you forget about, resource requests padded with comfortable margins can leave half or more of a cluster's reserved capacity idle, and nobody knows which team is responsible for which cost. Kubernetes cost control requires understanding resources, setting proper limits, and attributing costs to teams.


Resource Requests vs Limits

# Resource requests: what the pod NEEDS (the scheduler reserves this)
# Resource limits: the MAXIMUM the pod may use (enforced at runtime)

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: api
          resources:
            requests:
              cpu: "250m"      # Scheduler reserves 0.25 CPU cores
              memory: "256Mi"  # Scheduler reserves 256MiB RAM
            limits:
              cpu: "500m"      # Throttled above 0.5 CPU cores (not killed)
              memory: "512Mi"  # OOM-killed if usage exceeds 512MiB

# Common mistake: Requests too high
# Result: Cluster is "full" but CPUs are idle
# Fix: Set requests based on actual P95 usage, not theoretical max

# How to find actual usage:
# kubectl top pods --containers
# Prometheus: container_cpu_usage_seconds_total
# Right-size: set request to P95 usage + 20% buffer
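The right-sizing rule above (P95 usage + 20% buffer) is simple arithmetic. A minimal sketch, assuming you have already exported per-container CPU samples (in millicores) from Prometheus over a representative window — the sample values below are illustrative:

```python
# Right-size a CPU request from observed usage samples (millicores).
# Sketch only: in practice, pull the samples from Prometheus
# (container_cpu_usage_seconds_total) over at least a week.

def right_size_request(samples_millicores, percentile=0.95, buffer=0.20):
    """Return a CPU request: the given usage percentile plus a buffer."""
    ordered = sorted(samples_millicores)
    # Nearest-rank percentile: index of the P95 sample
    idx = min(len(ordered) - 1, int(percentile * len(ordered)))
    p95 = ordered[idx]
    return int(p95 * (1 + buffer))

# Hypothetical 5-minute samples for one container
samples = [120, 140, 135, 180, 160, 150, 210, 190, 170, 155]
print(right_size_request(samples))  # → 252 (P95 of 210m * 1.2)
```

With a request of 250m, this container is sized about right; a request of 1000m "to be safe" would strand 750m of schedulable capacity on every replica.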

Autoscaling

# HPA: Scale pods based on metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Scale up when CPU > 70% of requested
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25  # Scale down max 25% at a time
          periodSeconds: 60

# Cluster Autoscaler: scales nodes
# Adds nodes when pods are unschedulable
# Removes nodes whose utilization stays low (default threshold ~50%)
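Before the Cluster Autoscaler can remove a node, it drains the pods running on it. A PodDisruptionBudget keeps a floor under each service during that drain — a minimal sketch, reusing the `api-service` name from the HPA example (adjust the selector to your own pod labels):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  minAvailable: 1          # Never voluntarily evict below 1 ready pod
  selector:
    matchLabels:
      app: api-service     # Assumed pod label; match your Deployment
```

Without a PDB, an aggressive scale-down can evict every replica of a service at once; with one, the autoscaler waits until the budget allows the eviction.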

Spot/Preemptible Nodes

# Node pool with spot instances (typically 60-90% cheaper than on-demand)
# GKE-style example; exact field names vary by provider and tooling
nodePool:
  name: spot-pool
  machineType: n2-standard-4
  spot: true
  autoscaling:
    minNodes: 0
    maxNodes: 20

# Tolerations for spot-tolerant workloads
spec:
  tolerations:
    - key: "cloud.google.com/gke-spot"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: "cloud.google.com/gke-spot"
                operator: "In"
                values: ["true"]

# Safe for spot:
# ✓ Stateless API servers (can restart)
# ✓ Batch processing (can retry)
# ✓ CI/CD runners (ephemeral)
# 
# NOT safe for spot:
# ✗ Databases (data loss risk)
# ✗ Stateful services (disruption)
# ✗ Single-replica critical services
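Spot preemption comes with only a short notice (about 30 seconds on GKE), so even spot-tolerant workloads should shut down quickly and cleanly. A sketch of the relevant pod settings — `/app/drain.sh` is a hypothetical script that stops accepting new work and flushes in-flight requests:

```yaml
spec:
  terminationGracePeriodSeconds: 25   # Finish before the ~30s preemption notice
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            # Hypothetical drain script: stop accepting work, flush in-flight
            command: ["/bin/sh", "-c", "/app/drain.sh"]
```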

Cost Attribution

# Label everything for cost tracking
metadata:
  labels:
    team: "payments"
    environment: "production"
    service: "payment-api"
    cost-center: "CC-1234"
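Labels only control costs if they are actually applied. One way to enforce them is an admission policy — sketched here with Kyverno (an assumption; any admission controller works) rejecting pods that omit the team label:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels   # Hypothetical policy name
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-team-label
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Every pod needs a 'team' label for cost attribution."
        pattern:
          metadata:
            labels:
              team: "?*"      # Any non-empty value
```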

# Namespace quotas per team
apiVersion: v1
kind: ResourceQuota
metadata:
  name: payments-team-quota
  namespace: payments
spec:
  hard:
    requests.cpu: "10"       # Team gets 10 CPU cores
    requests.memory: "20Gi"  # Team gets 20GB RAM
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"               # Max 50 pods
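One side effect of a ResourceQuota on requests: pods that declare no requests at all are rejected in that namespace. A LimitRange supplies defaults so they still schedule — a sketch with illustrative values:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: payments-defaults
  namespace: payments
spec:
  limits:
    - type: Container
      defaultRequest:          # Applied when a container sets no requests
        cpu: "100m"
        memory: "128Mi"
      default:                 # Applied when a container sets no limits
        cpu: "500m"
        memory: "512Mi"
```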

Anti-Patterns

Anti-Pattern                  | Consequence                       | Fix
No resource requests          | Overloaded nodes, OOM kills       | Set requests based on actual usage
Requests = limits (too high)  | Wasted capacity, over-provisioned | Requests at P95 usage, limits higher
No HPA configured             | Manual scaling, over-provisioned  | HPA on CPU/memory/custom metrics
All on-demand nodes           | 2-3x more expensive than needed   | Spot nodes for stateless workloads
No cost labels                | Cannot attribute cost to teams    | Mandatory labels + quota enforcement

You cannot optimize what you cannot measure. Label everything, set requests based on actual usage, leverage spot instances for tolerant workloads, and enforce team quotas.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
