
FinOps for Kubernetes

Optimize Kubernetes costs with resource right-sizing, cluster autoscaling, workload placement, and cost visibility. Covers resource requests vs limits, spot instances for K8s, namespace cost allocation, and the patterns that prevent Kubernetes from becoming a money pit.

Kubernetes is a cost amplifier. If your workloads are over-provisioned on VMs, they are over-provisioned inside containers too — but now you also pay for the Kubernetes control plane, load balancers, and persistent volumes. Without FinOps practices, Kubernetes costs spiral silently because nobody owns the bill.


Where Kubernetes Money Goes

Typical Kubernetes Cost Breakdown:
  Compute (nodes):         65-75%
  Storage (PV/EBS):        10-15%
  Networking (LB, NAT):    5-10%
  Control plane:           2-5%
  Other (monitoring, logs): 5-10%

Hidden costs:
  ☐ Over-provisioned resource requests (CPU/memory reserved but unused)
  ☐ Idle nodes (cluster autoscaler min > actual need)
  ☐ Orphaned PersistentVolumes (pod deleted, volume remains)
  ☐ Cross-AZ network traffic (nodes in different AZs)
  ☐ Unused LoadBalancers (roughly $16-18/month each on AWS at us-east-1 hourly rates, before traffic charges)
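
Orphaned volumes are one of the easier hidden costs to quantify. The sketch below assumes you have already exported a list of PersistentVolumes with their phase and size. The `Volume` record, the sample data, and the $0.08/GB-month price are hypothetical placeholders, not values from any particular cluster or price list.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    phase: str      # Kubernetes PV phase: Available, Bound, Released, or Failed
    size_gb: float


def orphaned_volume_cost(volumes, price_per_gb_month):
    """Return (names, monthly cost) of volumes that are not bound to a claim."""
    orphans = [v for v in volumes if v.phase != "Bound"]
    cost = sum(v.size_gb for v in orphans) * price_per_gb_month
    return [v.name for v in orphans], cost


# Hypothetical inventory, e.g. built from an export of the cluster's PVs
volumes = [
    Volume("pv-db", "Bound", 100),
    Volume("pv-old-cache", "Released", 50),
    Volume("pv-scratch", "Available", 20),
]

names, cost = orphaned_volume_cost(volumes, price_per_gb_month=0.08)
print(f"Orphaned: {names}, about ${cost:.2f}/month")
```

Treating every non-Bound volume as orphaned is a simplification; an Available volume may be pre-provisioned on purpose, so review the list before deleting anything.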

Resource Right-Sizing

# BEFORE: Over-provisioned (common default)
resources:
  requests:
    cpu: "1000m"      # Requesting 1 full CPU
    memory: "2Gi"     # Requesting 2 GB RAM
  limits:
    cpu: "2000m"
    memory: "4Gi"

# Observed peak usage: 50m CPU, 256Mi memory
# Waste: 950m CPU (95%), 1.75Gi memory (87.5%)

# AFTER: Right-sized based on actual usage
resources:
  requests:
    cpu: "100m"       # 2x actual peak (headroom)
    memory: "512Mi"   # 2x actual peak
  limits:
    cpu: "500m"       # Burst capacity
    memory: "1Gi"     # OOM protection

# Tools for right-sizing:
# - VPA (Vertical Pod Autoscaler): Recommends resource values
# - Kubecost: Shows actual vs requested usage
# - Goldilocks: VPA recommendations per deployment
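
The waste figures and the 2x-headroom rule above are simple arithmetic, and the sketch below reproduces them. The helper names are illustrative, and the parsers handle only the `m`, `Mi`, and `Gi` suffixes used in this example rather than the full Kubernetes quantity grammar.

```python
def parse_cpu_millicores(quantity):
    """Convert a CPU quantity such as '1000m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory_mib(quantity):
    """Convert a memory quantity ending in Mi or Gi to MiB."""
    if quantity.endswith("Gi"):
        return int(float(quantity[:-2]) * 1024)
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory unit: {quantity}")


def waste_pct(requested, used):
    """Share of the request that is reserved but never used, in percent."""
    return 100 * (requested - used) / requested


def recommend(used, headroom=2.0):
    """Suggested request: observed peak times a headroom factor."""
    return int(used * headroom)


cpu_req, cpu_used = parse_cpu_millicores("1000m"), parse_cpu_millicores("50m")
mem_req, mem_used = parse_memory_mib("2Gi"), parse_memory_mib("256Mi")

print(f"CPU waste: {cpu_req - cpu_used}m ({waste_pct(cpu_req, cpu_used):.1f}%)")
print(f"Memory waste: {mem_req - mem_used}Mi ({waste_pct(mem_req, mem_used):.1f}%)")
print(f"Suggested requests: {recommend(cpu_used)}m CPU, {recommend(mem_used)}Mi memory")
```

The 2x factor is the headroom used in the example above; workloads with spiky traffic may need more, and steady batch jobs can often run with less.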

Cluster Autoscaler Optimization

# Cluster Autoscaler: scale nodes based on pending pods.
# Its tuning options are command-line flags on the cluster-autoscaler
# container (set in its Deployment), not ConfigMap keys.
spec:
  containers:
    - name: cluster-autoscaler
      command:
        - ./cluster-autoscaler
        # Scale down faster (the 10-minute defaults are too slow)
        - --scale-down-delay-after-add=5m
        - --scale-down-unneeded-time=5m
        # Only consider removing a node when its requested CPU and memory
        # are below 50% of allocatable (0.5 is also the default)
        - --scale-down-utilization-threshold=0.5
        # On scale-up, choose the node group that leaves the least idle CPU/memory
        - --expander=least-waste
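
To see how the utilization threshold and the unneeded time interact, here is a simplified model of the scale-down check. It covers only those two conditions; the real autoscaler also verifies that the node's pods can be rescheduled elsewhere, respects PodDisruptionBudgets, and more. The function names and node figures are illustrative assumptions.

```python
def node_utilization(cpu_requested_m, cpu_allocatable_m,
                     mem_requested_mi, mem_allocatable_mi):
    """Utilization as the larger of requested/allocatable for CPU and memory."""
    return max(cpu_requested_m / cpu_allocatable_m,
               mem_requested_mi / mem_allocatable_mi)


def is_scale_down_candidate(utilization, unneeded_minutes,
                            threshold=0.5, unneeded_time_minutes=5):
    """True when the node is under the threshold and has stayed there long enough."""
    return utilization < threshold and unneeded_minutes >= unneeded_time_minutes


# Hypothetical node: 4000m CPU and 15000Mi memory allocatable
quiet = node_utilization(1200, 4000, 6000, 15000)   # CPU 30%, memory 40%
busy = node_utilization(1200, 4000, 9000, 15000)    # CPU 30%, memory 60%

print(is_scale_down_candidate(quiet, unneeded_minutes=6))   # under threshold, long enough
print(is_scale_down_candidate(quiet, unneeded_minutes=2))   # under threshold, too recent
print(is_scale_down_candidate(busy, unneeded_minutes=30))   # memory requests too high
```

Note that utilization here is based on resource requests, not live usage, which is why inflated requests keep half-empty nodes alive.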

# Node groups with mixed instance types
# (schematic layout, not a specific tool's schema; prices are us-east-1 Linux on-demand rates)
node_groups:
  general:
    instance_types:
      - m5.xlarge     # On-demand: $0.192/hr
      - m5a.xlarge    # AMD variant: $0.172/hr (10% cheaper)
      - m6i.xlarge    # Newer gen: $0.192/hr (better perf/$)
    min_size: 2
    max_size: 20
    
  spot:
    instance_types:
      - m5.xlarge
      - m5.2xlarge
      - m5a.xlarge
      - c5.xlarge     # Mix families for spot availability
    capacity_type: SPOT  # 60-90% discount
    min_size: 0
    max_size: 30
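
A quick way to judge a spot mix is to compare the monthly bill against an all-on-demand fleet. The sketch below uses the m5.xlarge on-demand rate quoted above; the 10-node fleet, the 4/6 split, the 70% spot discount, and the 730-hour month are assumptions chosen for illustration, and real spot prices vary by instance type, zone, and time.

```python
HOURS_PER_MONTH = 730          # assumed average month length in hours
ON_DEMAND_RATE = 0.192         # m5.xlarge on-demand, $/hr


def monthly_cost(on_demand_nodes, spot_nodes, spot_discount):
    """Monthly compute cost for a fleet mixing on-demand and spot nodes."""
    spot_rate = ON_DEMAND_RATE * (1 - spot_discount)
    hourly = on_demand_nodes * ON_DEMAND_RATE + spot_nodes * spot_rate
    return hourly * HOURS_PER_MONTH


all_on_demand = monthly_cost(on_demand_nodes=10, spot_nodes=0, spot_discount=0.0)
mixed = monthly_cost(on_demand_nodes=4, spot_nodes=6, spot_discount=0.70)
savings_pct = 100 * (all_on_demand - mixed) / all_on_demand

print(f"All on-demand: ${all_on_demand:,.2f}/month")
print(f"4 on-demand + 6 spot: ${mixed:,.2f}/month ({savings_pct:.0f}% lower)")
```

The calculation ignores interruption handling and the extra capacity you may keep as a buffer, so treat the result as an upper bound on savings.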

Cost Allocation by Namespace

# Kubecost-style namespace cost attribution
# Sketch: the get_* helpers are not defined here; they stand in for
# your metrics and billing data sources.
class K8sCostAllocator:
    def allocate_costs(self, period="month"):
        """Calculate cost per namespace based on actual resource usage."""
        namespaces = self.get_all_namespaces()
        node_costs = self.get_node_costs(period)

        allocation = {}
        for ns in namespaces:
            pods = self.get_pods(namespace=ns)

            cpu_hours = sum(p.cpu_usage_hours for p in pods)
            cpu_requested_hours = sum(p.cpu_requested_hours for p in pods)
            memory_gb_hours = sum(p.memory_usage_gb_hours for p in pods)
            storage_gb = sum(p.pv_size_gb for p in pods)

            compute = (cpu_hours * node_costs.cost_per_cpu_hour +
                       memory_gb_hours * node_costs.cost_per_gb_hour)
            storage = storage_gb * node_costs.cost_per_gb_month
            network = self.get_namespace_egress_cost(ns)

            allocation[ns] = {
                "compute": compute,
                "storage": storage,
                "network": network,
                "total": compute + storage + network,
                # Used CPU / requested CPU; None when the namespace requests nothing
                # (avoids a ZeroDivisionError for empty namespaces)
                "efficiency": (cpu_hours / cpu_requested_hours
                               if cpu_requested_hours else None),
            }

        return allocation
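
To make the allocation arithmetic concrete, here is a self-contained worked example on in-memory data. The `PodUsage` record, the two namespaces, and both unit rates are hypothetical; in practice the usage numbers would come from your metrics system and the rates from your cloud bill.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    namespace: str
    cpu_usage_hours: float
    cpu_requested_hours: float
    memory_usage_gb_hours: float


CPU_HOUR = 0.03    # hypothetical $ per CPU-hour
GB_HOUR = 0.004    # hypothetical $ per GB-hour of memory


def compute_cost_by_namespace(pods):
    """Sum compute cost and CPU efficiency (used / requested) per namespace."""
    totals = {}
    for p in pods:
        entry = totals.setdefault(
            p.namespace, {"cost": 0.0, "used": 0.0, "requested": 0.0})
        entry["cost"] += (p.cpu_usage_hours * CPU_HOUR +
                          p.memory_usage_gb_hours * GB_HOUR)
        entry["used"] += p.cpu_usage_hours
        entry["requested"] += p.cpu_requested_hours
    return {
        ns: {
            "compute": round(e["cost"], 2),
            "efficiency": e["used"] / e["requested"] if e["requested"] else None,
        }
        for ns, e in totals.items()
    }


pods = [
    PodUsage("checkout", 100, 400, 500),
    PodUsage("checkout", 50, 200, 250),
    PodUsage("batch", 300, 360, 1000),
]

report = compute_cost_by_namespace(pods)
for ns, row in report.items():
    print(ns, row)
```

In this sample the checkout namespace uses only a quarter of the CPU it requests, which is the kind of signal that points a team at right-sizing.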

Anti-Patterns

Anti-Pattern                       | Consequence                                 | Fix
Default resource requests          | 10x over-provisioning                       | VPA recommendations, monthly right-sizing
No HPA (Horizontal Pod Autoscaler) | Fixed pod count wastes resources            | HPA on CPU/custom metrics
On-demand only                     | Pay full price for fault-tolerant workloads | Spot/preemptible for stateless workloads
No namespace cost visibility       | Teams cannot see their spend                | Kubecost or OpenCost per namespace
Orphaned resources                 | PVs, LBs, and nodes that serve nothing      | Monthly resource audit, automated cleanup
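
For the HPA row, it helps to know how the autoscaler picks a replica count. The sketch below follows the scaling formula in the Kubernetes HPA documentation, desired = ceil(current replicas x current metric / target metric), with the default 10% tolerance. It leaves out min/max replica bounds, stabilization windows, and pod readiness handling, and the function name is illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count the HPA formula would aim for, ignoring min/max bounds."""
    ratio = current_metric / target_metric
    # Within tolerance of the target, the HPA leaves the replica count alone
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Target: 60% average CPU utilization across 4 replicas
print(desired_replicas(4, current_metric=90, target_metric=60))   # load above target
print(desired_replicas(4, current_metric=20, target_metric=60))   # load well below target
print(desired_replicas(4, current_metric=63, target_metric=60))   # within tolerance
```

Because CPU utilization in the HPA is measured relative to requests, right-sizing requests first makes the scaling signal meaningful.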

Kubernetes does not have a cost problem — it has a visibility problem. When teams can see what their workloads cost, they optimize naturally. Give them the data.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
