FinOps for Kubernetes | The Garnet Wiki

Kubernetes is a cost amplifier. If your workloads are over-provisioned on VMs, they are over-provisioned inside containers too — but now you also pay for the Kubernetes control plane, load balancers, and persistent volumes. Without FinOps practices, Kubernetes costs spiral silently because nobody owns the bill.

Where Kubernetes Money Goes

Typical Kubernetes Cost Breakdown:
  Compute (nodes):         65-75%
  Storage (PV/EBS):        10-15%
  Networking (LB, NAT):    5-10%
  Control plane:           2-5%
  Other (monitoring, logs): 5-10%

Hidden costs:
  ☐ Over-provisioned resource requests (CPU/memory reserved but unused)
  ☐ Idle nodes (cluster autoscaler min > actual need)
  ☐ Orphaned PersistentVolumes (pod deleted, volume remains)
  ☐ Cross-AZ network traffic (nodes in different AZs)
  ☐ Unused LoadBalancers (roughly $16-18/month each on AWS in base hourly charges, before traffic)
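
One of the hidden costs above, orphaned PersistentVolumes, can often be spotted from the volume phase. The following is a minimal sketch, not a cleanup tool. It assumes the input has the shape of the `items` list in `kubectl get pv -o json`, and it treats the `Released` phase (claim deleted, volume not yet reclaimed) as the signal. The function name `find_released_volumes` and the sample records are hypothetical.

```python
def find_released_volumes(pv_items):
    """Return (name, capacity) for PersistentVolumes whose claim is gone.

    Assumption: pv_items has the shape of `items` in `kubectl get pv -o json`.
    """
    released = []
    for pv in pv_items:
        phase = pv.get("status", {}).get("phase")
        if phase == "Released":
            name = pv["metadata"]["name"]
            capacity = pv.get("spec", {}).get("capacity", {}).get("storage", "unknown")
            released.append((name, capacity))
    return released


# Hand-written sample records, not output from a real cluster.
sample = [
    {"metadata": {"name": "pv-a"},
     "spec": {"capacity": {"storage": "100Gi"}},
     "status": {"phase": "Bound"}},
    {"metadata": {"name": "pv-b"},
     "spec": {"capacity": {"storage": "50Gi"}},
     "status": {"phase": "Released"}},
]
print(find_released_volumes(sample))
```

A volume that is still bound to a claim no pod mounts will not show up here; catching that case needs a cross-check against running pods.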

Resource Right-Sizing

# BEFORE: Over-provisioned (common default)
resources:
  requests:
    cpu: "1000m"      # Requesting 1 full CPU
    memory: "2Gi"     # Requesting 2 GB RAM
  limits:
    cpu: "2000m"
    memory: "4Gi"

# Observed peak usage: 50m CPU, 256Mi memory
# Waste vs. requests: 950m CPU (95%), 1.75Gi memory (87.5%)

# AFTER: Right-sized based on actual usage
resources:
  requests:
    cpu: "100m"       # 2x observed peak (headroom)
    memory: "512Mi"   # 2x observed peak
  limits:
    cpu: "500m"       # Burst capacity (CPU is throttled above this)
    memory: "1Gi"     # Hard cap: the container is OOM-killed above this

# Tools for right-sizing:
# - VPA (Vertical Pod Autoscaler): Recommends resource values
# - Kubecost: Shows actual vs requested usage
# - Goldilocks: VPA recommendations per deployment
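
The waste figures and the 2x headroom rule in the example above reduce to a few lines of arithmetic. This sketch reproduces them. The function names and the default `headroom` multiplier are illustrative choices, not part of VPA, Kubecost, or Goldilocks.

```python
def waste_fraction(requested, used):
    """Fraction of a request that is reserved but unused (0.0 to 1.0)."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return max(requested - used, 0) / requested


def recommend_request(observed_peak, headroom=2.0):
    """Suggested request: observed peak times a headroom multiplier."""
    return observed_peak * headroom


# Figures from the example above: CPU in millicores, memory in MiB.
cpu_waste = waste_fraction(requested=1000, used=50)
mem_waste = waste_fraction(requested=2048, used=256)
print(f"CPU waste: {cpu_waste:.1%}, memory waste: {mem_waste:.1%}")
print(f"Suggested requests: {recommend_request(50):.0f}m CPU, {recommend_request(256):.0f}Mi memory")
```

In practice the observed peak should come from a window long enough to include weekly or monthly load cycles, otherwise the recommendation will be too tight.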

Cluster Autoscaler Optimization

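# Cluster Autoscaler: adds nodes for pending pods and removes underused nodes.
# NOTE: upstream Cluster Autoscaler reads these settings as command-line flags
# (for example --scale-down-unneeded-time=5m). The ConfigMap below is illustrative
# and assumes your deployment tooling maps its keys onto those flags.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-config
data:
  # Scale down sooner than the 10-minute defaults
  scale-down-delay-after-add: "5m"
  scale-down-unneeded-time: "5m"

  # Only consider a node for scale-down when its utilization is below 50%
  # (0.5 is also the default)
  scale-down-utilization-threshold: "0.5"

  # On scale-up, pick the node group that leaves the least idle CPU/memory
  expander: "least-waste"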
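
The scale-down-utilization-threshold setting can be read as a simple comparison. The sketch below assumes utilization is the sum of pod requests divided by the node's allocatable resources, taking the larger of the CPU and memory ratios, which is how the Cluster Autoscaler documentation describes it. The function names and the sample node are hypothetical, and the real autoscaler applies further checks, such as whether the node's pods can be rescheduled elsewhere.

```python
def node_utilization(cpu_requested_m, cpu_allocatable_m,
                     mem_requested_mi, mem_allocatable_mi):
    """Utilization as requests / allocatable, using the larger of CPU and memory."""
    cpu_ratio = cpu_requested_m / cpu_allocatable_m
    mem_ratio = mem_requested_mi / mem_allocatable_mi
    return max(cpu_ratio, mem_ratio)


def is_scale_down_candidate(utilization, threshold=0.5):
    """A node is a candidate only when utilization is below the threshold."""
    return utilization < threshold


# Hypothetical node with 4000m CPU and 16384Mi memory allocatable.
busy = node_utilization(3000, 4000, 4096, 16384)
quiet = node_utilization(800, 4000, 4096, 16384)
print(busy, is_scale_down_candidate(busy))
print(quiet, is_scale_down_candidate(quiet))
```

Because the comparison uses requests rather than measured usage, over-provisioned requests keep nodes looking busy and block scale-down, which is another reason to right-size first.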

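# Node groups with mixed instance types (illustrative schema; prices are
# on-demand Linux list prices that vary by region and change over time)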
node_groups:
  general:
    instance_types:
      - m5.xlarge     # On-demand: $0.192/hr
      - m5a.xlarge    # AMD variant: $0.172/hr (10% cheaper)
      - m6i.xlarge    # Newer gen: $0.192/hr (better perf/$)
    min_size: 2
    max_size: 20
    
  spot:
    instance_types:
      - m5.xlarge
      - m5.2xlarge
      - m5a.xlarge
      - c5.xlarge     # Mix families for spot availability
    capacity_type: SPOT  # Typically 60-90% cheaper than on-demand, but can be reclaimed at short notice
    min_size: 0
    max_size: 30
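
To get a feel for what a mixed fleet saves, the arithmetic can be sketched as below. It uses the m5.xlarge on-demand rate quoted above and the low end of the quoted spot discount. The fleet sizes are made up, the function name is hypothetical, and real spot prices fluctuate by instance type and availability zone.

```python
def blended_hourly_cost(on_demand_nodes, spot_nodes, on_demand_price, spot_discount):
    """Hourly cost of a mixed fleet where spot nodes cost price * (1 - discount)."""
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    spot_price = on_demand_price * (1.0 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


price = 0.192  # m5.xlarge on-demand rate quoted above, USD per hour
all_on_demand = blended_hourly_cost(10, 0, price, 0.0)
mixed = blended_hourly_cost(4, 6, price, 0.6)  # 60% is the low end of the quoted range
savings = 1 - mixed / all_on_demand
print(f"all on-demand: ${all_on_demand:.3f}/hr, mixed: ${mixed:.3f}/hr, savings: {savings:.0%}")
```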

Cost Allocation by Namespace

# Kubecost-style namespace cost attribution.
# The get_* methods are placeholders to implement against your metrics and billing sources.
class K8sCostAllocator:
    def allocate_costs(self, period="month"):
        """Calculate cost per namespace based on actual resource usage."""
        node_costs = self.get_node_costs(period)

        allocation = {}
        for ns in self.get_all_namespaces():
            pods = self.get_pods(namespace=ns)

            cpu_hours = sum(p.cpu_usage_hours for p in pods)
            cpu_requested_hours = sum(p.cpu_requested_hours for p in pods)
            memory_gb_hours = sum(p.memory_usage_gb_hours for p in pods)
            storage_gb = sum(p.pv_size_gb for p in pods)

            compute = (cpu_hours * node_costs.cost_per_cpu_hour +
                       memory_gb_hours * node_costs.cost_per_gb_hour)
            # cost_per_gb_month assumes period="month"; scale it for other periods.
            storage = storage_gb * node_costs.cost_per_gb_month
            network = self.get_namespace_egress_cost(ns)

            allocation[ns] = {
                "compute": compute,
                "storage": storage,
                "network": network,
                "total": compute + storage + network,
                # Usage / requests; None when nothing was requested (avoids ZeroDivisionError)
                "efficiency": (cpu_hours / cpu_requested_hours
                               if cpu_requested_hours else None),
            }

        return allocation
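
To see the allocation arithmetic end to end, here is a minimal self-contained sketch that uses in-memory records instead of a metrics backend. The `PodUsage` fields, the rates, and the sample numbers are all made up for illustration. They are not Kubecost's data model or real prices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PodUsage:
    namespace: str
    cpu_usage_hours: float
    cpu_requested_hours: float
    memory_usage_gb_hours: float


def allocate(pods, cost_per_cpu_hour, cost_per_gb_hour):
    """Sum compute cost and CPU efficiency (usage / requests) per namespace."""
    used = defaultdict(float)
    requested = defaultdict(float)
    cost = defaultdict(float)
    for p in pods:
        used[p.namespace] += p.cpu_usage_hours
        requested[p.namespace] += p.cpu_requested_hours
        cost[p.namespace] += (p.cpu_usage_hours * cost_per_cpu_hour
                              + p.memory_usage_gb_hours * cost_per_gb_hour)
    return {
        ns: {
            "compute": round(cost[ns], 2),
            # Efficiency is undefined when nothing was requested.
            "efficiency": used[ns] / requested[ns] if requested[ns] else None,
        }
        for ns in cost
    }


# Hypothetical usage records for two namespaces.
pods = [
    PodUsage("checkout", cpu_usage_hours=100, cpu_requested_hours=400, memory_usage_gb_hours=200),
    PodUsage("checkout", cpu_usage_hours=50, cpu_requested_hours=200, memory_usage_gb_hours=100),
    PodUsage("batch", cpu_usage_hours=300, cpu_requested_hours=300, memory_usage_gb_hours=600),
]
report = allocate(pods, cost_per_cpu_hour=0.04, cost_per_gb_hour=0.005)
for ns, row in report.items():
    print(ns, row)
```

Charging by usage, as here, leaves the cost of idle requested capacity unassigned. Charging by requests instead makes over-provisioning visible on the owning team's bill.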

Anti-Patterns

Anti-Pattern	Consequence	Fix
Default resource requests	Requests far above real usage (often 10x or more)	VPA recommendations, monthly right-sizing
No HPA (Horizontal Pod Autoscaler)	Fixed pod count wastes resources	HPA on CPU/custom metrics
On-demand only	Pay full price for fault-tolerant workloads	Spot/preemptible for stateless workloads
No namespace cost visibility	Teams cannot see their spend	Kubecost or OpenCost per namespace
Orphaned resources	PVs, LBs, and nodes that serve nothing	Monthly resource audit, automated cleanup
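
The HPA fix in the table rests on a simple ratio. The Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below illustrates that rule only. The function name is hypothetical, and the real controller also handles missing metrics, pod readiness, min/max replica bounds, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count suggested by the HPA ratio rule (simplified)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Average CPU utilization (%) against a 100% target, purely illustrative values.
print(desired_replicas(4, 200, 100))  # load doubled
print(desired_replicas(4, 50, 100))   # load halved
print(desired_replicas(4, 105, 100))  # within tolerance
```

Utilization here is measured relative to the pod's CPU request, so inflated requests make pods look idle and keep the HPA from scaling up when it should.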

Kubernetes does not have a cost problem; it has a visibility problem. When teams can see what their workloads cost, they tend to optimize on their own. Give them the data.
