K8sCalc
kubernetes · 28 April 2026

Kubernetes Cost Optimization: 12 Ways to Cut Your Cloud Bill

Twelve concrete techniques to reduce Kubernetes infrastructure costs — from right-sizing nodes and enabling HPA, to switching storage backends and choosing the right cloud provider.

Kubernetes clusters are expensive to run wrong. Default configurations, over-provisioned nodes, and cloud-managed storage costs compound fast. This guide gives you 12 concrete levers to pull — each with real numbers so you can estimate the impact before you commit engineering time.

Use the Kubernetes Cluster Cost Calculator to model the total cost of your cluster as you apply these techniques.


1. Right-Size Your Nodes Before You Scale

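The most common mistake is choosing large instance types "to be safe." A single m5.4xlarge (16 vCPU / 64 GB) on EKS costs ~$550/month. Three m5.xlarge nodes (4 vCPU / 16 GB each, 12 vCPU / 48 GB in total) cost ~$415/month. EC2 pricing within a family is roughly linear in size, so the saving comes from not paying for the extra 4 vCPU / 16 GB you were never using. The smaller nodes also give you finer scaling steps and better fault tolerance.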

Run this to see actual resource utilization on your current nodes:

kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated resources"

Target 65–75% CPU utilization and 70–80% memory utilization across your node pool. Lower than that and you're paying for headroom you don't need.
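
As a rough illustration of the utilization target, the sketch below estimates how many nodes of a given size keep both CPU and memory inside the target band. The workload figures (9 vCPU and 40 GB of peak usage) are hypothetical, the node shape is the m5.xlarge from the example above, and allocatable capacity is assumed to equal nominal capacity, which overstates what a real node offers.

```python
import math

def nodes_needed(total_cpu, total_mem_gb, node_cpu, node_mem_gb,
                 cpu_target=0.70, mem_target=0.75):
    """Smallest node count that keeps CPU and memory at or below the target utilization."""
    by_cpu = math.ceil(total_cpu / (node_cpu * cpu_target))
    by_mem = math.ceil(total_mem_gb / (node_mem_gb * mem_target))
    # Whichever resource runs out first decides the pool size
    return max(by_cpu, by_mem)

# Hypothetical workload: 9 vCPU and 40 GB at peak, on 4 vCPU / 16 GB nodes
count = nodes_needed(9, 40, node_cpu=4, node_mem_gb=16)
print(count)  # 4
```

If the result is well below your current node count, the pool is a candidate for downsizing.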

The Kubernetes Node Sizing Calculator helps you model the right instance count and size given your actual workload profile.


2. Enable Horizontal Pod Autoscaler on Every Stateless Workload

HPA is the easiest win. If your deployments run at a fixed replica count 24/7 and traffic varies, you're burning money during off-peak hours.

A minimal HPA config:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70

Use the Kubernetes HPA Generator to build HPA manifests tuned for your workload without writing boilerplate from scratch.
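
To estimate what this manifest does at different load levels, the sketch below applies the core HPA scaling rule, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min and max from the manifest. It is a simplification: the real controller also applies a tolerance band and stabilization windows, and with several metrics it takes the largest result. The load figures are hypothetical.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=20):
    """Core HPA rule: ceil(current * currentMetric / targetMetric), clamped to min/max."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))

# Peak: 4 replicas averaging 90% CPU against the 60% target
peak = desired_replicas(4, 90, 60)
# Off-peak: 6 replicas averaging 15% CPU, so the floor of 2 applies
off_peak = desired_replicas(6, 15, 60)
print(peak, off_peak)  # 6 2
```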

For scale-to-zero on batch jobs, look at KEDA (Kubernetes Event-Driven Autoscaling) — it can drive replicas to zero based on queue depth, cron schedules, or external metrics.


3. Use the Cluster Autoscaler (Not Just HPA)

HPA scales pods. The Cluster Autoscaler scales nodes. Both are needed.

Without the Cluster Autoscaler, you have two bad options: run at fixed capacity and leave pods Pending when demand spikes, or pre-provision capacity that sits idle. With it, nodes are added when pods can't be scheduled and removed when their utilization stays low.

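Key flags to review (the first three are shown at their upstream defaults; raise or lower them to tune how aggressively nodes are removed):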

--scale-down-utilization-threshold=0.5
--scale-down-delay-after-add=10m
--scale-down-unneeded-time=10m
--skip-nodes-with-system-pods=false

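By default (skip-nodes-with-system-pods=true), the autoscaler will not remove a node that runs kube-system pods such as CoreDNS or metrics-server. Setting the flag to false lets it drain those nodes too. DaemonSet pods (like Prometheus node-exporter) never block scale-down, so the flag is not needed for them. Give the affected kube-system workloads a PodDisruptionBudget so they are rescheduled safely.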
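
The scale-down threshold is easier to reason about with numbers. The sketch below assumes the autoscaler measures a node's utilization as the larger of its CPU and memory request ratios (sum of pod requests divided by allocatable), and marks the node as a removal candidate when that value is under the threshold. The allocatable and request figures are hypothetical.

```python
def scale_down_check(cpu_requests_m, mem_requests_mi,
                     alloc_cpu_m, alloc_mem_mi, threshold=0.5):
    """Return (utilization, is_candidate) from request ratios, not live usage."""
    utilization = max(cpu_requests_m / alloc_cpu_m, mem_requests_mi / alloc_mem_mi)
    return utilization, utilization < threshold

# Node with 3920m CPU and 15000Mi memory allocatable (hypothetical values)
busy_util, busy_candidate = scale_down_check(1200, 9000, 3920, 15000)
idle_util, idle_candidate = scale_down_check(1200, 5000, 3920, 15000)
print(round(busy_util, 2), busy_candidate)  # 0.6 False
print(round(idle_util, 2), idle_candidate)  # 0.33 True
```

Note that the first node is kept because of memory requests alone, even though its CPU requests are only about 31% of allocatable.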


4. Use Spot/Preemptible Instances for Stateless Workloads

Spot instances are typically 60–90% cheaper than on-demand. The trade-off is that the provider can reclaim them on short notice: about 2 minutes on AWS and about 30 seconds on GCP and Azure.

| Provider | On-Demand (4 vCPU / 16 GB) | Spot Price | Savings |
| --- | --- | --- | --- |
| AWS (m5.xlarge) | ~$138/mo | ~$28–55/mo | 60–80% |
| GCP (n2-standard-4) | ~$155/mo | ~$30–50/mo | 68–80% |
| Azure (D4s v5) | ~$145/mo | ~$25–50/mo | 65–82% |
| Hetzner (CCX23) | ~$28/mo | N/A | N/A |

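For stateless workloads (web servers, API handlers, workers), design for eviction: use PodDisruptionBudgets, set terminationGracePeriodSeconds so shutdown fits inside the eviction notice, and spread replicas across spot and on-demand nodes. The snippet below steers a pod toward spot nodes as a soft preference; for replicas that must survive a reclaim, invert it (operator: NotIn) so they land on on-demand nodes. The label key is cluster-specific: node.kubernetes.io/lifecycle is a convention you apply yourself, while EKS managed node groups set eks.amazonaws.com/capacityType and GKE sets cloud.google.com/gke-spot.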

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: node.kubernetes.io/lifecycle
              operator: In
              values:
                - spot
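
To see what a mixed pool does to the bill, the sketch below prices a 10-node pool with 70% of nodes on spot. The on-demand price is the m5.xlarge figure from the table; the $40 spot price is a hypothetical point inside the quoted $28–55 range, and the node count and split are illustrative.

```python
def blended_monthly_cost(nodes, spot_fraction, on_demand_price, spot_price):
    """Monthly cost of a pool where a fraction of the nodes run on spot capacity."""
    spot_nodes = round(nodes * spot_fraction)
    on_demand_nodes = nodes - spot_nodes
    return spot_nodes * spot_price + on_demand_nodes * on_demand_price

baseline = 10 * 138                                   # all on-demand
mixed = blended_monthly_cost(10, 0.7, on_demand_price=138, spot_price=40)
savings = 1 - mixed / baseline
print(mixed, f"{savings:.0%}")  # 694 50%
```

Keeping 30% of the pool on-demand roughly halves the bill here while leaving capacity that a spot reclaim cannot touch.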

5. Set Resource Requests and Limits on Every Pod

Without resource requests, the scheduler places pods arbitrarily. Without limits, a noisy neighbor pod can consume an entire node's memory.

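More importantly, the Cluster Autoscaler works from requests, not live usage: pods without requests count as zero toward a node's utilization, so nodes look emptier than they are and both scale-up and scale-down decisions go wrong.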

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "512Mi"
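
Because the scheduler packs pods by their requests, the request values directly set how many pods fit on a node. A minimal sketch, assuming a node with 3900m CPU and 15000Mi memory allocatable (hypothetical figures) and ignoring the per-node pod limit:

```python
def pods_per_node(alloc_cpu_m, alloc_mem_mi, req_cpu_m, req_mem_mi):
    """Pods that fit on one node by requests; the scarcer resource is the limit."""
    return min(alloc_cpu_m // req_cpu_m, alloc_mem_mi // req_mem_mi)

# Requests from the snippet above: 250m CPU, 256Mi memory
right_sized = pods_per_node(3900, 15000, 250, 256)
# Same pod if the requests were set equal to its limits: 1000m CPU, 512Mi memory
inflated = pods_per_node(3900, 15000, 1000, 512)
print(right_sized, inflated)  # 15 3
```

Inflating the CPU request fourfold cuts density from 15 pods per node to 3, which is why over-requested pods translate directly into extra nodes.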

Use LimitRange objects per namespace to enforce defaults:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: staging
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container

6. Enforce Namespace Resource Quotas

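Without quotas, a single team can accidentally consume the entire cluster's capacity. ResourceQuota objects put hard ceilings on what a namespace can request. Once a quota covers requests or limits, pods that omit those fields are rejected, so pair it with the LimitRange defaults from the previous section.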

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "8"
    requests.memory: "16Gi"
    limits.cpu: "16"
    limits.memory: "32Gi"
    count/pods: "50"
    count/services: "10"
    count/persistentvolumeclaims: "20"

This protects other tenants and gives you per-team cost attribution when combined with Kubernetes labels and a cost allocation tool.


7. Replace Cloud-Managed Storage with Longhorn

Cloud-managed persistent volumes are expensive. AWS EBS gp3 costs ~$0.08/GB/month. GCP balanced Persistent Disk is ~$0.10/GB/month. If you're running a data-heavy workload, storage becomes a significant line item fast.

Storage cost comparison (1 TB):

| Option | Cost/month |
| --- | --- |
| AWS EBS gp3 (1 TB) | ~$80 |
| GCP Persistent Disk (1 TB) | ~$100 |
| Longhorn on Hetzner (3-replica) | ~$14 (included in node cost) |
| Rook/Ceph on bare metal | ~$8–15 (node cost only) |

Longhorn runs in-cluster, replicates across nodes, and handles snapshots and backups to S3. Use the Longhorn Storage Calculator to size the raw disk capacity you need given your replication factor and workload.
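
Replication multiplies the raw disk you need, which is easy to underestimate. The sketch below sizes raw capacity for a given usable volume size; the 25% free-space reserve is a hypothetical safety margin chosen for illustration, not a Longhorn default, and the three-node layout is assumed.

```python
import math

def raw_capacity_gb(usable_gb, replicas=3, reserve_fraction=0.25):
    """Raw disk needed so that replicated data still leaves the reserve free."""
    return math.ceil(usable_gb * replicas / (1 - reserve_fraction))

total_raw = raw_capacity_gb(1000)        # 1 TB usable, 3 replicas
per_node = math.ceil(total_raw / 3)      # spread evenly across 3 storage nodes
print(total_raw, per_node)  # 4000 1334
```

So 1 TB of usable, 3-replica storage needs roughly 4 TB of raw disk across the cluster under these assumptions.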

See the full breakdown: Longhorn vs Rook/Ceph and Longhorn vs OpenEBS.


8. Switch to a Cheaper Cloud Provider for Non-Critical Clusters

Not every environment needs AWS or GCP. Dev, staging, and CI clusters are natural candidates for cheaper providers.

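| Provider | 3-worker cluster (4 vCPU / 8 GB each) | Managed K8s fee | Total/mo |
| --- | --- | --- | --- |
| AWS EKS | ~$207 (EC2) | $73 | ~$280 |
| GKE | ~$185 (GCE) | $73 | ~$258 |
| DigitalOcean DOKS | ~$144 | $0 | ~$144 |
| Hetzner (kubeadm) | ~$35 | $0 | ~$35 |
| Vultr | ~$90 | $0 | ~$90 |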

Use the Hetzner Kubernetes Cost Calculator and Vultr Kubernetes Cost Calculator to compare real pricing before you migrate.
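
The comparison above can be turned into relative savings with a few lines. This sketch uses only the figures from the table and takes AWS EKS as the baseline; it ignores egress, load balancers, and the engineering time of running kubeadm yourself.

```python
# (node cost, managed control-plane fee) per month, from the table above
clusters = {
    "AWS EKS": (207, 73),
    "GKE": (185, 73),
    "DigitalOcean DOKS": (144, 0),
    "Hetzner (kubeadm)": (35, 0),
    "Vultr": (90, 0),
}

baseline = sum(clusters["AWS EKS"])
savings_pct = {
    name: round(100 * (1 - (nodes + fee) / baseline), 1)
    for name, (nodes, fee) in clusters.items()
}

for name, pct in savings_pct.items():
    print(f"{name}: {pct}% cheaper than EKS")
```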


9. Use k3s for Edge, Dev, and Small Production Clusters

k3s has a ~40 MB binary, runs on 512 MB RAM, and strips out heavyweight components (in-tree cloud provider integrations, in-tree storage drivers). For small production workloads, edge deployments, or CI environments, the smaller control-plane footprint lets you run on smaller, cheaper nodes.

Use the k3s Resource Calculator to estimate node requirements for a k3s cluster, and the Hetzner k3s Cost Calculator for full cost modeling.


10. Delete Idle Namespaces and Stale Resources

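Start by listing pods that are neither Running nor Completed:

kubectl get pods -A | grep -v Running | grep -v Completed

If you see dozens of CrashLoopBackOff or Pending pods, clean them up. Crash-looping pods hold their resource requests on a node without doing useful work, and unschedulable Pending pods can keep the Cluster Autoscaler adding nodes.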

Run this weekly in CI:

# Find namespaces with no running pods
kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' | \
  tr ' ' '\n' | while read -r ns; do
    # Count pods in the Running phase; kubectl errors (e.g. RBAC denials) are suppressed
    count=$(kubectl get pods -n "$ns" --field-selector=status.phase=Running -o name 2>/dev/null | wc -l)
    if [ "$count" -eq 0 ]; then echo "$ns: 0 running pods"; fi
  done

11. Tune Your Observability Stack

Prometheus and Loki are notorious for consuming unexpected amounts of CPU and storage. A misconfigured Prometheus scrape interval or a Loki ingestion pipeline without rate limits can quietly add $50–200/month in node costs.

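  • Set scrape_interval: 30s where you currently scrape every 15s (a common value in example configs; Prometheus itself defaults to 1m). Halving scrape frequency roughly halves the samples ingested and stored.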
  • Use Prometheus recording rules to pre-aggregate high-cardinality metrics
  • Configure Loki with per-tenant ingestion rate limits
  • Use the Prometheus Storage Calculator and Loki Log Storage Calculator to size your retention correctly
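
The effect of the scrape interval on disk can be estimated with the usual Prometheus sizing rule: retention in seconds × ingested samples per second × bytes per sample. The sketch below assumes 2 bytes per sample (the upper end of the commonly quoted 1–2 byte range) and a hypothetical 500,000 active series with 15 days of retention.

```python
def prometheus_disk_gb(active_series, scrape_interval_s, retention_days,
                       bytes_per_sample=2.0):
    """Rough TSDB size: retention seconds * samples per second * bytes per sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86_400
    return samples_per_second * retention_seconds * bytes_per_sample / 1e9

at_15s = prometheus_disk_gb(500_000, 15, 15)
at_30s = prometheus_disk_gb(500_000, 30, 15)
print(f"{at_15s:.1f} GB at 15s, {at_30s:.1f} GB at 30s")
```

Doubling the interval halves the estimate; reducing the number of active series through relabeling or recording rules has the same proportional effect.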

12. Use VPA (Vertical Pod Autoscaler) in Recommendation Mode

The Vertical Pod Autoscaler in recommendation mode (updateMode: "Off", quoted so YAML does not read it as a boolean) doesn't change anything. It just tells you what resource requests your pods should have based on actual usage.

kubectl get vpa -A -o yaml | grep -A 10 recommendation

This is the fastest way to find over-provisioned pods without any risk. After a week of data collection, VPA recommendations often reveal pods with 4x–10x over-provisioned memory requests that are blocking the Cluster Autoscaler from scaling down nodes.
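
Once you have a recommendation, the over-provisioning factor is the current request divided by the VPA target. The sketch below handles only the Ki/Mi/Gi suffixes and plain byte counts, which is a deliberate simplification of the full Kubernetes quantity syntax; the request and target values are hypothetical.

```python
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def to_bytes(quantity):
    """Parse a memory quantity with a Ki/Mi/Gi suffix, or a plain byte count."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[:-len(suffix)]) * factor
    return float(quantity)

def overprovision_factor(request, vpa_target):
    """How many times larger the current request is than the VPA target."""
    return to_bytes(request) / to_bytes(vpa_target)

factor = overprovision_factor("2Gi", "256Mi")
print(factor)  # 8.0
```

A factor near 1 means the request is already right-sized; anything above about 2 is worth a closer look.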


Putting It Together

Apply these in order of effort vs impact:

| Tip | Effort | Typical Savings |
| --- | --- | --- |
| HPA on stateless workloads | Low | 20–40% compute |
| Resource requests on all pods | Low | Enables autoscaler |
| VPA recommendations | Low | 10–30% right-sizing |
| Spot instances | Medium | 60–80% node cost |
| Namespace quotas | Medium | Prevents overrun |
| Switch provider (dev/staging) | Medium | 50–75% |
| Replace cloud storage w/ Longhorn | Medium | 60–90% storage |
| Cluster Autoscaler tuning | Medium | 15–30% idle node cost |
| k3s for small clusters | High | 30–60% total |

Start with HPA and resource requests — they cost nothing and enable everything else. Then use the Kubernetes Cluster Cost Calculator to track your progress.