Kubernetes clusters are expensive to run wrong. Default configurations, over-provisioned nodes, and cloud-managed storage costs compound fast. This guide gives you 12 concrete levers to pull — each with real numbers so you can estimate the impact before you commit engineering time.
Use the Kubernetes Cluster Cost Calculator to model the total cost of your cluster as you apply these techniques.
1. Right-Size Your Nodes Before You Scale
The most common mistake is choosing large instance types "to be safe." A single m5.4xlarge (16 vCPU / 64 GB) on EKS costs ~$550/month. Three m5.xlarge nodes (4 vCPU / 16 GB each, 12 vCPU / 48 GB total) cost ~$415/month. If your workload fits in the smaller footprint, you pay less and get better scheduling granularity and fault tolerance.
Run this to see actual resource utilization on your current nodes:
kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated resources"
Target 65–75% CPU utilization and 70–80% memory utilization across your node pool. Lower than that and you're paying for headroom you don't need.
The Kubernetes Node Sizing Calculator helps you model the right instance count and size given your actual workload profile.
2. Enable Horizontal Pod Autoscaler on Every Stateless Workload
HPA is the easiest win. If your deployments run at a fixed replica count 24/7 and traffic varies, you're burning money during off-peak hours.
A minimal HPA config:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
Use the Kubernetes HPA Generator to build HPA manifests tuned for your workload without writing boilerplate from scratch.
For scale-to-zero on batch jobs, look at KEDA (Kubernetes Event-Driven Autoscaling) — it can drive replicas to zero based on queue depth, cron schedules, or external metrics.
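As a rough sketch of the cron-schedule case, a KEDA ScaledObject might look like the following. It assumes KEDA is already installed in the cluster. The Deployment name report-worker, the namespace, the schedule, and the replica counts are hypothetical placeholders, not values from this guide.

```yaml
# Sketch only: assumes KEDA is installed; names and schedule are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: report-worker-business-hours
  namespace: production
spec:
  scaleTargetRef:
    name: report-worker          # hypothetical Deployment to scale
  minReplicaCount: 0             # allows scale-to-zero outside the window
  maxReplicaCount: 5
  triggers:
    - type: cron
      metadata:
        timezone: Etc/UTC
        start: "0 8 * * 1-5"     # scale up at 08:00, Monday to Friday
        end: "0 18 * * 1-5"      # scale back down at 18:00
        desiredReplicas: "5"
```

KEDA manages its own HPA for the target, so avoid pointing a hand-written HPA at the same Deployment.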
3. Use the Cluster Autoscaler (Not Just HPA)
HPA scales pods. The Cluster Autoscaler scales nodes. Both are needed.
Without the Cluster Autoscaler, pods sit Pending when capacity runs out, or you pre-provision capacity that sits idle. With it, nodes are added when pods can't be scheduled and removed when utilization drops.
Key flags to review (the first three are shown at their upstream defaults; tune them to your workload):
--scale-down-utilization-threshold=0.5
--scale-down-delay-after-add=10m
--scale-down-unneeded-time=10m
--skip-nodes-with-system-pods=false
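Setting skip-nodes-with-system-pods=false lets the autoscaler remove nodes that run non-DaemonSet kube-system pods (such as CoreDNS or metrics-server), which otherwise block scale-down. DaemonSet pods (like Prometheus node-exporter) never block scale-down on their own. Protect critical system pods with PodDisruptionBudgets before turning this off.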
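For a self-managed Cluster Autoscaler, these flags go on the container command of its Deployment. The fragment below is a sketch of where they sit, not a full manifest. The image tag and the --cloud-provider value are assumptions; match them to your cluster version and provider.

```yaml
# Fragment of a cluster-autoscaler Deployment pod spec (sketch; not a full manifest).
# The image tag and --cloud-provider value are assumptions.
spec:
  containers:
    - name: cluster-autoscaler
      image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
      command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --scale-down-utilization-threshold=0.5
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --skip-nodes-with-system-pods=false
```

Managed autoscalers may not expose these flags directly, in which case look for the provider's equivalent settings.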
4. Use Spot/Preemptible Instances for Stateless Workloads
Spot instances deliver a 60–90% discount vs on-demand. The trade-off is eviction on short notice: about 2 minutes on AWS and as little as 30 seconds on GCP and Azure.
| Provider | On-Demand (4vCPU/16GB) | Spot Price | Savings |
|---|---|---|---|
| AWS (m5.xlarge) | ~$138/mo | ~$28–55/mo | 60–80% |
| GCP (n2-standard-4) | ~$155/mo | ~$30–50/mo | 68–80% |
| Azure (D4s v5) | ~$145/mo | ~$25–50/mo | 65–82% |
| Hetzner (CCX23) | ~$28/mo | N/A | — |
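For stateless workloads (web servers, API handlers, workers), design for eviction: use PodDisruptionBudgets, set terminationGracePeriodSeconds so in-flight work can drain within the notice window, and use node affinity to steer replicas between spot and on-demand nodes. The snippet below makes a workload prefer spot nodes; invert the match (or require on-demand) for critical replicas. The label key varies by provisioner, for example eks.amazonaws.com/capacityType on EKS managed node groups or karpenter.sh/capacity-type with Karpenter, so check which labels your nodes actually carry.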
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: node.kubernetes.io/lifecycle
              operator: In
              values:
                - spot
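A PodDisruptionBudget for the eviction-friendly design described above could be sketched as follows. The app: api-server pod label and the budget of one unavailable pod are assumptions chosen for illustration.

```yaml
# Sketch: limits voluntary disruptions (e.g. node drains) to one pod at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api-server   # assumed pod label; must match your Deployment's pods
```

A PDB only governs voluntary evictions such as drains triggered by a node termination handler; it cannot delay the provider's reclaim itself. Using maxUnavailable rather than minAvailable keeps drains from stalling when the workload is scaled down to its minimum replica count.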
5. Set Resource Requests and Limits on Every Pod
Without resource requests, the scheduler treats a pod as needing nothing and can pack it onto an already busy node. Without limits, a noisy-neighbor pod can consume an entire node's memory.
More importantly, the Cluster Autoscaler estimates node utilization from the sum of pod requests, not from live usage, so pods without requests are invisible to it and skew its scale-up and scale-down decisions.
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "512Mi"
Use LimitRange objects per namespace to enforce defaults:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: staging
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container
6. Enforce Namespace Resource Quotas
Without quotas, a single team can accidentally consume the entire cluster's capacity. ResourceQuota objects put hard ceilings on what a namespace can request.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "8"
    requests.memory: "16Gi"
    limits.cpu: "16"
    limits.memory: "32Gi"
    count/pods: "50"
    count/services: "10"
    count/persistentvolumeclaims: "20"
This protects other tenants and gives you per-team cost attribution when combined with Kubernetes labels and a cost allocation tool.
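For the labeling side of cost attribution, one minimal approach is to label the namespace itself so a cost allocation tool can group spend by team. The label keys and values below are hypothetical; use whatever scheme your tool is configured to read.

```yaml
# Sketch: namespace-level labels for cost attribution (label keys are hypothetical).
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments
  labels:
    team: payments
    cost-center: cc-1234
```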
7. Replace Cloud-Managed Storage with Longhorn
Cloud-managed persistent volumes are expensive. AWS EBS gp3 costs ~$0.08/GB/month. GCP balanced Persistent Disk is ~$0.10/GB/month. If you're running a data-heavy workload, storage becomes a significant line item fast.
Storage cost comparison (1 TB):
| Option | Cost/month |
|---|---|
| AWS EBS gp3 (1 TB) | ~$80 |
| GCP Persistent Disk (1 TB) | ~$100 |
| Longhorn on Hetzner (3-replica) | ~$14 (included in node cost) |
| Rook/Ceph on bare metal | ~$8–15 (node cost only) |
Longhorn runs in-cluster, replicates across nodes, and handles snapshots and backups to S3. Use the Longhorn Storage Calculator to size the raw disk capacity you need given your replication factor and workload.
See the full breakdown: Longhorn vs Rook/Ceph and Longhorn vs OpenEBS.
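The replication factor is typically set per StorageClass. The following is a sketch of a 3-replica class, assuming Longhorn is installed with its default CSI driver name; the class name is hypothetical.

```yaml
# Sketch: Longhorn StorageClass with 3 replicas (assumes Longhorn is installed).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-3-replica        # hypothetical name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"           # each volume is stored on three nodes
  staleReplicaTimeout: "2880"     # minutes before a failed replica is cleaned up
```

With three replicas, 1 TB of usable volume capacity consumes roughly 3 TB of raw disk across the cluster, before any snapshot overhead.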
8. Switch to a Cheaper Cloud Provider for Non-Critical Clusters
Not every environment needs AWS or GCP. Dev, staging, and CI clusters are natural candidates for cheaper providers.
| Provider | 3-worker cluster (4vCPU/8GB each) | Managed K8s fee | Total/mo |
|---|---|---|---|
| AWS EKS | ~$207 (EC2) | $73 | ~$280 |
| GKE | ~$185 (GCE) | $73 | ~$258 |
| DigitalOcean DOKS | ~$144 | $0 | ~$144 |
| Hetzner (kubeadm) | ~$35 | $0 | ~$35 |
| Vultr | ~$90 | $0 | ~$90 |
Use the Hetzner Kubernetes Cost Calculator and Vultr Kubernetes Cost Calculator to compare real pricing before you migrate.
9. Use k3s for Edge, Dev, and Small Production Clusters
k3s has a ~40 MB binary, runs on 512 MB RAM, and removes many heavyweight components (cloud controller manager, in-tree storage drivers). For small production workloads, edge deployments, or CI environments, it cuts costs dramatically.
Use the k3s Resource Calculator to estimate node requirements for a k3s cluster, and the Hetzner k3s Cost Calculator for full cost modeling.
10. Delete Idle Namespaces and Stale Resources
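Start by listing pods that are neither Running nor Completed:
kubectl get pods -A | grep -v Running | grep -v Completed
If you see dozens of CrashLoopBackOff or Pending pods, clean them up. Crash-looping pods still reserve their resource requests on a node, and Pending pods can trigger unnecessary scale-up, all without doing useful work.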
Run this weekly in CI:
# Find namespaces with no running pods
kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' | \
  tr ' ' '\n' | while read -r ns; do
    # Count Running pods in this namespace; errors (e.g. RBAC) are ignored
    count=$(kubectl get pods -n "$ns" --field-selector=status.phase=Running -o name 2>/dev/null | wc -l)
    if [ "$count" -eq 0 ]; then echo "$ns: 0 running pods"; fi
  done
11. Tune Your Observability Stack
Prometheus and Loki are notorious for consuming unexpected amounts of CPU and storage. A misconfigured Prometheus scrape interval or a Loki ingestion pipeline without rate limits can quietly add $50–200/month in node costs.
- Set scrape_interval: 30s where you don't need finer resolution. Many sample configs use 15s (Prometheus's built-in default is 1m); halving scrape frequency roughly halves the samples ingested, and with them CPU and storage.
- Use Prometheus recording rules to pre-aggregate high-cardinality metrics
- Configure Loki with per-tenant ingestion rate limits
- Use the Prometheus Storage Calculator and Loki Log Storage Calculator to size your retention correctly
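The first two items can be sketched as a pair of Prometheus config fragments. The rule file path, the group name, and the http_requests_total metric are assumptions for illustration; substitute the metrics your workloads actually expose.

```yaml
# prometheus.yml (fragment)
global:
  scrape_interval: 30s
  evaluation_interval: 30s
rule_files:
  - /etc/prometheus/rules/*.yml   # assumed path
---
# /etc/prometheus/rules/http.yml (hypothetical rule file)
groups:
  - name: http_aggregates
    rules:
      # Pre-aggregate per-job request rate so dashboards query one series
      # per job instead of every label combination.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```

If you run the Prometheus Operator, the same rule would usually live in a PrometheusRule resource rather than a file on disk.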
12. Use VPA (Vertical Pod Autoscaler) in Recommendation Mode
The Vertical Pod Autoscaler in recommendation mode (updateMode: "Off", quoted so YAML does not parse it as a boolean) doesn't change anything: it just tells you what resource requests your pods should have based on actual usage.
kubectl get vpa -A -o yaml | grep -A 10 recommendation
This is the fastest way to find over-provisioned pods without any risk. After a week of data collection, VPA recommendations often reveal pods with 4x–10x over-provisioned memory requests that are blocking the Cluster Autoscaler from scaling down nodes.
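A recommendation-only VPA object could be sketched like this. It assumes the VPA components (CRDs and recommender) are installed, and it reuses the api-server Deployment from the HPA example as a stand-in target.

```yaml
# Sketch: VPA in recommendation mode; it never evicts or mutates pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"   # quoted: a bare Off can be parsed as a boolean
```

Once the recommender has collected enough data, the recommendations appear under the object's status and can be read with the kubectl command shown above.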
Putting It Together
Apply these in order of effort vs impact:
| Tip | Effort | Typical Savings |
|---|---|---|
| HPA on stateless workloads | Low | 20–40% compute |
| Resource requests on all pods | Low | Enables autoscaler |
| VPA recommendations | Low | 10–30% right-sizing |
| Spot instances | Medium | 60–80% node cost |
| Namespace quotas | Medium | Prevents overrun |
| Switch provider (dev/staging) | Medium | 50–75% |
| Replace cloud storage w/ Longhorn | Medium | 60–90% storage |
| Cluster Autoscaler tuning | Medium | 15–30% idle node cost |
| k3s for small clusters | High | 30–60% total |
Start with HPA and resource requests — they cost nothing and enable everything else. Then use the Kubernetes Cluster Cost Calculator to track your progress.