K8sCalc

Kubernetes Node Sizing Calculator

Calculate the right number and size of Kubernetes worker nodes for your workloads. Supports Hetzner Cloud and Vultr with verified pricing.

Kubernetes Node Sizing Principles

Right-sizing Kubernetes nodes balances cost, reliability, and operational simplicity. Undersizing causes OOM kills and CPU throttling; oversizing wastes money.

Resource Requests vs Limits

Kubernetes uses requests for scheduling (bin packing) and limits for enforcement:

  • Request: guaranteed minimum — the scheduler uses this to find a node with enough space
  • Limit: maximum allowed — exceeding CPU limit throttles, exceeding memory limit OOM-kills

Always set both. A pod with no requests counts as zero in the scheduler's capacity check, so it can land on an already-saturated node.
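As a rough illustration of why requests matter for scheduling, the sketch below checks whether a pod fits on a node by comparing summed requests against the node's allocatable capacity. The `Resources` type and `fits` function are hypothetical names for this example, and the real scheduler applies many more filters (taints, affinity, and so on).

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int
    memory_mib: int


def fits(node_allocatable: Resources,
         scheduled_requests: list[Resources],
         pod_request: Resources) -> bool:
    """Return True if the pod's requests fit into the node's unreserved capacity."""
    used_cpu = sum(r.cpu_millicores for r in scheduled_requests)
    used_mem = sum(r.memory_mib for r in scheduled_requests)
    return (
        used_cpu + pod_request.cpu_millicores <= node_allocatable.cpu_millicores
        and used_mem + pod_request.memory_mib <= node_allocatable.memory_mib
    )


# Hypothetical node: 4 vCPU and 8 GiB allocatable, with two pods already placed.
node = Resources(cpu_millicores=4000, memory_mib=8192)
running = [Resources(1500, 3072), Resources(2000, 4096)]

# A pod that declares requests is rejected: 3500m + 1000m exceeds 4000m.
print(fits(node, running, Resources(1000, 512)))  # False
# A pod with no requests counts as zero, so it "fits" even on a busy node.
print(fits(node, running, Resources(0, 0)))  # True
```

The second call shows the failure mode described above: the check passes only because the pod declared nothing, not because the node has room for its real usage.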

Overcommit Strategy

| Resource | Safe Overcommit | Risk |
|----------|-----------------|------|
| CPU | 2–3× | Throttling (recoverable) |
| RAM | 1–1.2× | OOM kill (disruptive) |
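The page does not define how the overcommit ratio is computed. One plausible reading, used here purely as an assumption, is the sum of pod limits on a node divided by that node's capacity. Under that reading, a check against the upper ends of the ranges above might look like this (function names and node sizes are illustrative):

```python
# Upper ends of the "safe" ranges from the table above.
MAX_CPU_OVERCOMMIT = 3.0
MAX_RAM_OVERCOMMIT = 1.2


def overcommit_ratio(total_limits: float, node_capacity: float) -> float:
    """Assumed definition: summed pod limits divided by node capacity."""
    return total_limits / node_capacity


def within_safe_range(cpu_limits_vcpu: float, ram_limits_gib: float,
                      node_vcpu: float, node_ram_gib: float) -> bool:
    """True if both CPU and RAM overcommit stay inside the table's ranges."""
    cpu_ok = overcommit_ratio(cpu_limits_vcpu, node_vcpu) <= MAX_CPU_OVERCOMMIT
    ram_ok = overcommit_ratio(ram_limits_gib, node_ram_gib) <= MAX_RAM_OVERCOMMIT
    return cpu_ok and ram_ok


# Hypothetical node with 8 vCPU and 16 GiB RAM.
print(overcommit_ratio(16, 8))            # 2.0
print(within_safe_range(16, 18, 8, 16))   # True: CPU 2.0x, RAM 1.125x
print(within_safe_range(16, 24, 8, 16))   # False: RAM 1.5x exceeds 1.2x
```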

System Pod Overhead

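Every cluster carries system pods that take node resources before your workloads do. Some run on every node, others run as a few replicas cluster-wide:

  • kube-proxy: iptables/ipvs rules (typically on every node)
  • coredns: DNS resolution (2 replicas cluster-wide, so only some nodes host it)
  • metrics-server: supplies the CPU/memory metrics the HPA scales on (cluster-wide, not per node)
  • CNI plugin (Flannel/Calico/Cilium): pod networking (on every node)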

Combined overhead: ~200–500 MB RAM, ~0.1–0.3 vCPU per node.
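Because this overhead is roughly fixed per node, it takes a larger share of small nodes than of large ones. The sketch below works that out for the upper end of the quoted ranges (500 MB RAM, 0.3 vCPU); the node sizes are hypothetical examples, not provider plans.

```python
def overhead_share(overhead: float, node_capacity: float) -> float:
    """Fraction of a node's capacity consumed by fixed system overhead."""
    return overhead / node_capacity


OVERHEAD_RAM_MB = 500   # upper end of the RAM range quoted above
OVERHEAD_VCPU = 0.3     # upper end of the vCPU range quoted above

# Hypothetical node sizes as (vCPU, RAM in MB).
for vcpu, ram_mb in [(2, 4096), (4, 8192), (8, 16384)]:
    cpu_pct = overhead_share(OVERHEAD_VCPU, vcpu) * 100
    ram_pct = overhead_share(OVERHEAD_RAM_MB, ram_mb) * 100
    print(f"{vcpu} vCPU / {ram_mb} MB: CPU {cpu_pct:.1f}%, RAM {ram_pct:.1f}%")
```

On the smallest node in this example the overhead is above 10% of both CPU and RAM, while on the largest it drops to a few percent.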

Cluster Autoscaler

For production, pair static nodes with the Kubernetes Cluster Autoscaler. Set a minimum node count to handle baseline traffic, and let the autoscaler add nodes under load. This is more cost-effective than over-provisioning.
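To make the cost argument concrete, the sketch below compares daily node-hours for a cluster sized statically for peak against one that keeps a baseline and scales out only during peak hours. All numbers are hypothetical, and the model assumes nodes are added and removed instantly, which the real Cluster Autoscaler does not do.

```python
def daily_node_hours(baseline_nodes: int, peak_nodes: int,
                     peak_hours_per_day: float) -> tuple[float, float]:
    """Return (static, autoscaled) node-hours per day.

    static:     peak_nodes run around the clock.
    autoscaled: baseline_nodes run around the clock, and the extra nodes
                run only during peak hours (simplified: instant scaling).
    """
    static = peak_nodes * 24
    extra_nodes = peak_nodes - baseline_nodes
    autoscaled = baseline_nodes * 24 + extra_nodes * peak_hours_per_day
    return static, autoscaled


# Hypothetical cluster: 2 nodes cover baseline, 5 cover peak, peak lasts 4 h/day.
static, autoscaled = daily_node_hours(baseline_nodes=2, peak_nodes=5,
                                      peak_hours_per_day=4)
print(static, autoscaled)  # 120 60
```

In this made-up scenario the autoscaled cluster uses half the node-hours; the saving shrinks as peak hours grow or as the gap between baseline and peak narrows.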

Frequently Asked Questions

How do I find the right CPU and memory requests for my services?

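Run `kubectl top pods --all-namespaces` to see current CPU/memory usage (this requires metrics-server). The output is a point-in-time snapshot, so sample it repeatedly over a representative period, including peak traffic, before drawing conclusions. Use the results as a baseline: set requests to roughly average usage and limits to roughly peak usage. Don't guess: requests set too high waste capacity, and values set too low lead to overloaded nodes and OOM kills.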
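The rule of thumb above can be sketched as follows, assuming you have already collected several usage samples for one container over time. The sample values are made up, and `suggest_resources` is a hypothetical helper, not part of kubectl.

```python
from statistics import mean


def suggest_resources(samples_millicores: list[int]) -> dict[str, int]:
    """Apply the rule of thumb: request = average usage, limit = peak usage."""
    if not samples_millicores:
        raise ValueError("need at least one usage sample")
    return {
        "request_millicores": round(mean(samples_millicores)),
        "limit_millicores": max(samples_millicores),
    }


# Hypothetical CPU readings (in millicores) for one container.
cpu_samples = [120, 150, 180, 400, 150]
print(suggest_resources(cpu_samples))
# {'request_millicores': 200, 'limit_millicores': 400}
```

The same helper works for memory samples if you pass values in MiB and rename the keys; for memory it is often safer to leave extra headroom above the observed peak, since exceeding the limit kills the container.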

Is it better to have fewer large nodes or more small nodes?

Fewer large nodes: simpler to manage, less Kubernetes overhead, but larger blast radius on node failure. More small nodes: better fault tolerance, finer-grained bin packing. A common production pattern is 3–5 medium nodes (cx43/cx53), scaling out via the Cluster Autoscaler.
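The blast-radius trade-off can be quantified with a small sketch. It assumes equally sized nodes and that all load from a failed node is redistributed evenly across the survivors, which is a simplification.

```python
def capacity_lost(node_count: int) -> float:
    """Fraction of cluster capacity lost when one equal-sized node fails."""
    return 1 / node_count


def utilization_after_failure(node_count: int, utilization_before: float) -> float:
    """Utilization of the surviving nodes after one node fails.

    Assumes the failed node's load spreads evenly over the rest.
    """
    if node_count < 2:
        raise ValueError("need at least two nodes to survive a failure")
    return utilization_before * node_count / (node_count - 1)


# Same 60% starting utilization, different node counts.
for n in (3, 5, 10):
    lost = capacity_lost(n)
    after = utilization_after_failure(n, 0.60)
    print(f"{n} nodes: lose {lost:.0%} of capacity, survivors at {after:.0%}")
```

With 3 nodes at 60%, a single failure pushes the survivors to roughly 90%; with 10 nodes the same failure leaves them at roughly 67%.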

Why does the calculator add 15% system overhead?

Kubernetes system pods (kube-proxy, CoreDNS, metrics-server, CNI plugin) consume roughly 10–20% of node resources. This overhead is non-negotiable — it's always there. The 15% default is a conservative estimate for a typical cluster.
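The calculator's exact formula is not shown on this page. A minimal sketch of one way a 15% overhead could enter a node-count estimate, with hypothetical function names and workload numbers, is:

```python
import math


def nodes_for_resource(total_requests: float, node_capacity: float,
                       overhead_fraction: float = 0.15) -> int:
    """Nodes needed for one resource after reserving a share for system pods."""
    usable = node_capacity * (1 - overhead_fraction)
    return math.ceil(total_requests / usable)


def nodes_for_workload(cpu_requests: float, ram_requests_gib: float,
                       node_vcpu: float, node_ram_gib: float,
                       overhead_fraction: float = 0.15) -> int:
    """The tighter of CPU and RAM decides the node count."""
    by_cpu = nodes_for_resource(cpu_requests, node_vcpu, overhead_fraction)
    by_ram = nodes_for_resource(ram_requests_gib, node_ram_gib, overhead_fraction)
    return max(by_cpu, by_ram)


# Hypothetical workload: 20 vCPU and 48 GiB of requests on 8 vCPU / 16 GiB nodes.
# Usable per node: 6.8 vCPU and 13.6 GiB, so CPU needs 3 nodes and RAM needs 4.
print(nodes_for_workload(20, 48, node_vcpu=8, node_ram_gib=16))  # 4
```

Without the overhead, the same workload would appear to fit on 3 nodes (48 / 16), which is why the reservation matters for clusters that run close to full.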

What is a safe CPU overcommit ratio for production?

2× is a common production default. CPU is compressible: when the pods on a node together demand more CPU than the node has, they are throttled, not killed. 3× is fine for development clusters or batch workloads. Never overcommit memory beyond 1.2× in production: memory is not compressible, so when a node runs out, containers are OOM-killed, which is disruptive.
