Kubernetes Node Sizing Guide: A Practical Framework for 2026

Node sizing is one of the most consequential decisions in Kubernetes cluster design. Get it wrong and you either waste money on idle capacity or hit fragmentation problems, where workloads can't schedule because no single node has enough free CPU or memory even though the cluster has plenty in aggregate.

Use the Kubernetes Node Sizing Calculator to model your specific workload mix. This guide covers the factors that feed into that decision.

What's Actually Available on a Node

The RAM and CPU advertised for a cloud instance type are not what your pods see. Several layers of overhead eat into the schedulable capacity.

Kubernetes System Overhead

kube-reserved and system-reserved carve out resources for the kubelet, container runtime, and OS:

Overhead Source	Typical CPU Reserved	Typical Memory Reserved
kubelet	100m	100 Mi
container runtime (containerd)	100m	100 Mi
OS / system services	50m	100 Mi
Eviction threshold	—	100 Mi
Total (small node)	~250m	~400 Mi

These numbers don't scale linearly. On a 64-core node, the overhead is still roughly the same absolute number — meaning it becomes a negligible percentage. On a 2-core node, 250m is 12.5% of your CPU before a single pod runs.
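The arithmetic behind those percentages can be sketched in a few lines of Python. This is only an illustration: the 250m default is the small-node total from the table above, and the function name is made up for this example.

```python
def reserved_cpu_percent(node_cores: int, reserved_millicores: int = 250) -> float:
    """Share of a node's CPU taken by a fixed system reservation, in percent."""
    return reserved_millicores / (node_cores * 1000) * 100


# The same absolute reservation shrinks as a share of the node as cores grow.
for cores in (2, 8, 64):
    print(f"{cores:>2} cores: {reserved_cpu_percent(cores):.2f}% reserved")
```

Managed Kubernetes services may apply their own reservation formulas, so check the allocatable values your provider reports before relying on a fixed figure.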

DaemonSet Tax

Every DaemonSet runs a pod on every node. Add up all your DaemonSets and subtract their resource requests from every node's capacity. A typical cluster in 2026 runs:

DaemonSet	CPU Request	Memory Request
kube-proxy	100m	64 Mi
CNI plugin (Calico/Cilium)	250m	256 Mi
Node monitoring (node-exporter)	100m	50 Mi
Log forwarder (Fluent Bit)	100m	100 Mi
Security agent	100–200m	128–256 Mi
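Total	~700m	~600–725 Mi

On a t3.medium (2 vCPU, 4 GB), roughly 700m of DaemonSet requests is 35% of CPU and about 16% of memory. Add the ~250m and ~400 Mi of system overhead from the previous table and nearly half the CPU and about a quarter of the memory are gone before you schedule a single application pod. That leaves too little headroom for most production workloads.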
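To redo this subtraction for your own instance type, a small sketch like the following may help. The request figures are copied from the two tables above; the dictionary keys, constant names, and the `remaining` helper are illustrative, and the node is assumed to expose 2000m of CPU and 4096 MiB of memory.

```python
# (CPU millicores, memory MiB) requests, copied from the DaemonSet table above.
DAEMONSETS = {
    "kube-proxy": (100, 64),
    "cni": (250, 256),
    "node-exporter": (100, 50),
    "fluent-bit": (100, 100),
}
# The security agent is listed as a range, so both ends are kept.
SECURITY_AGENT_LOW = (100, 128)
SECURITY_AGENT_HIGH = (200, 256)
# Small-node total from the system overhead table.
SYSTEM_RESERVED = (250, 400)


def remaining(node_cpu_m: int, node_mem_mi: int, security_agent: tuple) -> tuple:
    """Return (CPU millicores, memory MiB) left for application pods."""
    cpu = sum(c for c, _ in DAEMONSETS.values()) + security_agent[0] + SYSTEM_RESERVED[0]
    mem = sum(m for _, m in DAEMONSETS.values()) + security_agent[1] + SYSTEM_RESERVED[1]
    return node_cpu_m - cpu, node_mem_mi - mem


# Assumed t3.medium capacity: 2 vCPU = 2000m, 4 GiB = 4096 MiB.
print(remaining(2000, 4096, SECURITY_AGENT_LOW))
print(remaining(2000, 4096, SECURITY_AGENT_HIGH))
```

Swapping in the requests from your own DaemonSet manifests gives a quick lower bound on what application pods can actually use.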

Memory Overcommit Risk

CPU is compressible — pods that exceed their CPU limit get throttled, not killed. Memory is not. A pod that exceeds its memory limit gets OOMKilled. This means memory requests matter more than CPU requests for reliability: always set memory requests equal to limits for critical workloads, and never set requests so low that the node runs out of actual RAM.
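One way to spot this risk is to compare the memory limits placed on a node with what the node can actually provide. The sketch below is a simplified check, not how the kubelet itself accounts for memory; the function name and the sample figures are hypothetical.

```python
def memory_overcommit_ratio(limits_mi: list[int], allocatable_mi: int) -> float:
    """Sum of container memory limits divided by node allocatable memory.

    A ratio above 1.0 means the pods could, in the worst case, ask for more
    RAM than the node has, at which point something gets OOMKilled or evicted.
    """
    return sum(limits_mi) / allocatable_mi


# Four pods that each request 512 MiB but may grow to a 2048 MiB limit
# all fit by request on a 4096 MiB node, yet the limits add up to twice the node.
ratio = memory_overcommit_ratio([2048] * 4, 4096)
print(f"overcommit ratio: {ratio:.1f}")
```

Setting requests equal to limits for critical workloads keeps that ratio at or below 1.0 for those pods by construction.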

Few Large vs Many Small Nodes

This is the most common sizing debate. The right answer depends on your workload, but here are the real tradeoffs:

Argument for Fewer, Larger Nodes

›Lower overhead percentage. System overhead and DaemonSet tax are fixed costs. On a 32-core node, 700m of DaemonSet CPU is 2.2%. On a 4-core node, it's 17.5%.
›Better for large workloads. ML inference jobs, JVM services, and Spark executors that need 8+ GB each won't fit on small nodes.
›Simpler operations. Fewer nodes mean fewer kubelets to manage, less networking complexity, and faster cluster upgrades (fewer nodes to drain and reboot).
›Better bin-packing. The scheduler has more headroom to pack pods efficiently on a large node.

Argument for More, Smaller Nodes

›Failure domain isolation. A node failure takes down fewer pods. On 3 × 64-core nodes, one node failing takes 33% of your capacity. On 24 × 8-core nodes, one failure takes 4%.
›Autoscaler efficiency. The cluster autoscaler can add or remove a small node in 1–2 minutes. Provisioning a large node takes longer and wastes more capacity when scaling down.
›Spot/preemptible fit. Smaller instance types have more availability across spot pools. Large GPU nodes are harder to source at spot pricing.
›HPA granularity. Fine-grained autoscaling works better when pod resource requests are a meaningful fraction of the node. If your pod needs 0.1 CPU and your node has 96 cores, each node the autoscaler adds brings far more capacity than the new pods need, so the node count changes in coarse steps and the newest node sits mostly idle.
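The two strongest arguments above, overhead percentage and failure blast radius, pull in opposite directions, and it can help to put numbers on both for a candidate fleet. The following is a rough sketch: `fleet_tradeoff` is a made-up helper, and the 950m default assumes about 250m of system overhead plus about 700m of DaemonSet requests per node.

```python
def fleet_tradeoff(node_count: int, cores_per_node: int, fixed_overhead_m: int = 950) -> tuple:
    """Return (capacity lost if one node fails, CPU spent on per-node overhead), both in percent."""
    loss_pct = 100 / node_count
    overhead_pct = fixed_overhead_m / (cores_per_node * 1000) * 100
    return loss_pct, overhead_pct


# Two fleets with the same 192 cores in total.
for count, cores in ((3, 64), (24, 8)):
    loss, overhead = fleet_tradeoff(count, cores)
    print(f"{count:>2} x {cores:>2} cores: lose {loss:.1f}% per failure, {overhead:.1f}% overhead")
```

Neither number decides the question alone; the point is to see how much overhead you are paying for each step down in blast radius.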

A Practical Rule of Thumb

Pick nodes where your typical pod occupies 5–20% of node capacity. This gives the scheduler enough room to pack pods efficiently while ensuring a single node failure doesn't destroy availability.

Pod Memory Request	Suggested Node Memory
256 Mi	4–8 GB
1 GB	8–16 GB
4 GB	16–32 GB
16 GB	64–128 GB
80 GB (LLM)	192–384 GB or GPU instance
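The 5–20% rule can also be turned around to derive a node memory band from a pod request. The sketch below ignores system and DaemonSet overhead and uses hypothetical helper names. Note that the table rounds to common instance sizes, so a few rows (for example a 4 GB pod on a 16 GB node, at 25%) sit slightly outside the strict band.

```python
def pod_share_percent(pod_request_gib: float, node_gib: float) -> float:
    """Percentage of node memory a single pod's request occupies."""
    return pod_request_gib / node_gib * 100


def node_memory_band(pod_request_gib: float, low_pct: float = 5, high_pct: float = 20) -> tuple:
    """Node memory range (GiB) in which the pod occupies between low_pct and high_pct."""
    smallest_node = pod_request_gib * 100 / high_pct
    largest_node = pod_request_gib * 100 / low_pct
    return smallest_node, largest_node


for pod_gib in (1, 4, 16):
    low, high = node_memory_band(pod_gib)
    print(f"{pod_gib} GiB pod: {low:.0f} to {high:.0f} GiB node")
```

Treat the result as a starting range and then round to the instance sizes your provider actually sells.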

Sizing for Specific Workload Types

Web Application / API Servers

Typical profile: many small pods, bursty CPU, modest memory.

›Ideal node: 8–16 vCPU, 16–32 GB RAM
›Rationale: Allows 20–50 pods per node. CPU bursting is fine — these pods tolerate throttling. Focus on scheduling density.
›Watch out for: Connection table limits (for example conntrack) at the node level if you pack many pods with high connection counts onto one node.
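A quick density estimate shows whether a node size lands in that 20–50 pod range. This is a sketch by requests only: the helper name and the sample pod size (250m, 512 MiB) are assumptions, and the cap of 110 is the upstream kubelet default for max pods, which your provider may override.

```python
def pods_per_node(alloc_cpu_m: int, alloc_mem_mi: int,
                  pod_cpu_m: int, pod_mem_mi: int, max_pods: int = 110) -> int:
    """Pods that fit by requests, capped by the kubelet's max-pods setting."""
    by_cpu = alloc_cpu_m // pod_cpu_m
    by_mem = alloc_mem_mi // pod_mem_mi
    return min(by_cpu, by_mem, max_pods)


# Assumed 8 vCPU / 16 GiB node after subtracting about 950m and 1100 MiB of overhead.
print(pods_per_node(7050, 15284, 250, 512))
```

Whichever of CPU, memory, or the pod cap binds first tells you which dimension to grow when choosing the next instance size up.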

ML Inference (GPU)

Typical profile: one or a few pods per node, need dedicated GPU, high memory.

›Ideal node: Purpose-built GPU instance (NVIDIA A100, H100, L40S). Match VRAM to model size.
›Rationale: See the VRAM calculator for LLM inference for model-specific sizing. GPU nodes are expensive and specialized — don't mix general workloads onto them.
›Watch out for: Low GPU utilization. Wasted GPU capacity is expensive, so consider GPU sharing via MPS or time-slicing when a single pod can't keep the device busy.

Databases (StatefulSets)

Typical profile: high memory, moderate CPU, high I/O, storage-bound.

›Ideal node: Memory-optimized instances (r-family on AWS, N2-highmem on GCP). Dedicated node pool with taints.
›Rationale: Databases benefit from local NVMe when available and should not compete with application pods for memory. Taint the node pool so only database StatefulSets are scheduled there.

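# Taint the node pool so pods without a matching toleration are kept off it
kubectl taint nodes <node-name> dedicated=database:NoSchedule

# Label the same nodes; nodeSelector matches labels, not taints
kubectl label nodes <node-name> dedicated=database

# In the StatefulSet pod spec (spec.template.spec): tolerate the taint and select the labeled nodes
tolerations:
  - key: dedicated
    operator: Equal
    value: database
    effect: NoSchedule
nodeSelector:
  dedicated: database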
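A toleration only permits a pod to land on a tainted node; it does not attract the pod there, which is why the nodeSelector (and a matching node label) is also needed. The matching rule itself can be sketched as below. This is a simplified model of the documented taint and toleration semantics, written for illustration rather than taken from the scheduler's code, and the `tolerates` helper is hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified check of whether a toleration matches a taint."""
    # An empty or missing effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # With Exists, an empty key tolerates every taint; otherwise keys must match.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal: both key and value must match.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


taint = {"key": "dedicated", "value": "database", "effect": "NoSchedule"}
db_toleration = {"key": "dedicated", "operator": "Equal",
                 "value": "database", "effect": "NoSchedule"}
print(tolerates(db_toleration, taint))
```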

Batch / Spot-Tolerant Jobs

Typical profile: variable duration, can restart, cost-sensitive.

›Ideal node: Spot/preemptible instances in a dedicated node pool, smaller instance types for better availability.
›Rationale: Use the cluster autoscaler to scale down to zero when no jobs are running. Jobs should set restartPolicy: OnFailure and be idempotent.
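What idempotent means in practice depends on the job, but one common pattern is to skip work whose output already exists and to publish results atomically. The sketch below assumes the job writes a single output file to storage that survives a pod restart; `run_once` and the file names are hypothetical.

```python
import os
import tempfile


def run_once(output_path: str, compute) -> bool:
    """Write compute() to output_path unless a previous attempt already did.

    Returns True if work was done, False if the result already existed.
    """
    if os.path.exists(output_path):
        return False  # an earlier attempt finished; a restart has nothing to do
    tmp_path = output_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(compute())
    # os.replace is atomic on the same filesystem, so a pod killed mid-write
    # never leaves a half-written file at output_path.
    os.replace(tmp_path, output_path)
    return True


# Usage: the second call stands in for a pod restarted after a spot interruption.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "result.txt")
    first = run_once(target, lambda: "done")
    second = run_once(target, lambda: "done")
    print(first, second)
```

Jobs that write to a database or object store need an equivalent guard, such as a unique key or a conditional write.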

Autoscaler Thresholds

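Once you've sized your nodes, the cluster autoscaler needs sensible thresholds. With the default scale-down threshold (0.5), a node only becomes a removal candidate once its pods' requests fall below 50% of its allocatable capacity. For steady workloads that is often too conservative: half-empty nodes keep running and keep costing money.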

Use the Kubernetes Autoscaler Threshold Calculator to model scale-up and scale-down triggers for your node size and pod density.

Key parameters to tune:

# In cluster-autoscaler deployment args
- --scale-down-utilization-threshold=0.65   # Scale down when node < 65% utilized
- --scale-down-delay-after-add=10m          # Wait 10m after scale-up before scaling down
- --scale-down-unneeded-time=10m            # Node must be unneeded for 10m before removal
- --max-node-provision-time=15m             # Give up if node doesn't join in 15m

Higher utilization thresholds (0.65–0.75) reduce wasted capacity. Lower thresholds (0.3–0.5) give more headroom for spikes. The right value depends on how bursty your traffic is and how fast new nodes provision on your cloud provider.
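To see what moving the threshold does, it helps to model the candidate check directly. The sketch below is a simplification: it assumes utilization is the larger of the CPU and memory request ratios, and it ignores everything else the real autoscaler considers, such as whether the node's pods can be rescheduled elsewhere and PodDisruptionBudgets. The function name is illustrative.

```python
def is_scale_down_candidate(cpu_req_m: int, cpu_alloc_m: int,
                            mem_req_mi: int, mem_alloc_mi: int,
                            threshold: float = 0.5) -> bool:
    """True if the node's request-based utilization is below the threshold."""
    utilization = max(cpu_req_m / cpu_alloc_m, mem_req_mi / mem_alloc_mi)
    return utilization < threshold


# A node with 60% of CPU and 40% of memory requested.
node = (4800, 8000, 6400, 16000)
print(is_scale_down_candidate(*node, threshold=0.5))   # kept at the default
print(is_scale_down_candidate(*node, threshold=0.65))  # becomes a removal candidate
```

Running your real per-node request totals through a check like this shows how many nodes a threshold change would put in play before you touch the deployment args.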

Summary Checklist

Before finalizing your node size:

› Calculate usable capacity: subtract system-reserved, DaemonSet requests, and eviction threshold
› Confirm typical pods take 5–20% of node capacity
› Choose node size based on your largest single-pod memory requirement (no pod should need more than 50% of the node)
› Identify stateful/GPU workloads that need dedicated node pools with taints
› Set resource requests and limits on every pod — the scheduler can't make good decisions without them
› Tune autoscaler thresholds after observing real utilization for a week