K8sCalc


Kubernetes Observability Stack Sizing Calculator

Size your Prometheus, Grafana, Loki, and Alertmanager stack. Calculate RAM, disk, and CPU requirements based on your cluster size and retention period.

Sizing the Kubernetes Observability Stack

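A common Kubernetes observability stack combines kube-prometheus-stack (Prometheus, Alertmanager, and Grafana, plus node-exporter and kube-state-metrics) with Loki for logs. Loki is not part of kube-prometheus-stack and is installed from its own Helm chart. Each component scales with a different input: Prometheus with the number of active series, Loki with log volume, and Grafana hardly at all.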

Prometheus Memory Model

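Prometheus keeps every active time series, together with its most recent samples (the in-memory head block), in RAM, so memory grows with the active series count:

RAM ≈ active_series × 2.5 KB + headroom for WAL replay and query evaluation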

Series count depends on your workloads. Key contributors:

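  • Node exporters: ~200–300 series per node (more on nodes with many CPUs, disks, or network interfaces)
  • kube-state-metrics: ~500–2,000 series per cluster, growing with the number of Kubernetes objects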
  • Application metrics: 10–100+ series per pod
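A rough estimator that combines the contributor ranges above with the 2.5 KB-per-series rule of thumb. The per-node, per-cluster, and per-pod defaults are mid-range picks from the list (assumptions, not measurements), and the flat 0.5 GB `overhead_gb` standing in for WAL replay and query evaluation is likewise a placeholder; the function names are illustrative.

```python
"""Rough Prometheus series and RAM estimate (a sketch; every default is an assumption)."""

KB_PER_SERIES = 2.5  # rule-of-thumb memory per active series


def estimate_series(nodes: int, pods: int,
                    per_node: int = 250,        # node exporter: ~200-300 per node
                    per_cluster: int = 1_000,   # kube-state-metrics: ~500-2,000 per cluster
                    per_pod: int = 50) -> int:  # application metrics: 10-100+ per pod
    """Approximate active-series count for one cluster."""
    return nodes * per_node + per_cluster + pods * per_pod


def estimate_ram_gb(series: int, overhead_gb: float = 0.5) -> float:
    """Series memory plus a flat allowance for WAL replay and query evaluation.

    overhead_gb = 0.5 is an assumed placeholder, not a measured value.
    """
    series_gb = series * KB_PER_SERIES / 1e6  # KB -> GB, decimal units
    return series_gb + overhead_gb


if __name__ == "__main__":
    series = estimate_series(nodes=20, pods=300)
    print(f"{series:,} series, ~{estimate_ram_gb(series):.2f} GB RAM")
```

Once Prometheus is running, compare the estimate with the `prometheus_tsdb_head_series` metric and the container's actual memory usage, then tune the defaults to match your cluster.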

Disk Sizing Formula

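disk_per_day_GB = (series_count / scrape_interval_sec) × 2 × 86400 / 1e9

Here 2 is the approximate size of a compressed sample in bytes (the Prometheus storage documentation gives 1–2 bytes), 86400 is the number of seconds per day, and 1e9 converts bytes to GB. Multiply by the retention period in days to get total disk, and leave extra room for the WAL and compaction.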
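The same formula as a small Python helper, extended with the multiplication by retention days. The function names are illustrative, and the 2 bytes/sample default is the formula's constant rather than a value measured on your data.

```python
"""Prometheus disk sizing from the per-day formula (a sketch, not a guarantee)."""

SECONDS_PER_DAY = 86_400


def disk_per_day_gb(series_count: int, scrape_interval_sec: float,
                    bytes_per_sample: float = 2.0) -> float:
    """GB written per day: samples/sec x bytes/sample x seconds/day."""
    samples_per_sec = series_count / scrape_interval_sec
    return samples_per_sec * bytes_per_sample * SECONDS_PER_DAY / 1e9


def disk_for_retention_gb(series_count: int, scrape_interval_sec: float,
                          retention_days: int, bytes_per_sample: float = 2.0) -> float:
    """Block storage for the whole retention window (excludes WAL and compaction headroom)."""
    per_day = disk_per_day_gb(series_count, scrape_interval_sec, bytes_per_sample)
    return per_day * retention_days


if __name__ == "__main__":
    # 10,000 series scraped every 15s and kept for 90 days
    print(f"{disk_for_retention_gb(10_000, 15, 90):.1f} GB")
```

With those inputs the script prints about 10.4 GB. Halving the scrape frequency (30s instead of 15s) halves the result, since disk scales with samples per second.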

Grafana

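Grafana is lightweight: 256–512 MB RAM is typical. It is not fully stateless, though; by default it keeps dashboards, users, and settings in an embedded SQLite database. Either persist that database on a PVC, or provision dashboards as ConfigMaps generated from your Git repo (Grafonnet/Jsonnet) so they survive pod restarts.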
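One way to keep dashboards as code is to wrap the JSON rendered by Grafonnet/Jsonnet in a ConfigMap that Grafana's dashboard sidecar picks up. This sketch assumes the sidecar is enabled and watches the `grafana_dashboard: "1"` label (the usual kube-prometheus-stack setting, but check your chart values); the function, dashboard, ConfigMap name, and namespace are hypothetical.

```python
"""Wrap a dashboard JSON in a ConfigMap manifest for Grafana's sidecar (a sketch)."""
import json


def dashboard_configmap(name: str, namespace: str, dashboard: dict,
                        label: str = "grafana_dashboard", label_value: str = "1") -> dict:
    """Return a ConfigMap manifest as a dict; the label must match the sidecar's settings."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {label: label_value},
        },
        # The sidecar writes each data key out as a file that Grafana loads as a dashboard.
        "data": {f"{name}.json": json.dumps(dashboard)},
    }


if __name__ == "__main__":
    # Hypothetical minimal dashboard; in practice this JSON comes from Jsonnet/Grafonnet.
    dash = {"title": "Cluster overview", "panels": []}
    manifest = dashboard_configmap("cluster-overview", "monitoring", dash)
    # kubectl accepts JSON manifests, e.g. piped into `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Generating the manifest in CI from the same Git repo keeps the dashboard source and the deployed ConfigMap in sync, and the Grafana pod itself holds no dashboard state.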

Loki Architecture

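Loki's storage is driven by log volume: it indexes only labels, so compressed log chunks account for nearly all of the space, and total storage is roughly the daily compressed volume × retention days. Assume each pod produces ~50–200 MB of logs per day and that Loki compresses them ~5–10×. For 100 pods at 100 MB/day, keeping 0.15 of the raw size (about 6.7× compression): 100 × 100 MB × 0.15 = ~1.5 GB/day. With an object-storage backend such as S3, chunks live in the bucket and Loki's local disk mostly holds the WAL and caches.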
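The worked example above as a helper. The 100 MB/pod/day and 0.15 stored-fraction defaults come from that example, the 30-day retention in the usage lines is an arbitrary illustration, and the result excludes the label index and local WAL/cache disk; the function names are illustrative.

```python
"""Loki storage estimate from per-pod log volume (a sketch; inputs are assumptions)."""


def loki_gb_per_day(pods: int, mb_per_pod_day: float = 100.0,
                    stored_fraction: float = 0.15) -> float:
    """Compressed chunk storage per day; stored_fraction = 1 / compression ratio."""
    return pods * mb_per_pod_day * stored_fraction / 1_000  # MB -> GB


def loki_total_gb(pods: int, retention_days: int,
                  mb_per_pod_day: float = 100.0, stored_fraction: float = 0.15) -> float:
    """Object-storage footprint for the whole retention window (label index not included)."""
    return loki_gb_per_day(pods, mb_per_pod_day, stored_fraction) * retention_days


if __name__ == "__main__":
    print(f"{loki_gb_per_day(100):.1f} GB/day")  # the 100-pod example
    print(f"{loki_total_gb(100, 30):.0f} GB for 30 days")
```

Swapping `stored_fraction` between 0.2 and 0.1 brackets the 5–10× compression range.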

Scaling Tips

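  • At >100 nodes: run Prometheus with remote_write to Thanos (via Thanos Receive) or VictoriaMetrics
  • At >500 pods: drop unused high-cardinality metrics with metric_relabel_configs, and keep WAL compression enabled (--storage.tsdb.wal-compression, Snappy; the default since Prometheus 2.20)
  • Add Grafana Tempo for distributed tracing: it needs only object storage to run, so its overhead is small relative to its debugging value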
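For the remote_write tip, kube-prometheus-stack passes Prometheus Operator's remote-write settings through `prometheus.prometheusSpec.remoteWrite`. The sketch below generates that values fragment, optionally with a write-relabel rule that drops metrics before they are shipped (this filters only what is sent; dropping at ingest is done with metric relabeling instead). The URL paths follow the usual VictoriaMetrics (`/api/v1/write`) and Thanos Receive (`/api/v1/receive`) endpoints, but the hostnames, ports, function name, and dropped metric name are placeholders.

```python
"""Build a kube-prometheus-stack values fragment for remote_write (a sketch)."""
from __future__ import annotations

import json

# Placeholder endpoints: adjust service names, namespaces, and ports to your deployment.
ENDPOINTS = {
    "victoriametrics": "http://victoria-metrics.monitoring.svc:8428/api/v1/write",
    "thanos-receive": "http://thanos-receive.monitoring.svc:19291/api/v1/receive",
}


def remote_write_values(backend: str, drop_metrics: list[str] | None = None) -> dict:
    """Return Helm values that enable remote_write to the chosen backend."""
    target: dict = {"url": ENDPOINTS[backend]}
    if drop_metrics:
        # Drop matching series before they are sent (Prometheus Operator RelabelConfig fields).
        target["writeRelabelConfigs"] = [{
            "sourceLabels": ["__name__"],
            "regex": "|".join(drop_metrics),
            "action": "drop",
        }]
    return {"prometheus": {"prometheusSpec": {"remoteWrite": [target]}}}


if __name__ == "__main__":
    values = remote_write_values("victoriametrics", drop_metrics=["example_unused_metric"])
    # JSON is valid YAML, so this can be saved as values.json and passed to `helm upgrade -f`.
    print(json.dumps(values, indent=2))
```

Keeping the endpoint table in one place makes it easy to switch backends without touching the rest of the values file.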

Frequently Asked Questions

Why does Prometheus use so much RAM?

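Prometheus keeps every active time series in memory, and each series uses roughly 2–3 KB of RAM. A cluster with 50 pods exposing 500 series each has 25,000 series × 2.5 KB = ~62 MB for the series alone. Real clusters add node-exporter and kube-state-metrics series on top; pod restarts and rollouts create new series (churn) that stay in memory until the head block is compacted; and WAL replay on restart plus query evaluation need headroom. That is why a mid-size cluster typically needs 1–4 GB RAM.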

How much disk does Prometheus need for 90 days retention?

Prometheus compresses samples to about 1–2 bytes each. At 2 bytes/sample and a 15s scrape interval, 1,000 series kept for 90 days need 1,000 × (86400/15) × 2 × 90 ≈ 1.04 GB. Storage scales linearly with series count, so a 10,000-series cluster needs ~10 GB for 90 days at a 15s interval (about half that at 30s).

Is Loki cheaper than Elasticsearch for log storage?

Yes, significantly. Loki stores log chunks without a full-text index and indexes only labels, so it needs roughly 5–10× less storage. For a cluster producing 5 GB/day of logs, Elasticsearch needs ~5–10 GB/day once indexed, while Loki needs ~500 MB–1 GB/day compressed.

Should I use the kube-prometheus-stack Helm chart?

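Yes, for most clusters. kube-prometheus-stack bundles the Prometheus Operator, Prometheus, Alertmanager, Grafana, node-exporter, and kube-state-metrics with preconfigured dashboards and alerting rules; Loki is not included and is installed from its own chart. It is the de facto standard. For very large clusters (500+ nodes), consider VictoriaMetrics as a Prometheus-compatible replacement, since it is typically more memory-efficient for the same series count.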
