Kubernetes Observability Stack Sizing Calculator
Size your Prometheus, Grafana, Loki, and Alertmanager stack. Calculate RAM, disk, and CPU requirements based on your cluster size and retention period.
Sizing the Kubernetes Observability Stack
A typical Kubernetes observability stack pairs the kube-prometheus-stack Helm chart (Prometheus, Grafana, and Alertmanager) with Loki for logs, which is deployed separately. Each component has a different resource profile.
Prometheus Memory Model
Prometheus keeps every active time series in memory (in its head block), so RAM scales mainly with series count:
RAM ≈ series × 2.5 KB + WAL + query working memory
Series count depends on your workloads. Key contributors:
- ›Node exporters: ~200–300 series per node
- ›kube-state-metrics: ~500–2,000 series per cluster, growing with the number of Kubernetes objects (pods, deployments, nodes)
- ›Application metrics: 10–100+ series per pod
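The series estimate and RAM rule above can be sketched in Python. This is a minimal sketch that assumes the midpoint of each quoted range (250 series per node for node exporter, 1,250 for kube-state-metrics) and the ~2.5 KB-per-series figure. The function names are hypothetical, and WAL and query memory are not modeled.

```python
# Rough Prometheus active-series and RAM estimate for a Kubernetes cluster.
# Per-source defaults are midpoints of the ranges quoted above (assumptions);
# override them with numbers measured on your own cluster.

NODE_EXPORTER_SERIES_PER_NODE = 250  # midpoint of ~200-300 per node
KUBE_STATE_METRICS_SERIES = 1_250    # midpoint of ~500-2,000 per cluster
BYTES_PER_SERIES = 2_500             # ~2.5 KB of RAM per active series


def estimate_series(nodes: int, pods: int, app_series_per_pod: int) -> int:
    """Return an approximate count of active time series."""
    node_series = nodes * NODE_EXPORTER_SERIES_PER_NODE
    app_series = pods * app_series_per_pod
    return node_series + KUBE_STATE_METRICS_SERIES + app_series


def estimate_series_ram_mb(series: int) -> float:
    """RAM (decimal MB) for the series alone, excluding WAL and query memory."""
    return series * BYTES_PER_SERIES / 1e6


if __name__ == "__main__":
    series = estimate_series(nodes=10, pods=200, app_series_per_pod=50)
    print(series)                                      # 13750
    print(f"{estimate_series_ram_mb(series):.1f} MB")  # 34.4 MB
```

For 10 nodes and 200 pods at 50 application series each, this gives 2,500 + 1,250 + 10,000 = 13,750 series, or about 34 MB before the WAL and query overhead are added.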
Disk Sizing Formula
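disk_per_day_GB = (series_count / scrape_interval_sec) × 2 × 86400 / 1e9
Here series_count / scrape_interval_sec is the ingestion rate in samples per second, 2 is the approximate number of bytes per compressed sample (Prometheus averages roughly 1–2), and 86400 is the number of seconds in a day. Multiply by the retention period in days to get total disk.
Grafana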
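Grafana is lightweight: 256–512 MB of RAM is typical. By default it keeps its state (users, settings, and dashboards created in the UI) in an embedded SQLite database, so either persist that database on a PVC or provision dashboards as ConfigMaps from your Git repo (for example, generated with Grafonnet/Jsonnet).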
Loki Architecture
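Loki's storage is driven by log volume multiplied by retention, because Loki indexes only labels rather than log content. A typical pod produces ~50–200 MB of logs per day, though this varies widely by workload. Loki compresses chunks ~5–10×, so 100 pods × 100 MB/day × 0.15 (≈6.7× compression) ≈ 1.5 GB/day, or about 45 GB for 30 days of retention. With an object-storage backend such as S3, Loki's local disk usage is minimal.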
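The Loki arithmetic above can be sketched as a small Python function. It assumes decimal units (1 GB = 1,000 MB) and a single stored_fraction (compressed size divided by raw size, 0.15 in the example). It covers compressed chunk storage only, not the label index or ingester disk. The function name is hypothetical.

```python
def loki_storage_gb(pods: int, mb_per_pod_per_day: float,
                    stored_fraction: float, retention_days: int) -> tuple[float, float]:
    """Return (compressed GB per day, total GB over the retention period).

    stored_fraction is compressed size / raw size, e.g. 0.15 for ~6.7x compression.
    Sizes use decimal units (1 GB = 1,000 MB); index and WAL are not included.
    """
    raw_gb_per_day = pods * mb_per_pod_per_day / 1_000
    stored_gb_per_day = raw_gb_per_day * stored_fraction
    return stored_gb_per_day, stored_gb_per_day * retention_days


if __name__ == "__main__":
    per_day, total = loki_storage_gb(pods=100, mb_per_pod_per_day=100,
                                     stored_fraction=0.15, retention_days=30)
    print(f"{per_day:.1f} GB/day, {total:.0f} GB for 30 days")  # 1.5 GB/day, 45 GB for 30 days
```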
Scaling Tips
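- ›At >100 nodes: run Prometheus with remote_write to Thanos (via Thanos Receive) or VictoriaMetrics for long-term, horizontally scalable storage
- ›At >500 pods: keep Prometheus WAL compression (Snappy) enabled via --storage.tsdb.wal-compression to cut WAL disk usage; it has been on by default since Prometheus 2.20
- ›Use Grafana Tempo for distributed tracing: it adds relatively little overhead and a lot of debugging value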
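For the first tip, remote_write is configured on the Prometheus side. This is a minimal sketch that assumes you deploy with the kube-prometheus-stack chart, whose values expose prometheus.prometheusSpec.remoteWrite as a list of endpoints. The receiver URL and the helper names below are hypothetical placeholders for your own Thanos Receive or VictoriaMetrics endpoint.

```python
import json

REMOTE_WRITE_NODE_THRESHOLD = 100  # rule of thumb from the tip above


def needs_remote_write(nodes: int) -> bool:
    """True when the cluster is past the >100-node threshold."""
    return nodes > REMOTE_WRITE_NODE_THRESHOLD


def remote_write_values(url: str) -> dict:
    """Helm values fragment for kube-prometheus-stack that enables remote_write."""
    return {"prometheus": {"prometheusSpec": {"remoteWrite": [{"url": url}]}}}


if __name__ == "__main__":
    nodes = 150
    if needs_remote_write(nodes):
        # Hypothetical receiver URL; replace it with your own write endpoint.
        values = remote_write_values("http://metrics-receiver.example:8080/api/v1/write")
        print(json.dumps(values, indent=2))
```

Because JSON is valid YAML, the printed output can be redirected to a file and passed to Helm with -f, for example on a helm upgrade --install of the chart.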
Frequently Asked Questions
Why does Prometheus use so much RAM?
Prometheus keeps every active time series in memory, and each series costs roughly 2–3 KB of RAM. A cluster with 50 pods exposing 500 series each has 25,000 series × 2.5 KB ≈ 62.5 MB for the series alone. Real clusters add node-exporter, kube-state-metrics, and usually far more application series, plus the WAL, recent chunks, and query working memory, so a mid-size cluster typically needs 1–4 GB of RAM.
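Instead of estimating, you can read the live series count from Prometheus. This is a minimal sketch that assumes Prometheus is reachable at http://localhost:9090 (for example through kubectl port-forward). It reads headStats.numSeries from the /api/v1/status/tsdb endpoint and applies the ~2.5 KB rule of thumb, so the result covers the series only, not the WAL or query memory. The helper names are hypothetical.

```python
import json
import urllib.request

BYTES_PER_SERIES = 2_500  # ~2.5 KB per active series (rule of thumb from this page)


def head_series_from_status(payload: dict) -> int:
    """Extract the active (head block) series count from a TSDB status response."""
    return int(payload["data"]["headStats"]["numSeries"])


def fetch_tsdb_status(base_url: str = "http://localhost:9090") -> dict:
    """Query Prometheus' TSDB status endpoint; the default URL is an assumption."""
    with urllib.request.urlopen(f"{base_url}/api/v1/status/tsdb", timeout=10) as resp:
        return json.load(resp)


def series_ram_mb(series: int) -> float:
    """RAM (decimal MB) attributable to the series themselves."""
    return series * BYTES_PER_SERIES / 1e6


if __name__ == "__main__":
    # Offline example using the 50 pods x 500 series case from the answer above.
    sample = {"status": "success", "data": {"headStats": {"numSeries": 25_000}}}
    series = head_series_from_status(sample)
    print(f"{series} series -> {series_ram_mb(series):.1f} MB")  # 25000 series -> 62.5 MB
    # Against a live server:
    # print(series_ram_mb(head_series_from_status(fetch_tsdb_status())))
```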
How much disk does Prometheus need for 90 days retention?
Prometheus stores roughly 1–2 bytes per sample after compression; assume 2 bytes for a conservative estimate. For 1,000 series scraped every 15 s for 90 days: 1,000 × (86,400 / 15) × 2 bytes × 90 ≈ 1.04 GB. Disk scales linearly with series count, so a 10,000-series cluster needs ~10 GB for 90 days at a 15 s scrape interval.
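The disk formula and the 90-day example can be checked with a short Python sketch. It assumes the same 2 bytes per sample and decimal gigabytes, and it ignores WAL and compaction headroom. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_SAMPLE = 2  # Prometheus averages ~1-2 bytes/sample; 2 is conservative


def prometheus_disk_gb(series: int, scrape_interval_s: float, retention_days: int) -> float:
    """Approximate TSDB block size in decimal GB for the given retention."""
    samples_per_second = series / scrape_interval_s
    bytes_per_day = samples_per_second * BYTES_PER_SAMPLE * SECONDS_PER_DAY
    return bytes_per_day * retention_days / 1e9


if __name__ == "__main__":
    print(f"{prometheus_disk_gb(1_000, 15, 90):.2f} GB")   # 1.04 GB
    print(f"{prometheus_disk_gb(10_000, 15, 90):.1f} GB")  # 10.4 GB
```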
Is Loki cheaper than Elasticsearch for log storage?
Yes, significantly. Loki stores log chunks without a full-text index and indexes only labels, which typically cuts storage 5–10×. For a cluster producing 5 GB/day of raw logs, Elasticsearch needs roughly 5–10 GB/day once indexed, while Loki needs roughly 500 MB–1 GB/day compressed.
Should I use the kube-prometheus-stack Helm chart?
Yes, for most clusters. kube-prometheus-stack bundles the Prometheus Operator, Prometheus, Alertmanager, Grafana, node-exporter, and kube-state-metrics with pre-configured dashboards and alerting rules. It is the de facto standard. For very large clusters (500+ nodes), consider VictoriaMetrics as a Prometheus replacement, since it is generally more memory-efficient.