Prometheus Storage Calculator

Calculate Prometheus disk and RAM requirements based on active time series count, scrape interval, and retention period. Includes TSDB head, WAL, and chunk cache sizing.

How Prometheus Storage Works

Prometheus stores metrics in a local time-series database (TSDB) with a two-tier architecture: an in-memory head block for recent data and compressed on-disk blocks for older data.

Memory Model

The head block holds the last 2 hours of data in RAM:

RAM ≈ (active_series × 2.5 KB) + WAL_buffer + chunk_cache
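The estimate above can be turned into a small calculation. This is a minimal sketch, assuming the ~2.5 KB per-series figure from the formula; the function name and the zero defaults for the WAL buffer and chunk cache are placeholders for illustration, not measured values.

```python
KB = 1000  # decimal units, so 2.5 KB = 2,500 bytes

def estimate_ram_bytes(active_series, wal_buffer_bytes=0, chunk_cache_bytes=0,
                       bytes_per_series=2.5 * KB):
    """Rough RAM estimate: per-series head cost plus WAL buffer and chunk cache."""
    return active_series * bytes_per_series + wal_buffer_bytes + chunk_cache_bytes

head_only = estimate_ram_bytes(100_000)
print(f"{head_only / 1e6:.0f} MB")  # 250 MB for the head alone
```

Pass your own WAL buffer and chunk cache sizes once you have measured them; the per-series term usually dominates.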

High-cardinality labels are the #1 cause of Prometheus OOM kills. Labels like user_id, request_id, or pod_ip can multiply series count by 10–100×.
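To see why one extra label can multiply the series count, consider the worst case where every combination of label values occurs. The sketch below uses hypothetical label names and cardinalities; real workloads rarely hit every combination, so treat the result as an upper bound.

```python
from math import prod

def series_count(label_cardinalities):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())

base = {"method": 5, "status": 4}      # hypothetical label cardinalities
with_pod_ip = {**base, "pod_ip": 50}   # add one high-cardinality label

print(series_count(base))         # 20
print(series_count(with_pod_ip))  # 1000
```

Here a single label with 50 distinct values turns 20 series into 1,000, a 50× increase.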

Disk Model

After 2 hours, head data is compacted and written to on-disk TSDB blocks. Compression brings each sample down to roughly 1.3 bytes:

disk_per_day = (series / scrape_interval_sec) × 1.3 × 86400 bytes
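As a worked example of the formula above, the following sketch computes daily and total disk usage. It assumes the ~1.3 bytes/sample figure and covers compressed sample data only; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400

def disk_bytes_per_day(series, scrape_interval_sec, bytes_per_sample=1.3):
    """Samples ingested per second, times bytes per sample, times seconds per day."""
    samples_per_sec = series / scrape_interval_sec
    return samples_per_sec * bytes_per_sample * SECONDS_PER_DAY

def disk_bytes_total(series, scrape_interval_sec, retention_days, bytes_per_sample=1.3):
    """Disk needed to hold `retention_days` of compressed samples."""
    return disk_bytes_per_day(series, scrape_interval_sec, bytes_per_sample) * retention_days

per_day = disk_bytes_per_day(100_000, 15)
total = disk_bytes_total(100_000, 15, retention_days=15)
print(f"{per_day / 1e6:.0f} MB/day")        # 749 MB/day
print(f"{total / 1e9:.1f} GB for 15 days")  # 11.2 GB for 15 days
```

Halving the scrape interval doubles the result, so interval and series count matter equally for disk.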

Retention vs Remote Storage

For retention >15 days, consider remote storage:

  • Thanos: adds object storage (S3) + global query view
  • VictoriaMetrics: drop-in replacement, reported to compress about 3× better
  • Grafana Mimir: horizontally scalable, multi-tenant storage for very large deployments

Reducing Series Count

```yaml
# Drop high-cardinality labels at scrape time
metric_relabel_configs:
  - regex: 'pod_ip|request_id'
    action: labeldrop
```
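The effect of this rule on series count can be illustrated with a small simulation. The sketch below only models the label-set arithmetic: it removes every label whose name fully matches the regex and counts the distinct label sets that remain. It does not model how Prometheus handles samples that end up with identical label sets, and the metric and label values are hypothetical.

```python
import re

def drop_labels(series, pattern):
    """Remove labels whose name fully matches `pattern`, then deduplicate label sets."""
    regex = re.compile(pattern)
    reduced = set()
    for labels in series:
        kept = frozenset((k, v) for k, v in labels.items() if not regex.fullmatch(k))
        reduced.add(kept)
    return reduced

# Hypothetical input: one metric, 2 methods x 3 pod IPs = 6 series
series = [
    {"__name__": "http_requests_total", "method": m, "pod_ip": ip}
    for m in ("GET", "POST")
    for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")
]

remaining = drop_labels(series, "pod_ip|request_id")
print(len(series), "->", len(remaining))  # 6 -> 2
```

`re.fullmatch` is used because Prometheus relabeling regexes are anchored at both ends.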

Frequently Asked Questions

Why does Prometheus use so much RAM?

Prometheus keeps all active time series in an in-memory TSDB head block. Each series uses ~2.5 KB for its index entries and chunk references, so 100,000 series ≈ 250 MB for the head alone. Add the WAL buffer, chunk cache, query memory, and WAL replay on restart, and a busy cluster can easily reach 2–4 GB.

How do I find my actual series count?

Run: `curl -s http://localhost:9090/metrics | grep '^prometheus_tsdb_head_series '`. Or in Grafana, query the metric `prometheus_tsdb_head_series`. kube-prometheus-stack exposes this by default.
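The same value can be read programmatically through the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). This is a minimal sketch using only the standard library; the function names and the `localhost:9090` default are assumptions, and the sample payload is made up to show the response shape.

```python
import json
import urllib.parse
import urllib.request

def parse_instant_vector(payload):
    """Return the first sample value from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    result = payload["data"]["result"]
    if not result:
        raise ValueError("query returned no samples")
    _timestamp, value = result[0]["value"]  # value arrives as a string
    return float(value)

def head_series(base_url="http://localhost:9090"):
    """Query a running Prometheus for its current head series count."""
    query = urllib.parse.urlencode({"query": "prometheus_tsdb_head_series"})
    url = f"{base_url}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return int(parse_instant_vector(json.load(resp)))

# Made-up payload illustrating the response shape
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "prometheus_tsdb_head_series"},
             "value": [1700000000.0, "123456"]}
        ],
    },
}
print(int(parse_instant_vector(sample)))  # 123456
```

Call `head_series()` against a reachable Prometheus to get the live number.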

What is WAL and how much disk does it use?

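The Write-Ahead Log (WAL) records incoming samples on disk so the in-memory head block can be rebuilt after a restart. It holds roughly the last 2 hours of samples, until they are compacted into TSDB blocks, so WAL size ≈ 2h of ingestion. At 10,000 series with a 15s scrape interval: 667 samples/sec × 2 bytes × 7,200 sec ≈ 10 MB. The 2 bytes/sample figure is a rough estimate; uncompressed WAL records can be larger. The WAL lives on the same PVC as TSDB data.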
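The WAL arithmetic above can be reproduced as follows. This is a sketch under the same assumptions as the text (2 bytes per sample, a 2-hour window); the function name is illustrative.

```python
def wal_bytes(series, scrape_interval_sec, bytes_per_sample=2, window_sec=7_200):
    """Ingestion rate (samples/sec) x bytes per sample x WAL window in seconds."""
    samples_per_sec = series / scrape_interval_sec
    return samples_per_sec * bytes_per_sample * window_sec

size = wal_bytes(10_000, 15)
print(f"{size / 1e6:.1f} MB")  # 9.6 MB
```

Raise `bytes_per_sample` to see how sensitive the estimate is to that assumption.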

Should I use VictoriaMetrics instead of Prometheus?

For clusters with >500,000 series or retention >90 days, VictoriaMetrics is reported to be 3–5× more memory-efficient, with better query performance. For standard clusters (10K–200K series), Prometheus with the kube-prometheus-stack is simpler to operate.
