K8sCalc

Prometheus Alert Rules Generator

Generate Prometheus alerting rules for pod crash-looping, high CPU/memory, node status, PVC usage, and certificate expiry. Ready to apply as a PrometheusRule CRD.

Writing Prometheus Alert Rules

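Prometheus evaluates each alerting rule's PromQL expression on a fixed schedule. Every series the expression returns becomes an alert in the 'pending' state. If the expression keeps returning that series for the full 'for' duration, the alert moves to 'firing' and is sent to Alertmanager; if the expression stops returning it first, the pending alert is dropped.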

Rule Structure

yaml
groups:
  - name: my-alerts
    interval: 30s       # evaluation frequency
    rules:
      - alert: AlertName
        expr: |
          promql_expression > threshold
        for: 5m          # must be true for 5m to fire
        labels:
          severity: critical
        annotations:
          summary: "Human-readable summary"
          description: "Detail with {{ $labels.pod }} template"

Common PromQL Patterns

promql
# Container memory % of limit (the "> 0" filter skips containers with no limit set,
# which would otherwise divide by zero and always match)
(container_memory_working_set_bytes / (container_spec_memory_limit_bytes > 0)) * 100 > 90

# Pod restart rate (restarts in last 15min)
rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 5

# Node disk usage %
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85

PrometheusRule CRD Wrapper

yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # must match Prometheus selector
spec:
  groups:
    # paste generated rules here
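
As an illustration of the wrapper with a group pasted in, the following sketch combines the rule structure and the restart pattern shown earlier. The resource name, group name, alert name, and annotation wording are hypothetical, and the `release` label is assumed to match your Prometheus rule selector.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-alerts                    # hypothetical resource name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack    # assumed to match the Prometheus selector
spec:
  groups:
    - name: pod-alerts                # hypothetical group name
      interval: 30s
      rules:
        - alert: PodCrashLooping      # hypothetical alert name
          expr: |
            rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 5
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"
            description: "Container {{ $labels.container }} restarted more than 5 times in the last 15 minutes."
```

Note that the group list sits two levels deeper than in a plain rules file, so pasted rules need their indentation adjusted to fit under `spec.groups`.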

Frequently Asked Questions

How do I apply these rules to my cluster?

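If you use kube-prometheus-stack (Helm), create a PrometheusRule resource; the Prometheus Operator discovers it automatically as long as its labels match the Prometheus rule selector. Otherwise, save the rules as a file in a directory listed under rule_files in your Prometheus configuration and reload: curl -X POST http://localhost:9090/-/reload (the reload endpoint only works when Prometheus is started with --web.enable-lifecycle). The generated YAML is structured as a rules group, not a PrometheusRule CRD, so add the apiVersion/kind wrapper if needed.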
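
For the non-Helm path, a minimal sketch of the Prometheus configuration that picks up rule files might look like this. The directory path and the interval value are assumptions; use whatever your deployment mounts.

```yaml
# prometheus.yml (excerpt)
global:
  evaluation_interval: 30s          # default for groups that set no interval of their own
rule_files:
  - /etc/prometheus/rules/*.yml     # hypothetical path; glob patterns are allowed
```

Each file matched by `rule_files` is expected to contain a top-level `groups:` list, which is the plain rules-group form rather than the PrometheusRule wrapper.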

What is the 'for' field in alert rules?

The 'for' duration is how long the condition must hold continuously before the alert fires. 'for: 5m' filters out noise from transient spikes, because a condition that clears within five minutes never leaves the pending state. Use lower values (1m, 2m) for critical alerts that need a fast response and higher values (15m, 30m) for capacity warnings.
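
To make the trade-off concrete, here is a sketch that pairs a slow capacity warning with a fast critical alert on the same disk-usage expression. The group name, alert names, and the 95% threshold are illustrative assumptions, not recommendations.

```yaml
groups:
  - name: disk-alerts                 # hypothetical group name
    rules:
      # Capacity warning: tolerate long excursions before notifying anyone.
      - alert: NodeDiskFillingUp      # hypothetical alert name
        expr: |
          (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
        for: 30m
        labels:
          severity: warning
      # Critical: a nearly full disk should page quickly.
      - alert: NodeDiskAlmostFull     # hypothetical alert name
        expr: |
          (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 95   # assumed threshold
        for: 2m
        labels:
          severity: critical
```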

What's the difference between warning and critical severity?

Severity is just a label. Prometheus attaches no meaning to it; Alertmanager uses it to route alerts to different receivers. A typical setup sends critical alerts to PagerDuty or the on-call rotation and warning alerts to a Slack channel. Define routes in the Alertmanager config that match on the severity label.
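
A minimal sketch of such routing in the Alertmanager configuration could look like the following. The receiver names, the Slack channel, and both credential values are placeholders, and the `matchers` syntax assumes Alertmanager 0.22 or newer.

```yaml
route:
  receiver: slack-warnings            # fallback receiver for anything unmatched
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall
    - matchers:
        - severity="warning"
      receiver: slack-warnings

receivers:
  - name: pagerduty-oncall            # hypothetical receiver name
    pagerduty_configs:
      - routing_key: REPLACE_WITH_PAGERDUTY_INTEGRATION_KEY          # placeholder
  - name: slack-warnings              # hypothetical receiver name
    slack_configs:
      - api_url: "https://hooks.slack.com/services/REPLACE_ME"       # placeholder webhook URL
        channel: "#alerts"            # hypothetical channel
```

Routes are tried in order and the first match wins by default, so the critical route is listed first.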

How do I test alert rules without firing them?

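Use promtool. promtool check rules rules.yaml validates rule syntax, and promtool test rules test.yaml runs unit tests: .yaml test files that feed in synthetic series and assert the expected alert states at specific timestamps. Neither command sends anything to Alertmanager. To see how loaded rules evaluate against live data, open the Prometheus UI at /rules.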
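
As a rough sketch of such a test file, the example below assumes a plain rules file named alerts.yaml (hypothetical) containing one alert called PodCrashLooping with the expression `rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 5`, `for: 5m`, the label `severity: critical`, and no annotations. If your rule has annotations, they would need to be listed under `exp_annotations` as well.

```yaml
# test.yaml
rule_files:
  - alerts.yaml                       # hypothetical rules file, plain "groups:" format

evaluation_interval: 1m

tests:
  - interval: 1m                      # spacing between the synthetic samples
    input_series:
      # Counter that gains one restart per minute: 0, 1, 2, ... 20
      - series: 'kube_pod_container_status_restarts_total{namespace="default", pod="api-0", container="api"}'
        values: '0+1x20'
    alert_rule_test:
      # Early on, too few restarts have accumulated, so nothing should fire.
      - eval_time: 3m
        alertname: PodCrashLooping
      # Later, the threshold has been exceeded for longer than the 5m "for" window.
      - eval_time: 20m
        alertname: PodCrashLooping
        exp_alerts:
          - exp_labels:
              severity: critical
              namespace: default
              pod: api-0
              container: api
```

Running promtool test rules test.yaml evaluates the rule against the synthetic series only, so no real alert is sent anywhere.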