Prometheus Alert Rules Generator
Generate Prometheus alerting rules for pod crash-looping, high CPU/memory, node status, PVC usage, and certificate expiry. Ready to apply as a PrometheusRule CRD.
Writing Prometheus Alert Rules
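Prometheus alerting rules evaluate PromQL expressions on a schedule. When an expression returns one or more results, an alert for each result enters the 'pending' state. If the expression keeps returning that result for the full 'for' duration, the alert moves to 'firing' and is sent to Alertmanager.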
Rule Structure
    groups:
      - name: my-alerts
        interval: 30s  # evaluation frequency
        rules:
          - alert: AlertName
            expr: |
              promql_expression > threshold
            for: 5m  # must be true for 5m to fire
            labels:
              severity: critical
            annotations:
              summary: "Human-readable summary"
              description: "Detail with {{ $labels.pod }} template"
Common PromQL Patterns
    # Container memory % of limit (the "> 0" drops containers with no limit set,
    # which would otherwise divide by zero)
    (container_memory_working_set_bytes / (container_spec_memory_limit_bytes > 0)) * 100 > 90
    # Pod restarts in the last 15 minutes
    increase(kube_pod_container_status_restarts_total[15m]) > 5
    # Node disk usage %
    (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
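As a rough sketch of how one of these patterns fits into the rule structure, the group below wraps the node disk expression in an alert. The group name, alert name, 'for' duration, and annotation wording are illustrative assumptions, not output of the generator; the `instance` and `mountpoint` labels are assumed to come from node_exporter.

```yaml
groups:
  - name: node-disk-example          # hypothetical group name
    rules:
      - alert: NodeDiskUsageHigh     # hypothetical alert name
        expr: |
          (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
        for: 10m                     # assumed value; tune to your environment
        labels:
          severity: warning
        annotations:
          summary: "Disk usage above 85% on {{ $labels.instance }}"
          # $value holds the result of the expression for this series
          description: '{{ $labels.mountpoint }} is {{ printf "%.1f" $value }}% full.'
```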
PrometheusRule CRD Wrapper
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: my-alerts
      namespace: monitoring
      labels:
        release: kube-prometheus-stack  # must match Prometheus selector
    spec:
      groups:
        # paste generated rules here
Frequently Asked Questions
How do I apply these rules to my cluster?
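If you use kube-prometheus-stack (Helm), create a PrometheusRule resource; it is discovered automatically as long as its labels match the Prometheus rule selector. Otherwise, save the rules as a file referenced under rule_files in your Prometheus configuration and reload: curl -X POST http://localhost:9090/-/reload (this endpoint is only available when Prometheus is started with --web.enable-lifecycle). The generated YAML is structured as a rules group, not a PrometheusRule CRD, so add the apiVersion/kind wrapper if needed.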
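For illustration, a complete wrapped resource might look like the sketch below. The resource name, alert name, and thresholds are hypothetical, and the `release` label is an assumption that has to match whatever rule selector your Prometheus instance actually uses.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-alerts                    # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack    # assumption: matches your Prometheus rule selector
spec:
  groups:
    - name: pod-alerts
      rules:
        - alert: PodCrashLooping      # hypothetical alert name
          expr: |
            increase(kube_pod_container_status_restarts_total[15m]) > 5
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently"
```

Saved as pod-alerts.yaml (a hypothetical filename), this could be applied with kubectl apply -f pod-alerts.yaml.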
What is the 'for' field in alert rules?
The 'for' duration means the condition must be true continuously for that long before the alert fires; if the expression stops returning a result during that window, the alert resets. 'for: 5m' prevents noisy alerts from transient spikes. Use lower values (1m, 2m) for critical alerts that need a fast response and higher values (15m, 30m) for capacity warnings.
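The contrast between short and long 'for' values could look something like this sketch. The alert names, thresholds, and durations are assumptions chosen for illustration; the metrics are assumed to come from kube-state-metrics and the kubelet.

```yaml
groups:
  - name: for-duration-example        # hypothetical group name
    rules:
      # Short 'for': page quickly when a node stops reporting Ready
      - alert: NodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 2m
        labels:
          severity: critical
      # Long 'for': capacity trend, so short-lived spikes are tolerated
      - alert: PVCUsageHigh
        expr: |
          (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) * 100 > 85
        for: 30m
        labels:
          severity: warning
```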
What's the difference between warning and critical severity?
Severity is just a label; Prometheus attaches no special meaning to it. Alertmanager uses it to route alerts to different receivers. Typical setup: critical → PagerDuty/on-call, warning → Slack channel. Define routing rules in your Alertmanager config that match on the severity label.
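A minimal Alertmanager routing sketch for that typical setup might look like the following. The receiver names, grouping labels, and channel are hypothetical, the credentials are placeholders, and the `matchers` syntax assumes a reasonably recent Alertmanager release.

```yaml
route:
  receiver: slack-warnings            # default receiver when no child route matches
  group_by: [alertname, namespace]
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall
    - matchers:
        - severity="warning"
      receiver: slack-warnings

receivers:
  - name: pagerduty-oncall            # hypothetical receiver name
    pagerduty_configs:
      - routing_key: "<your-pagerduty-integration-key>"   # placeholder
  - name: slack-warnings              # hypothetical receiver name
    slack_configs:
      - api_url: "<your-slack-webhook-url>"               # placeholder
        channel: "#alerts"
```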
How do I test alert rules without firing them?
Use promtool: promtool test rules test.yaml runs unit tests defined in .yaml test files, which feed in sample series and list the alerts expected to be firing at specific timestamps. To inspect rules that are already loaded, open the Prometheus UI at /rules to see the current rule evaluation status.
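A test file could be sketched roughly as follows. It assumes a hypothetical rules.yaml next to it that defines an alert named InstanceDown with expr `up == 0`, `for: 5m`, and a `severity: critical` label; the series labels and sample values are made up for the test.

```yaml
# test.yaml, run with: promtool test rules test.yaml
rule_files:
  - rules.yaml                        # hypothetical file defining InstanceDown

evaluation_interval: 1m

tests:
  - interval: 1m                      # spacing between the sample values below
    input_series:
      - series: 'up{job="node", instance="node-1:9100"}'
        values: '1 1 0 0 0 0 0 0 0 0 0'   # target goes down at t=2m
    alert_rule_test:
      - eval_time: 3m                 # condition is true, but 'for' is not yet satisfied
        alertname: InstanceDown
        exp_alerts: []
      - eval_time: 10m                # down for 8m, past the 5m 'for' window
        alertname: InstanceDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: node
              instance: "node-1:9100"
```

Only firing alerts are compared against exp_alerts, which is why the pending alert at 3m is expected to produce an empty list.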