
Prometheus Alert Rules Generator

Generate Prometheus alerting rules for pod crash-looping, high CPU/memory, node status, PVC usage, and certificate expiry. The output is a plain rule group; wrap it in a PrometheusRule resource to apply it with the Prometheus Operator.

Generator

Inputs

Rule Group Name

Namespace Filter (regex)

Scope alerts to specific namespaces. Use .* for all, or 'production|staging' for multiple.

Memory Alert Threshold

Alert when a container's working-set memory exceeds this percentage of its memory limit.

CPU Alert Threshold

Crash-Loop Restart Threshold

Alert when a pod restarts more than this many times in 15 minutes.

Custom Service Name (optional)

Adds a ServiceDown alert for a specific Prometheus job name.

Node alerts

PVC storage alerts

Certificate expiry alerts

Output — alert-rules.yaml

groups:
  - name: kubernetes-alerts
    interval: 30s
    rules:

      # ── Pod alerts ──────────────────────────────────────────────────
      - alert: PodCrashLooping
        expr: |
          increase(kube_pod_container_status_restarts_total{namespace=~".*"}[15m]) > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Pod {{ $labels.pod }} is crash-looping"
          description: "{{ $labels.pod }} in {{ $labels.namespace }} has restarted {{ $value | humanize }} times in 15m"

      - alert: PodNotReady
        expr: |
          kube_pod_status_ready{condition="false",namespace=~".*"} == 1
            unless on (namespace, pod) kube_pod_status_phase{phase="Succeeded",namespace=~".*"} == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is not ready"
          description: "{{ $labels.pod }} in {{ $labels.namespace }} has been not-ready for 5m"

      - alert: ContainerMemoryHigh
        expr: |
          (container_memory_working_set_bytes{namespace=~".*",container!=""} /
           (container_spec_memory_limit_bytes{namespace=~".*",container!=""} > 0)) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} memory high ({{ $value | humanize }}%)"
          description: "{{ $labels.container }} in {{ $labels.pod }} is using {{ $value | humanize }}% of its memory limit"

      - alert: ContainerCpuHigh
        expr: |
          (rate(container_cpu_usage_seconds_total{namespace=~".*",container!=""}[5m]) /
           ((container_spec_cpu_quota{namespace=~".*",container!=""} > 0) /
             container_spec_cpu_period{namespace=~".*",container!=""})) * 100 > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} CPU high ({{ $value | humanize }}%)"
          description: "{{ $labels.container }} in {{ $labels.pod }} is using {{ $value | humanize }}% of its CPU limit"

      # ── Node alerts ──────────────────────────────────────────────────
      - alert: NodeMemoryHigh
        expr: |
          (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} memory high ({{ $value | humanize }}%)"

      - alert: NodeDiskHigh
        expr: |
          (1 - (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"})) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} disk {{ $labels.mountpoint }} high ({{ $value | humanize }}%)"

      - alert: NodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} is not ready"
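The output above contains no PVC, certificate, or custom-service rules. As a rough sketch of what those could look like, the entries below could be appended to the group's rules: list (indent them to match). They assume kubelet volume stats are being scraped, that cert-manager is installed and exposing its metrics, and a hypothetical job name my-service; the rule names and thresholds are illustrative and not the generator's exact output.

```yaml
# PVC usage (assumes kubelet_volume_stats_* metrics are scraped)
- alert: PVCUsageHigh
  expr: |
    (kubelet_volume_stats_used_bytes{namespace=~".*"} /
     kubelet_volume_stats_capacity_bytes{namespace=~".*"}) * 100 > 85
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} is {{ $value | humanize }}% full"

# Certificate expiry (assumes cert-manager metrics; fires 14 days before expiry)
- alert: CertificateExpiringSoon
  expr: |
    (certmanager_certificate_expiration_timestamp_seconds - time()) < 14 * 24 * 3600
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} expires in {{ $value | humanizeDuration }}"

# Custom service ("my-service" is a hypothetical job name)
- alert: ServiceDown
  expr: up{job="my-service"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
```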

Writing Prometheus Alert Rules

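Prometheus alerting rules evaluate a PromQL expression at every evaluation interval. Each series the expression returns becomes an active alert in the 'pending' state; once that series has been returned continuously for the rule's 'for' duration, the alert switches to 'firing' and is sent to Alertmanager. A rule without a 'for' clause fires on the first evaluation that returns results.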

Rule Structure

```yaml
groups:
  - name: my-alerts
    interval: 30s       # evaluation frequency
    rules:
      - alert: AlertName
        expr: |
          promql_expression > threshold
        for: 5m          # must be true for 5m to fire
        labels:
          severity: critical
        annotations:
          summary: "Human-readable summary"
          description: "Detail with {{ $labels.pod }} template"
```

Common PromQL Patterns

```promql
# Container memory % of limit ("> 0" skips containers that have no limit)
(container_memory_working_set_bytes / (container_spec_memory_limit_bytes > 0)) * 100 > 90

# Pod restarts in the last 15 minutes
increase(kube_pod_container_status_restarts_total[15m]) > 5

# Node disk usage %
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
```

PrometheusRule CRD Wrapper

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # must match Prometheus selector
spec:
  groups:
    # paste generated rules here
```
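For illustration, here is the wrapper with one of the generated rules filled in. The top-level groups: key of the generated file becomes spec.groups, so every list item moves two levels deeper. This is a sketch assuming a Helm release named kube-prometheus-stack whose Prometheus ruleSelector matches on the release label; adjust the label to your own selector.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumption: must match your Prometheus ruleSelector
spec:
  groups:                            # was the top-level "groups:" key in alert-rules.yaml
    - name: kubernetes-alerts
      interval: 30s
      rules:
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is not ready"
```

Apply it with kubectl apply -f and the operator picks it up on its next reconcile.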

Key Terms


Prometheus

An open-source metrics and alerting system. Prometheus scrapes metrics from Kubernetes components and applications, stores them in a time-series database (TSDB), and evaluates alert rules.

Loki

A horizontally scalable log aggregation system by Grafana Labs. Unlike Elasticsearch, Loki only indexes metadata labels, storing log content as compressed chunks in object storage.

Frequently Asked Questions

How do I apply these rules to my cluster?

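If you use kube-prometheus-stack (Helm), wrap the rules in a PrometheusRule resource whose labels match the Prometheus ruleSelector, and the operator will discover it automatically. Otherwise, save the file in a path listed under rule_files in prometheus.yml and reload Prometheus with curl -X POST http://localhost:9090/-/reload (this endpoint is only enabled when Prometheus runs with --web.enable-lifecycle; sending SIGHUP to the process also triggers a reload). The generated YAML is a plain rule group, not a PrometheusRule CRD, so add the apiVersion/kind wrapper when using the operator.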
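Without the operator, Prometheus loads rule files from its main configuration. A minimal excerpt might look like the following; the rules path and the Alertmanager address are assumptions for illustration. Running promtool check rules alert-rules.yaml before reloading catches syntax errors.

```yaml
# prometheus.yml (excerpt)
global:
  evaluation_interval: 30s          # default for groups that don't set their own interval

rule_files:
  - /etc/prometheus/rules/alert-rules.yaml   # assumed path; glob patterns are also accepted

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']     # hypothetical Alertmanager address
```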

What is the 'for' field in alert rules?

The 'for' duration means the condition must be true continuously for that long before the alert fires. 'for: 5m' prevents noisy alerts from transient spikes. Set lower values (1m, 2m) for critical alerts you need fast, higher values (15m, 30m) for capacity warnings.
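One way to apply this advice is to tier the same condition: a long 'for' on a warning threshold and a short 'for' on a critical one. The sketch below reuses the generated disk expression; the NodeDiskCritical name, the 95% threshold, and both durations are illustrative choices, not generator output.

```yaml
- alert: NodeDiskHigh
  expr: |
    (1 - (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"})) * 100 > 85
  for: 30m        # capacity warning: ignore short-lived spikes
  labels:
    severity: warning
- alert: NodeDiskCritical
  expr: |
    (1 - (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"})) * 100 > 95
  for: 2m         # nearly full: notify quickly
  labels:
    severity: critical
```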

What's the difference between warning and critical severity?

Severity is just a label — Alertmanager uses it to route alerts to different receivers. Typical setup: critical → PagerDuty/on-call, warning → Slack channel. Define your routing rules in Alertmanager config to match on severity label.
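A sketch of that routing in alertmanager.yml is shown below. The receiver names, the PagerDuty routing key, and the Slack webhook URL are placeholders, and the matchers syntax assumes Alertmanager 0.22 or later.

```yaml
# alertmanager.yml (excerpt)
route:
  receiver: slack-warnings              # fallback for alerts not matched below
  group_by: ['alertname', 'namespace']
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall
    - matchers:
        - severity="warning"
      receiver: slack-warnings

receivers:
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: REPLACE_WITH_INTEGRATION_KEY               # placeholder
  - name: slack-warnings
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE_ME    # placeholder webhook
        channel: '#alerts'
        send_resolved: true
```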

How do I test alert rules without firing them?

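Run promtool test rules test.yaml, where the test file lists input series and the alerts expected to be firing at specific evaluation times. To watch live behavior, the Prometheus UI's /alerts page shows whether each rule is inactive, pending, or firing, and /rules shows each rule's last evaluation and any errors.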
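A minimal unit test for the generated NodeNotReady rule might look like this, assuming the rules are saved as alert-rules.yaml next to a test file with the hypothetical name node-not-ready.test.yaml. The node goes not-ready at 1m; promtool only compares firing alerts, so at 2m the alert is still pending and the expected list is empty, while at 4m the 2m 'for' has elapsed and it fires. promtool expects the plain rule-group format, not the CRD wrapper.

```yaml
# node-not-ready.test.yaml: run with  promtool test rules node-not-ready.test.yaml
rule_files:
  - alert-rules.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Ready at 0m, not ready from 1m onward
      - series: 'kube_node_status_condition{condition="Ready",status="true",node="node-1"}'
        values: '1 0 0 0 0 0'
    alert_rule_test:
      - eval_time: 2m            # pending only (1m of the required 2m)
        alertname: NodeNotReady
        exp_alerts: []
      - eval_time: 4m            # for: 2m satisfied, alert is firing
        alertname: NodeNotReady
        exp_alerts:
          - exp_labels:
              severity: critical
              condition: Ready
              status: "true"
              node: node-1
            exp_annotations:
              summary: "Node node-1 is not ready"
```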
