Prometheus — monitoring for the container world

Nagios served us faithfully for ten years. But in the dynamic world of containers, where pods are born and die every minute, static monitoring configuration is unsustainable. Prometheus with its service discovery and pull model is exactly what we need.

Why not Nagios/Zabbix

Traditional monitoring works on the principle: configure a list of hosts, define checks, monitor. But in Kubernetes you don’t have “hosts” — you have pods that dynamically move between nodes, scale up and down, die and are reborn.

Prometheus architecture

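Prometheus uses a pull model: it actively scrapes metrics over HTTP from the endpoints it knows about. In Kubernetes it has native service discovery, so it finds pods through the Kubernetes API instead of a static host list. A common convention is to scrape only pods that carry the prometheus.io/scrape: "true" annotation. Prometheus does not interpret that annotation by itself; the scrape configuration filters on it with a relabeling rule, and once that rule is in place new pods are picked up automatically.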
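
To make the annotation-based selection concrete, here is a minimal Python sketch of the filtering step. It is not Prometheus code: the pod records, the scrape_url and discover_targets helpers, and the default port are hypothetical, and the prometheus.io/port and prometheus.io/path annotations are assumed to follow the same convention as prometheus.io/scrape.

```python
from typing import Optional

SCRAPE = "prometheus.io/scrape"
PORT = "prometheus.io/port"   # assumed convention, not required by Prometheus
PATH = "prometheus.io/path"   # assumed convention, not required by Prometheus
DEFAULT_PORT = "8080"         # assumption made for this sketch only


def scrape_url(pod: dict) -> Optional[str]:
    """Return the metrics URL for a pod, or None if it has not opted in."""
    annotations = pod.get("annotations", {})
    if annotations.get(SCRAPE) != "true":
        return None
    port = annotations.get(PORT, DEFAULT_PORT)
    path = annotations.get(PATH, "/metrics")
    return f"http://{pod['ip']}:{port}{path}"


def discover_targets(pods: list) -> list:
    """Keep only opted-in pods, similar in spirit to a 'keep' relabel rule."""
    urls = (scrape_url(pod) for pod in pods)
    return [url for url in urls if url is not None]


# Hypothetical discovery result; real data would come from the Kubernetes API.
pods = [
    {"name": "api-1", "ip": "10.0.0.5",
     "annotations": {SCRAPE: "true", PORT: "9090"}},
    {"name": "batch-1", "ip": "10.0.0.6", "annotations": {}},
]
targets = discover_targets(pods)
```

Only api-1 ends up in targets here. When a pod is rescheduled and gets a new IP, the next discovery pass yields a new URL and no configuration has to be edited.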

PromQL — a language you either love or hate

# Request rate per second over the last 5 minutes
rate(http_requests_total[5m])

# 99th percentile latency, aggregated across all instances
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# CPU usage per pod, in cores
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="production"}[5m]))

You learn PromQL gradually, but once you master it, you can answer questions you would never ask with Nagios.
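
It helps to know roughly what rate() and histogram_quantile() compute. The Python sketch below is a simplified model, not the Prometheus implementation: simple_rate and simple_quantile are hypothetical names, the rate version skips the extrapolation to the window edges that the real function performs, and the quantile version assumes 0 < q <= 1.

```python
import math


def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    A value lower than its predecessor is treated as a counter reset,
    so the new value counts as the increase for that step.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def simple_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with math.inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The quantile lies beyond the last finite bucket boundary.
        return buckets[-2][0]
    lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
    # Assume observations are spread evenly inside the chosen bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Counter scraped every 60 s; the process restarted before the third sample.
samples = [(0, 100), (60, 160), (120, 10), (180, 70)]
per_second = simple_rate(samples)  # (60 + 10 + 60) / 180

# Cumulative latency buckets: 50 requests <= 0.1 s, 90 <= 0.5 s, and so on.
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p95 = simple_quantile(0.95, buckets)  # interpolated between 0.5 s and 1.0 s
```

The quantile is an estimate whose precision depends on the bucket boundaries, which is why bucket layout matters when you instrument latency.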

Grafana dashboards

Prometheus itself has only a minimalist web UI. For visualization we use Grafana, which has a native Prometheus data source. The community shares thousands of ready-made dashboards on grafana.com.

Alerting with Alertmanager

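# Alerting rule evaluated by Prometheus; firing alerts are sent to Alertmanager.
groups:
- name: application
  rules:
  - alert: HighErrorRate
    # Share of 5xx responses per service over the last 5 minutes, above 10%
    expr: |
      sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
        / sum by (service) (rate(http_requests_total[5m])) > 0.1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High error rate on {{ $labels.service }}"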
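
The for: 5m clause in the rule above means the expression has to stay true for five minutes before the alert fires; until then the alert is only pending. A minimal Python sketch of that behavior follows. The alert_states function and the 60-second evaluation interval are assumptions for illustration, not part of Prometheus.

```python
def alert_states(results, interval_s, for_s):
    """Map per-evaluation booleans to 'inactive', 'pending' or 'firing'."""
    states = []
    active_since = None  # time at which the condition last became true
    for step, condition_met in enumerate(results):
        now = step * interval_s
        if not condition_met:
            # Any false evaluation resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states


# Evaluated every 60 s with for: 5m; a single healthy evaluation resets the wait.
history = [True] * 5 + [False] + [True] * 7
states = alert_states(history, interval_s=60, for_s=300)
```

In this run the first burst of errors never fires because it clears before five minutes have passed. The second burst fires once it has held for 300 seconds. This is how a for clause filters out short spikes.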

Application instrumentation

Prometheus client libraries exist for Java, Go, Python, Node.js and other languages. In Spring Boot, just add Micrometer with the Prometheus registry and you have metrics within minutes. Four metric types cover most needs: Counter, Gauge, Histogram and Summary.
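
To show what a client library does under the hood, here is a stdlib-only Python sketch of a counter and a histogram rendered in the Prometheus text format. The Counter and Histogram classes are hypothetical stand-ins, not the API of any real client library, and a production service should use an official client instead.

```python
import math


class Counter:
    """Value that only goes up, such as the number of requests served."""

    def __init__(self, name):
        self.name = name
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount

    def expose(self):
        return [f"# TYPE {self.name} counter", f"{self.name} {self.value}"]


class Histogram:
    """Counts observations in cumulative buckets and tracks their sum."""

    def __init__(self, name, bounds):
        self.name = name
        self.bounds = sorted(bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)
        self.total = 0.0

    def observe(self, value):
        self.total += value
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                # Buckets are cumulative: count every bucket that contains the value.
                self.counts[i] += 1

    def expose(self):
        lines = [f"# TYPE {self.name} histogram"]
        for bound, count in zip(self.bounds, self.counts):
            le = "+Inf" if math.isinf(bound) else str(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {count}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.counts[-1]}")
        return lines


requests = Counter("http_requests_total")
latency = Histogram("http_request_duration_seconds", [0.1, 0.5, 1.0])
for seconds in (0.25, 0.5, 2.0):
    requests.inc()
    latency.observe(seconds)

# This text is what a /metrics endpoint would return on each scrape.
metrics_text = "\n".join(requests.expose() + latency.expose()) + "\n"
```

A Gauge would look like the counter but also allow decreasing values. The _bucket series produced here are the input that histogram_quantile() works on.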

Prometheus is the standard for cloud-native monitoring

The transition from Nagios wasn’t trivial — we had to rethink what and how we monitor. But the result is incomparably better. Prometheus with Grafana and Alertmanager is now our standard monitoring trio.
