ELK handles logs. But what about metrics: CPU, memory, response time, error rate, business metrics? Prometheus, originally built at SoundCloud and now a CNCF project, is a monitoring system designed for dynamic, containerized environments.
Pull model
Unlike Graphite, where services push their metrics, Prometheus actively scrapes them. Each service exposes a /metrics endpoint over HTTP that Prometheus reads at a fixed interval. The advantage: a failed scrape is itself a signal, so you see by the next scrape when a service stops responding.
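To make the pull model concrete, here is a minimal sketch of a service exposing /metrics using only the JDK's built-in HTTP server. The class name MetricsEndpoint, the metric http_requests_total with its help text, and the REQUESTS counter are illustrative assumptions; a real service would normally rely on a client library such as Micrometer rather than hand-writing the text format.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

public class MetricsEndpoint {
    // Hypothetical counter; a real service would increment it per handled request.
    public static final AtomicLong REQUESTS = new AtomicLong();

    // Render the counter in the Prometheus text exposition format.
    public static String render() {
        return "# HELP http_requests_total Total HTTP requests handled.\n"
             + "# TYPE http_requests_total counter\n"
             + "http_requests_total " + REQUESTS.get() + "\n";
    }

    // Start an HTTP server that answers every scrape of /metrics with the current values.
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // With a port argument: serve /metrics until the process is stopped.
            start(Integer.parseInt(args[0]));
        } else {
            // Without arguments: print what a single scrape would return.
            REQUESTS.incrementAndGet();
            System.out.print(render());
        }
    }
}
```

Run with a port argument (for example 8080), the process answers scrapes until it is stopped. The service never needs to know where Prometheus lives; it only has to answer when asked.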
Metrics in the application
# prometheus.yml
scrape_configs:
  - job_name: 'user-api'
    scrape_interval: 15s
    static_configs:
      - targets: ['user-api:8080']
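// In a Java application (Micrometer)
import io.micrometer.core.instrument.Counter;

// meterRegistry is the application's MeterRegistry
Counter requestCounter = Counter.builder("api.requests")
    .tag("endpoint", "/users")
    .tag("method", "GET")
    .register(meterRegistry); // the Prometheus registry exposes this as api_requests_total

requestCounter.increment(); // call once per handled request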
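Tags become labels on the Prometheus side, and every distinct label combination is its own time series. The following dependency-free sketch illustrates that idea; LabeledCounter is a hypothetical stand-in for what a client library does internally, not Micrometer's implementation, and it skips details such as escaping label values.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.LongAdder;

public class LabeledCounter {
    private final String name;
    // One independent count per label combination; the sorted map keeps output order stable.
    private final Map<String, LongAdder> series = new ConcurrentSkipListMap<>();

    public LabeledCounter(String name) {
        this.name = name;
    }

    // Increment the series identified by this (endpoint, method) pair.
    public void increment(String endpoint, String method) {
        String labels = "endpoint=\"" + endpoint + "\",method=\"" + method + "\"";
        series.computeIfAbsent(labels, k -> new LongAdder()).increment();
    }

    // Render all series as lines of the Prometheus text exposition format.
    public String render() {
        StringBuilder sb = new StringBuilder("# TYPE " + name + " counter\n");
        series.forEach((labels, count) ->
            sb.append(name).append('{').append(labels).append("} ")
              .append(count.sum()).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        LabeledCounter counter = new LabeledCounter("api_requests_total");
        counter.increment("/users", "GET");
        counter.increment("/users", "GET");
        counter.increment("/users", "POST");
        System.out.print(counter.render());
    }
}
```

Because each label combination is stored separately, high-cardinality values such as user IDs are a poor fit for labels: every new value creates another series.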
PromQL
PromQL is the query language for Prometheus metrics. rate(http_requests_total[5m]) returns the per-second average request rate over the last 5 minutes. histogram_quantile(0.95, ...) estimates the 95th percentile latency from histogram buckets. Powerful, but with a learning curve.
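To build intuition for what these two functions compute, here is a simplified Java sketch. It is not Prometheus's implementation: the real rate() also extrapolates to the window boundaries, and the real histogram_quantile() handles more edge cases. The class name, method names, and sample data are assumptions for illustration.

```java
public class PromqlSketch {
    // Simplified rate(): total counter increase divided by elapsed seconds.
    // timestamps are in seconds; values are raw counter readings in time order.
    public static double rate(double[] timestamps, double[] values) {
        if (timestamps.length < 2 || timestamps.length != values.length) {
            return Double.NaN;
        }
        double increase = 0;
        for (int i = 1; i < values.length; i++) {
            double delta = values[i] - values[i - 1];
            // A drop means the counter was reset (e.g. a restart): count up from zero.
            increase += delta >= 0 ? delta : values[i];
        }
        double elapsed = timestamps[timestamps.length - 1] - timestamps[0];
        return elapsed > 0 ? increase / elapsed : Double.NaN;
    }

    // Simplified histogram_quantile() for 0 < q <= 1.
    // upperBounds are ascending bucket limits ending with +Infinity;
    // cumulativeCounts[i] is the number of observations <= upperBounds[i].
    public static double histogramQuantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        if (n < 2 || n != cumulativeCounts.length || cumulativeCounts[n - 1] == 0) {
            return Double.NaN;
        }
        double rank = q * cumulativeCounts[n - 1];
        int b = 0;
        while (b < n - 1 && cumulativeCounts[b] < rank) {
            b++;
        }
        if (b == n - 1) {
            // The quantile falls into the +Inf bucket: report the highest finite bound.
            return upperBounds[n - 2];
        }
        double lower = b == 0 ? 0 : upperBounds[b - 1];
        double previous = b == 0 ? 0 : cumulativeCounts[b - 1];
        double inBucket = cumulativeCounts[b] - previous;
        // Linear interpolation: assume observations are spread evenly inside the bucket.
        return lower + (upperBounds[b] - lower) * (rank - previous) / inBucket;
    }

    public static void main(String[] args) {
        // Counter scraped every 15 s, growing by 30 per scrape: 120 requests in 60 s.
        double[] ts = {0, 15, 30, 45, 60};
        double[] counter = {100, 130, 160, 190, 220};
        System.out.println(rate(ts, counter)); // 2.0 requests per second

        // Latency buckets in seconds, with cumulative observation counts.
        double[] le = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 90, 100, 100};
        System.out.println(histogramQuantile(0.95, le, counts)); // 0.75 seconds
    }
}
```

The interpolation step explains why bucket boundaries matter: the estimate can only be as precise as the bucket the quantile falls into.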
Grafana dashboards
Grafana visualizes Prometheus data, and community dashboards exist for Node.js, the JVM, Docker, and Nginx. Alerting goes through Alertmanager, which routes notifications to Slack, email, or PagerDuty.
Prometheus vs. Graphite/InfluxDB
Prometheus: pull model, PromQL, service discovery, designed for containers. Graphite: push model, older, stable. InfluxDB: push model, SQL-like query language, a better fit for IoT time series. For microservices, Prometheus clearly leads.
Metrics are the foundation of SRE
Logs tell you what happened. Metrics tell you how the system is behaving. Prometheus + Grafana = the standard for monitoring cloud-native applications.