Prometheus, the monitoring system developed at SoundCloud, introduces a pull-based model, a flexible query language (PromQL), and native support for dynamic environments.
Monitoring for the Container Era
Traditional monitoring tools (Nagios, Zabbix) assume static infrastructure — manually configured hosts with permanent IP addresses. In a containerized environment where instances are created and destroyed dynamically, this model breaks down.
Prometheus was developed at SoundCloud specifically for dynamic, cloud-native environments. Inspired by Google’s internal Borgmon system, it brings large-scale monitoring principles within reach of every engineering team.
Pull Model and Service Discovery
Prometheus actively scrapes metrics from HTTP endpoints exposed by services — the opposite of a push model (StatsD, Graphite).
Advantages of the pull model:
- Simpler: a service only needs to expose a /metrics endpoint
- Failure detection: a failed scrape shows that the target is down or unreachable
- Service discovery integration: Consul, Kubernetes, DNS
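To make the first point concrete, here is a minimal sketch of a service exposing a /metrics endpoint using only the Python standard library. The counter values, the label names, and the port are assumptions made for illustration; a production service would more likely use an official Prometheus client library than hand-render the text format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real service would update these per request.
REQUEST_COUNTS = {("GET", "200"): 0, ("GET", "500"): 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the endpoint; the port is an arbitrary choice for this sketch."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint, after which Prometheus (or `curl localhost:8000/metrics`) can pull the current counter values at any time.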
# Prometheus configuration
scrape_configs:
  - job_name: 'web-app'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: web
        action: keep
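The effect of the `keep` rule in the configuration above can be sketched in a few lines of Python. This is an illustration of the filtering idea only, not Prometheus's implementation: the discovered targets and their addresses are made up, and Python's `re` module stands in for the RE2 syntax Prometheus uses. The one detail carried over deliberately is that relabel regexes are anchored at both ends, so `web` does not match `webhook`.

```python
import re


def keep_targets(targets, source_label, regex):
    """Keep only targets whose source label value fully matches the regex."""
    pattern = re.compile(regex)
    # A missing label is treated as an empty string, which "web" does not match.
    return [t for t in targets if pattern.fullmatch(t.get(source_label, ""))]


# Hypothetical pods as service discovery might report them.
discovered = [
    {"__address__": "10.0.0.1:8080", "__meta_kubernetes_pod_label_app": "web"},
    {"__address__": "10.0.0.2:8080", "__meta_kubernetes_pod_label_app": "webhook"},
    {"__address__": "10.0.0.3:5432", "__meta_kubernetes_pod_label_app": "db"},
    {"__address__": "10.0.0.4:9000"},
]

kept = keep_targets(discovered, "__meta_kubernetes_pod_label_app", "web")
```

Here only the pod labeled `app=web` remains in `kept`; new pods carrying that label would be picked up on the next discovery refresh without any configuration change.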
PromQL: Query Language
PromQL is one of Prometheus’s greatest strengths — a flexible query language for metrics:
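# Request rate per second over the last 5 minutes
rate(http_requests_total[5m])
# 99th percentile latency
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# Error ratio: share of all requests answered with a 5xx status.
# Both sides are aggregated so that the label sets match; dividing the raw
# series would pair each 5xx series with itself and always yield 1.
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))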
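As a rough illustration of what `rate()` computes, the sketch below derives a per-second rate from raw counter samples and treats any drop in value as a counter reset. It is a simplification under stated assumptions: the function name and sample data are hypothetical, and the real `rate()` additionally extrapolates towards the edges of the time window, which is omitted here.

```python
def simple_rate(samples):
    """Per-second increase of a counter across the given samples.

    samples: list of (timestamp_seconds, counter_value), ordered by time.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A lower value than before means the counter restarted from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical scrapes every 15 seconds; the service restarted before t=30.
samples = [(0, 100.0), (15, 130.0), (30, 10.0), (45, 40.0)]
per_second = simple_rate(samples)
```

The reset handling is the reason counters are queried through `rate()` rather than by subtracting raw values: a restart would otherwise show up as a large negative jump.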
PromQL enables ad-hoc analysis, dashboard creation, and the definition of alerting rules.
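The percentile query is less intuitive than the rate queries, because a histogram stores only cumulative bucket counts and not individual observations. The sketch below shows the general idea of estimating a quantile by linear interpolation inside the bucket that contains the target rank. The function name and bucket boundaries are assumptions for illustration, and edge cases are handled more simply than in Prometheus itself.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), including math.inf.
    """
    buckets = sorted(buckets)
    if len(buckets) < 2 or not 0 <= q <= 1:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Overflow (or empty) bucket: no upper edge to interpolate to.
                return prev_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical latency buckets in seconds: 50 requests took <= 0.1 s, and so on.
latency = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p95 = bucket_quantile(0.95, latency)
```

Because the result is interpolated, its accuracy depends on how well the bucket boundaries bracket the latencies of interest; buckets placed around the expected service-level threshold give the most useful estimates.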
Alerting and Grafana Integration
Prometheus Alertmanager handles alerts — deduplication, grouping, silencing, and routing to notification channels (email, Slack, PagerDuty).
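Deduplication and grouping can be pictured with a small sketch. It assumes that an alert is identified by its full label set and that groups are formed from a chosen subset of labels, similar in spirit to the `group_by` setting of an Alertmanager route. The function and the sample alerts are hypothetical and do not reproduce Alertmanager's actual implementation.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Drop alerts with identical label sets, then group by selected labels."""
    groups = defaultdict(list)
    seen = set()
    for labels in alerts:
        # The sorted label pairs act as the alert's identity.
        identity = tuple(sorted(labels.items()))
        if identity in seen:
            continue
        seen.add(identity)
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical firing alerts; the first two are exact duplicates.
firing = [
    {"alertname": "HighErrorRate", "service": "web", "instance": "10.0.0.1"},
    {"alertname": "HighErrorRate", "service": "web", "instance": "10.0.0.1"},
    {"alertname": "HighErrorRate", "service": "web", "instance": "10.0.0.2"},
    {"alertname": "DiskFull", "service": "db", "instance": "10.0.0.3"},
]

grouped = group_alerts(firing, group_by=["alertname", "service"])
```

With this grouping, the two affected web instances produce one notification for `HighErrorRate` instead of one per instance.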
For visualization, Prometheus pairs well with Grafana, a widely used open-source dashboarding tool. Together, Prometheus, Grafana, and Alertmanager form a complete monitoring stack.
Recommended metrics to monitor: RED (Rate, Errors, Duration) for services, USE (Utilization, Saturation, Errors) for infrastructure.
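As a small worked example of the RED method, the sketch below derives all three signals from a window of request records. The `Request` type, the field names, and the sample data are assumptions for illustration; in practice these values would come from PromQL queries over counters and histograms, not from raw request logs.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int
    duration_seconds: float


def red_summary(requests, window_seconds):
    """Compute Rate, Errors, and Duration for one observation window."""
    count = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    total_duration = sum(r.duration_seconds for r in requests)
    return {
        "rate_per_second": count / window_seconds,
        "error_ratio": errors / count if count else 0.0,
        "mean_duration_seconds": total_duration / count if count else 0.0,
    }


# Hypothetical requests observed during a 2-second window.
window = [
    Request(200, 0.1),
    Request(200, 0.3),
    Request(500, 0.2),
    Request(503, 0.2),
]
summary = red_summary(window, window_seconds=2)
```

The mean is used here only to keep the example short; for duration, percentiles usually describe user experience better than averages.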
Conclusion: The Standard for Cloud-Native Monitoring
Prometheus is rapidly becoming the standard for monitoring in cloud-native environments. It was the second project accepted into CNCF after Kubernetes — that is no coincidence. For every new project involving containers, we recommend Prometheus as the primary monitoring solution.