DevOps Intermediate
Grafana — Effective Dashboards¶
Grafana, Dashboards, Monitoring, Observability · 5 min read
Best practices for Grafana dashboards: design principles, dashboards as code, provisioning, and alerting.
Dashboard Design Principles¶
- USE method — Utilization, Saturation, Errors for each resource
- RED method — Rate, Errors, Duration for each service
- Hierarchy — Overview, then Service, then Detail (drill-down)
- Max 10-12 panels per dashboard
- Consistent colors — green OK, yellow warning, red critical
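To make the RED method concrete, the three signals can be precomputed as Prometheus recording rules and then reused across panels. This is a minimal sketch, not a definitive setup: `http_requests_total` and its `status` label come from the examples in this article, while the `job` label, the histogram `http_request_duration_seconds_bucket`, and the rule names are assumptions chosen for illustration.

```yaml
# Prometheus rules file (loaded by Prometheus, not by Grafana).
groups:
  - name: red-method
    rules:
      # Rate: requests per second, averaged over 5 minutes
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Errors: fraction of requests answered with a 5xx status
      - record: job:http_requests_errors:ratio_rate5m
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum by (job) (rate(http_requests_total[5m]))
      # Duration: 95th percentile latency (assumed histogram metric)
      - record: job:http_request_duration_seconds:p95_5m
        expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```

An overview dashboard can then plot these three series per service, with drill-down dashboards querying the raw metrics.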
Dashboard as Code¶
# Provisioning
apiVersion: 1
providers:
  - name: default
    orgId: 1
    folder: ''
    type: file
    options:
      path: /var/lib/grafana/dashboards
# Grafonnet (Jsonnet)
local grafana = import 'grafonnet/grafana.libsonnet';
local dashboard = grafana.dashboard;
local prometheus = grafana.prometheus;

dashboard.new('API Overview', tags=['api'])
.addPanel(
  grafana.graphPanel.new('Request Rate', datasource='Prometheus')
  .addTarget(prometheus.target('sum(rate(http_requests_total[5m]))')),
  gridPos={ x: 0, y: 0, w: 12, h: 8 }
)
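The Grafonnet panel above refers to a data source named `Prometheus`, and the alert rule in the next section refers to `datasourceUid: prometheus`. One way to keep both references stable across environments is to provision the data source from a file as well. The sketch below assumes a Prometheus server reachable at `http://prometheus:9090`; that URL is a placeholder, not something taken from this article.

```yaml
# Data source provisioning, e.g. under provisioning/datasources/
apiVersion: 1
datasources:
  - name: Prometheus        # name used by dashboards (datasource='Prometheus')
    uid: prometheus         # fixed UID used by provisioned alert rules
    type: prometheus
    access: proxy           # Grafana's backend proxies the queries
    url: http://prometheus:9090   # assumed address, adjust per environment
    isDefault: true
```

Pinning the `uid` means dashboards and alert rules kept in version control do not depend on an ID generated at runtime.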
Alerting¶
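apiVersion: 1
groups:
  - orgId: 1
    name: sre-alerts
    folder: SRE
    interval: 1m
    rules:
      - uid: high-error-rate
        title: High Error Rate
        # condition must name a refId listed under data; C (for example a
        # threshold on A) has to be defined there alongside the query
        condition: C
        data:
          - refId: A
            datasourceUid: prometheus
            model:
              # percentage of requests that returned a 5xx status
              expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
        for: 5m
        labels:
          severity: critical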
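The rule above sets `condition: C`, but only query `A` appears under `data`. A possible shape for the missing step is a threshold expression evaluated by Grafana's built-in expression engine. Treat this as a sketch under stated assumptions: the threshold of 5 (percent) is illustrative and does not come from this article, and query `A` is assumed to return a single instant value per series.

```yaml
# Additional entry for the rule's `data` list, next to refId A.
- refId: C
  datasourceUid: __expr__     # Grafana's server-side expression engine
  model:
    type: threshold
    expression: A             # input: the result of query A
    conditions:
      - evaluator:
          type: gt            # fire when the value is greater than...
          params: [5]         # ...this assumed threshold (5% errors)
    refId: C
```

With `for: 5m`, the error percentage has to stay above the threshold for five consecutive minutes before the alert fires, which filters out short spikes.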
Summary¶
Effective Grafana dashboards follow USE/RED principles, are versioned as code, and have a clear hierarchy.