Dashboard Standards
Owner: Anchor MSP Operations Lead
Last reviewed: 2026-05-24
Purpose
Define standards for Grafana dashboards used to monitor production systems managed by Anchor. Consistent dashboards enable faster incident response, easier onboarding, and reliable operational visibility.
Scope
All Grafana dashboards created for systems managed by Anchor MSP. This includes system dashboards, application dashboards, security dashboards, and overview dashboards.
Policy
Dashboard Naming Convention
All dashboards follow the naming pattern: {client}-{system}-{category}
Examples:
- egi-api-overview
- egi-api-performance
- mast-web-errors
- anchor-infrastructure-nodes

Name components:
- client: The client or project identifier (e.g., egi, mast, anchor).
- system: The specific system or service (e.g., api, web, database, infrastructure).
- category: The dashboard focus area (e.g., overview, performance, errors, security, nodes).
Dashboard names are lowercase with hyphens. No spaces, no underscores.
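The naming rule can be checked mechanically, for example in a pre-commit hook. The following is a minimal sketch, assuming each of the three parts consists only of lowercase letters and digits (the policy does not say whether a part may itself contain a hyphen). The function names are illustrative and not part of any existing tooling.

```python
import re

# Assumption: each part is lowercase letters/digits only, so hyphens act
# purely as separators between client, system, and category.
DASHBOARD_NAME = re.compile(r"[a-z0-9]+-[a-z0-9]+-[a-z0-9]+")


def is_valid_dashboard_name(name: str) -> bool:
    """Return True if name matches {client}-{system}-{category}."""
    return DASHBOARD_NAME.fullmatch(name) is not None


def split_dashboard_name(name: str) -> tuple[str, str, str]:
    """Split a valid name into (client, system, category)."""
    if not is_valid_dashboard_name(name):
        raise ValueError(f"not a valid dashboard name: {name!r}")
    client, system, category = name.split("-")
    return client, system, category
```

For instance, `split_dashboard_name("mast-web-errors")` yields the client `mast`, the system `web`, and the category `errors`, while names containing uppercase letters, spaces, or underscores are rejected.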
Folder Structure
Grafana dashboards are organized into folders by client:
Grafana/
  Anchor/   -- Internal Anchor infrastructure dashboards
  EGI/      -- EGI client dashboards
  Mast/     -- Mast client dashboards
  Shared/   -- Cross-client dashboards (e.g., fleet overview)
- Each client folder contains only dashboards for that client's systems.
- The Shared folder contains dashboards that aggregate data across clients (e.g., fleet-wide node health).
- New folders are created only when a new client is onboarded. Do not create per-system or per-category subfolders.
Required Panels
Every managed system must have at minimum the following dashboard panels. These may be on one or more dashboards.
System Overview Dashboard
| Panel | Data Source | Description |
|---|---|---|
| System uptime | up metric | Current up/down status with uptime percentage |
| CPU usage | node_cpu_seconds_total | CPU utilization over time, by mode |
| Memory usage | node_memory_MemAvailable_bytes | Used vs. available memory |
| Disk usage | node_filesystem_avail_bytes | Disk utilization per mount point |
| Network I/O | node_network_*_bytes_total | Inbound and outbound traffic |
| Active alerts | Alertmanager datasource | Currently firing alerts for this system |
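As a starting point for the metric-backed panels above, the sketch below builds one PromQL expression per panel. It is illustrative only and rests on several assumptions: the metrics come from a standard node_exporter, hosts are selected with the `instance` label, and `node_memory_MemTotal_bytes` and `node_filesystem_size_bytes` (not named in the table) are available for computing used memory and disk utilization. The 30-day uptime window and the function name are also assumptions.

```python
def system_overview_queries(
    instance: str = "$host",
    window: str = "$interval",
    uptime_window: str = "30d",  # assumption: the policy does not fix a window
) -> dict[str, str]:
    """Build example PromQL expressions for the System Overview panels."""
    # Assumption: hosts are identified by the `instance` label.
    sel = f'{{instance="{instance}"}}'
    return {
        "uptime_percent": f"avg_over_time(up{sel}[{uptime_window}]) * 100",
        "cpu_by_mode": (
            f"sum by (mode) (rate(node_cpu_seconds_total{sel}[{window}]))"
        ),
        "memory_used_bytes": (
            f"node_memory_MemTotal_bytes{sel}"
            f" - node_memory_MemAvailable_bytes{sel}"
        ),
        "disk_used_percent": (
            f"100 * (1 - node_filesystem_avail_bytes{sel}"
            f" / node_filesystem_size_bytes{sel})"
        ),
        "network_rx_bytes_per_s": (
            f"rate(node_network_receive_bytes_total{sel}[{window}])"
        ),
        "network_tx_bytes_per_s": (
            f"rate(node_network_transmit_bytes_total{sel}[{window}])"
        ),
    }
```

The defaults reference the `$host` and `$interval` template variables so the same expressions can be reused across hosts. Passing concrete values, such as `system_overview_queries("web-01:9100", "5m")`, is convenient when testing a query outside Grafana.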
Application Health Dashboard
| Panel | Data Source | Description |
|---|---|---|
| Request rate | http_requests_total | Requests per second over time |
| Error rate | http_requests_total (filtered by 5xx) | Error percentage over time |
| Response time | http_request_duration_seconds | P50, P95, P99 latency |
| Health check | app_up or health endpoint | Current application health status |
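The error-rate and latency panels are the ones most often built inconsistently, so a sketch of their queries may help. It assumes that `http_requests_total` carries a `service` label and a `status` label holding the HTTP status code (some exporters use `code` instead), and that `http_request_duration_seconds` is a Prometheus histogram exposing `_bucket` series. None of these details are stated in this standard, and the function name is hypothetical.

```python
def app_health_queries(
    service: str = "$service",
    window: str = "$interval",
    status_label: str = "status",  # assumption: may be "code" in some setups
) -> dict[str, str]:
    """Build example PromQL expressions for the Application Health panels."""
    sel = f'service="{service}"'  # assumption: a `service` label exists
    total = f"sum(rate(http_requests_total{{{sel}}}[{window}]))"
    errors = (
        f'sum(rate(http_requests_total{{{sel},{status_label}=~"5.."}}'
        f"[{window}]))"
    )
    # Assumption: the duration metric is a histogram with `le` buckets.
    buckets = (
        "sum by (le) (rate(http_request_duration_seconds_bucket"
        f"{{{sel}}}[{window}]))"
    )
    queries = {
        "request_rate": total,
        # Error percentage = 5xx requests / all requests * 100.
        "error_percent": f"100 * {errors} / {total}",
    }
    for label, quantile in {"p50": "0.5", "p95": "0.95", "p99": "0.99"}.items():
        queries[f"latency_{label}"] = (
            f"histogram_quantile({quantile}, {buckets})"
        )
    return queries
```

Computing the error percentage as a ratio of two `rate()` sums keeps the panel meaningful at any traffic level, whereas a raw 5xx count would scale with request volume.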
Error and Log Dashboard
| Panel | Data Source | Description |
|---|---|---|
| Error log stream | Loki | Live stream of ERROR and FATAL log entries |
| Error count over time | Loki | Count of error-level logs, bucketed by time |
| Top errors | Loki | Most frequent error messages |
Dashboard Provisioning
- Dashboards are provisioned as code using Grafana provisioning or the Grafana API. Dashboards may be created manually in the Grafana UI for prototyping, but they must be exported and committed to the infrastructure repository before they are considered production dashboards.
- Dashboard JSON files are stored in the infrastructure repository under grafana/dashboards/{client}/.
- Dashboard changes follow the standard Production Change Policy.
- Variables (template variables) are used for reusable dashboards. Common variables include $host, $service, and $interval.
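The provisioning rules above can be sketched as a small helper that derives the repository path for a dashboard and produces a skeleton dashboard model with the three common variables. Several details here are assumptions rather than policy: the file name `{name}.json`, the use of the dashboard name as the Grafana `uid`, the variable queries (`label_values(up, instance)` and `label_values(up, job)`), and the interval choices. A real export from Grafana will contain additional fields, such as the datasource for each variable.

```python
import json
from pathlib import PurePosixPath


def dashboard_repo_path(name: str) -> PurePosixPath:
    """Return the repository path for a dashboard named {client}-{system}-{category}."""
    client = name.split("-", 1)[0]
    # Assumption: one JSON file per dashboard, named after the dashboard.
    return PurePosixPath("grafana/dashboards") / client / f"{name}.json"


def dashboard_skeleton(name: str) -> dict:
    """Return a minimal dashboard model with $host, $service, and $interval."""
    return {
        "uid": name,  # assumption: reuse the dashboard name as the uid
        "title": name,
        "templating": {
            "list": [
                # Assumption: variable queries depend on your label scheme.
                {"name": "host", "type": "query",
                 "query": "label_values(up, instance)"},
                {"name": "service", "type": "query",
                 "query": "label_values(up, job)"},
                {"name": "interval", "type": "interval",
                 "query": "1m,5m,15m,1h"},
            ]
        },
        "panels": [],
    }


if __name__ == "__main__":
    name = "egi-api-overview"
    print(dashboard_repo_path(name))
    print(json.dumps(dashboard_skeleton(name), indent=2))
```

Keeping the path derivation in one place means the folder under grafana/dashboards/ always agrees with the client prefix in the dashboard name.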
Alert Integration
- Every dashboard with alerting panels must link to the relevant Alertmanager silence and alert history.
- Alert annotations on dashboards (vertical lines marking alert start/end) are enabled for all time-series panels.
- Dashboard links to related runbooks are included in the dashboard description or as panel links.
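In the dashboard JSON model, dashboard-level links live in a top-level `links` list. The sketch below adds the three links this section calls for. The URLs are passed in as parameters because this standard does not define where silences, alert history, or runbooks are hosted. The function name and the link titles are illustrative.

```python
import copy


def with_alert_links(
    dashboard: dict,
    silences_url: str,
    alert_history_url: str,
    runbook_url: str,
) -> dict:
    """Return a copy of the dashboard model with alert and runbook links added."""

    def link(title: str, url: str) -> dict:
        # "type": "link" marks an external URL; open it in a new tab.
        return {"type": "link", "title": title, "url": url, "targetBlank": True}

    updated = copy.deepcopy(dashboard)  # leave the caller's dict untouched
    updated.setdefault("links", []).extend(
        [
            link("Alertmanager silences", silences_url),
            link("Alert history", alert_history_url),
            link("Runbook", runbook_url),
        ]
    )
    return updated
```

Returning a copy instead of mutating the input makes the helper safe to use in a script that processes many exported dashboards in one pass.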
Dashboard Access
- All Anchor operators have Editor access to dashboards in client folders they manage.
- Client teams (EGI, Mast) receive Viewer access to their dashboards. They can view but not modify.
- The Shared folder is read-only for all non-Anchor users.
- Dashboard access is reviewed as part of the quarterly Access Control review.
Dashboard Review
- Dashboards are reviewed quarterly alongside the access review.
- Review checks:
  - Are all panels showing data? (No broken queries or "No data" panels.)
  - Are thresholds and alert conditions still appropriate?
  - Are there unused dashboards that should be archived or removed?
- Dashboard review findings are documented and action items assigned.
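Part of the first review check can be automated against the dashboard JSON in the repository. The sketch below lists panels that have no non-empty query. It is a static check only: it cannot detect a query that is syntactically present but returns "No data" at runtime. It assumes queries are stored in each target's `expr` field, as Prometheus and Loki targets typically are, and that panel types such as `text` and `alertlist` legitimately have no query.

```python
# Assumption: these panel types do not need a query target.
NON_QUERY_TYPES = {"text", "alertlist", "dashlist", "news"}


def panels_without_queries(dashboard: dict) -> list[str]:
    """Return sorted titles of panels that have no non-empty query."""
    missing = []
    stack = list(dashboard.get("panels", []))
    while stack:
        panel = stack.pop()
        if panel.get("type") == "row":
            # Collapsed rows nest their child panels under "panels".
            stack.extend(panel.get("panels", []))
            continue
        if panel.get("type") in NON_QUERY_TYPES:
            continue
        targets = panel.get("targets", [])
        # Assumption: the query text lives in the target's "expr" field.
        if not any(t.get("expr", "").strip() for t in targets):
            missing.append(panel.get("title", "<untitled>"))
    return sorted(missing)
```

Running this over every file under grafana/dashboards/ before the quarterly review gives reviewers a short list of obviously broken panels to start from.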
Exceptions
Temporary dashboards for incident investigation do not need to follow naming or provisioning standards. They must be deleted or formalized within 7 days of the incident's closure.
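The 7-day window is simple date arithmetic. A minimal sketch follows, assuming the deadline is counted in calendar days from the incident's closure date and that the seventh day itself is still within the window. The function names are illustrative.

```python
from datetime import date, timedelta

# Grace period for temporary incident dashboards, per the Exceptions section.
GRACE_PERIOD = timedelta(days=7)


def cleanup_deadline(incident_closed: date) -> date:
    """Last day on which a temporary dashboard may still exist unformalized."""
    return incident_closed + GRACE_PERIOD


def is_overdue(incident_closed: date, today: date) -> bool:
    """True once the grace period has fully elapsed."""
    # Assumption: the deadline day itself still counts as compliant.
    return today > cleanup_deadline(incident_closed)
```

For an incident closed on 2026-05-24, the deadline under these assumptions is 2026-05-31, and a temporary dashboard still present on 2026-06-01 is overdue.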