
Dashboard Standards

Owner: Anchor MSP Operations Lead
Last reviewed: 2026-05-24

Purpose

Define standards for the Grafana dashboards used to monitor systems in Anchor-managed production. Consistent dashboards enable faster incident response, easier onboarding, and reliable operational visibility.

Scope

All Grafana dashboards created for systems managed by Anchor MSP. This includes system dashboards, application dashboards, security dashboards, and overview dashboards.

Policy

Dashboard Naming Convention

All dashboards follow the naming pattern: {client}-{system}-{category}

Examples:

  • egi-api-overview
  • egi-api-performance
  • mast-web-errors
  • anchor-infrastructure-nodes
Pattern components:
  1. client: the client or project identifier (e.g., egi, mast, anchor).
  2. system: the specific system or service (e.g., api, web, database, infrastructure).
  3. category: the dashboard focus area (e.g., overview, performance, errors, security, nodes).

Dashboard names are lowercase with hyphens. No spaces, no underscores.
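The naming rule can be checked mechanically. The sketch below assumes each of the three components is a single lowercase alphanumeric token with no hyphens of its own (the policy does not say whether digits are allowed or whether a component may contain a hyphen); `parse_dashboard_name` is a hypothetical helper, not an existing Anchor tool.

```python
import re

# Assumption: client, system and category are each one lowercase
# alphanumeric token, so a valid name contains exactly two hyphens.
DASHBOARD_NAME = re.compile(
    r"(?P<client>[a-z0-9]+)-(?P<system>[a-z0-9]+)-(?P<category>[a-z0-9]+)"
)


def parse_dashboard_name(name: str) -> dict[str, str]:
    """Split a dashboard name into its client, system and category parts.

    Raises ValueError if the name does not follow {client}-{system}-{category}
    (uppercase letters, spaces, underscores or a wrong number of parts).
    """
    match = DASHBOARD_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"invalid dashboard name: {name!r}")
    return match.groupdict()
```

For example, `parse_dashboard_name("egi-api-overview")` returns `{'client': 'egi', 'system': 'api', 'category': 'overview'}`, while `"EGI_api overview"` raises `ValueError`.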

Folder Structure

Grafana dashboards are organized into folders by client:

Grafana/
    Anchor/   -- Internal Anchor infrastructure dashboards
    EGI/      -- EGI client dashboards
    Mast/     -- Mast client dashboards
    Shared/   -- Cross-client dashboards (e.g., fleet overview)

  1. Each client folder contains only dashboards for that client's systems.
  2. The Shared folder contains dashboards that aggregate data across clients (e.g., fleet-wide node health).
  3. New folders are created only when a new client is onboarded. Do not create per-system or per-category subfolders.
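The folder rules amount to a lookup from the dashboard's client prefix to a client folder, with Shared reserved for cross-client dashboards. This is a minimal sketch: the `CLIENT_FOLDERS` registry and the explicit `cross_client` flag are hypothetical, since the policy does not say how a dashboard is marked as cross-client.

```python
# Hypothetical registry: client identifier (first name component) -> Grafana folder.
CLIENT_FOLDERS = {
    "anchor": "Anchor",
    "egi": "EGI",
    "mast": "Mast",
}
SHARED_FOLDER = "Shared"


def folder_for(dashboard_name: str, cross_client: bool = False) -> str:
    """Return the Grafana folder a dashboard belongs in.

    Cross-client dashboards go to Shared; every other dashboard goes to the
    folder of the client named in the first component of its name.
    """
    if cross_client:
        return SHARED_FOLDER
    client = dashboard_name.split("-", 1)[0]
    try:
        return CLIENT_FOLDERS[client]
    except KeyError:
        # Folders exist only for onboarded clients, so an unknown client is
        # an error, never a reason to create a new (sub)folder.
        raise ValueError(f"no folder for client {client!r}; onboard the client first") from None
```

Keeping the registry as the single source of truth means onboarding a client is the only event that adds a folder, which matches rule 3.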

Required Panels

Every managed system must have, at minimum, the following panels. They may be spread across one or more dashboards.

System Overview Dashboard

Panel | Data source | Description
--- | --- | ---
System uptime | up metric | Current up/down status with uptime percentage
CPU usage | node_cpu_seconds_total | CPU utilization over time, by mode
Memory usage | node_memory_MemAvailable_bytes, node_memory_MemTotal_bytes | Used vs. available memory
Disk usage | node_filesystem_avail_bytes, node_filesystem_size_bytes | Disk utilization per mount point
Network I/O | node_network_*_bytes_total | Inbound and outbound traffic
Active alerts | Alertmanager datasource | Currently firing alerts for this system
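As an illustration, the Prometheus-backed panels above could use PromQL along these lines. This is a sketch under assumptions: node_exporter series identify the host through the `instance` label, and `$host` is the dashboard's template variable. Memory and disk utilization need the totals (`node_memory_MemTotal_bytes`, `node_filesystem_size_bytes`) alongside the "available" metrics, and the Active alerts panel reads the Alertmanager datasource directly, so it has no PromQL here. `$__range` and `$__rate_interval` are Grafana's built-in variables.

```python
def system_overview_queries(host_var: str = "$host") -> dict[str, str]:
    """Return illustrative PromQL expressions for the System Overview panels."""
    # Assumption: the host is carried in the `instance` label; =~ lets a
    # multi-value template variable select several hosts.
    sel = f'instance=~"{host_var}"'
    return {
        # `up` is 1 or 0, so its average over the dashboard range is the uptime fraction.
        "System uptime": f"avg_over_time(up{{{sel}}}[$__range]) * 100",
        # Seconds of CPU time per second, summed per mode (user, system, idle, ...).
        "CPU usage": f"sum by (mode) (rate(node_cpu_seconds_total{{{sel}}}[$__rate_interval]))",
        # Used fraction = 1 - available / total.
        "Memory usage": f"1 - node_memory_MemAvailable_bytes{{{sel}}} / node_memory_MemTotal_bytes{{{sel}}}",
        # One series per mount point, since both metrics carry the mountpoint label.
        "Disk usage": f"1 - node_filesystem_avail_bytes{{{sel}}} / node_filesystem_size_bytes{{{sel}}}",
        "Network I/O (in)": f"sum by (device) (rate(node_network_receive_bytes_total{{{sel}}}[$__rate_interval]))",
        "Network I/O (out)": f"sum by (device) (rate(node_network_transmit_bytes_total{{{sel}}}[$__rate_interval]))",
    }
```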

Application Health Dashboard

Panel | Data source | Description
--- | --- | ---
Request rate | http_requests_total | Requests per second over time
Error rate | http_requests_total (filtered by 5xx) | Error percentage over time
Response time | http_request_duration_seconds | P50, P95, P99 latency
Health check | app_up or health endpoint | Current application health status
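A possible set of PromQL expressions for these panels is sketched below. It assumes the request metrics carry `service` and `status` labels (label names vary between instrumentation libraries) and that `http_request_duration_seconds` is a Prometheus histogram, so percentiles come from `histogram_quantile` over its `_bucket` series. The error rate is the 5xx request rate divided by the total request rate, times 100.

```python
def app_health_queries(service_var: str = "$service") -> dict[str, str]:
    """Return illustrative PromQL expressions for the Application Health panels."""
    # Assumptions: `service` and `status` labels exist on the request metrics,
    # and latency is exported as a histogram (..._bucket series with `le`).
    sel = f'service=~"{service_var}"'
    window = "[$__rate_interval]"
    total = f"sum(rate(http_requests_total{{{sel}}}{window}))"
    errors = f'sum(rate(http_requests_total{{{sel}, status=~"5.."}}{window}))'
    queries = {
        "Request rate": total,
        # Error percentage = 5xx requests per second / all requests per second * 100.
        "Error rate": f"100 * {errors} / {total}",
        "Health check": f"app_up{{{sel}}}",
    }
    # One latency series per percentile; quantiles kept as strings to avoid float formatting.
    for label, quantile in (("P50", "0.5"), ("P95", "0.95"), ("P99", "0.99")):
        queries[f"Response time {label}"] = (
            f"histogram_quantile({quantile}, sum by (le) "
            f"(rate(http_request_duration_seconds_bucket{{{sel}}}{window})))"
        )
    return queries
```

Summing by `le` before calling `histogram_quantile` aggregates buckets across instances, so each percentile describes the whole service rather than one replica.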

Error and Log Dashboard

Panel | Data source | Description
--- | --- | ---
Error log stream | Loki | Live stream of ERROR and FATAL log entries
Error count over time | Loki | Count of error-level logs, bucketed by time
Top errors | Loki | Most frequent error messages
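Because the required panels may be spread across several dashboards, a check has to look at all of a system's dashboards together. This sketch assumes panels are identified by their exact titles as listed in the tables above (the policy does not mandate titles) and reads Grafana dashboard JSON, where collapsed rows keep their child panels in a nested `panels` list.

```python
from collections.abc import Iterable, Iterator

# Assumption: required panels are recognized by exact title.
REQUIRED_PANELS = {
    "System uptime", "CPU usage", "Memory usage", "Disk usage",
    "Network I/O", "Active alerts",
    "Request rate", "Error rate", "Response time", "Health check",
    "Error log stream", "Error count over time", "Top errors",
}


def iter_panel_titles(panels: Iterable[dict]) -> Iterator[str]:
    """Yield panel titles, descending into collapsed rows' nested panels."""
    for panel in panels:
        if panel.get("title"):
            yield panel["title"]
        yield from iter_panel_titles(panel.get("panels", []))


def missing_panels(dashboards: Iterable[dict]) -> set[str]:
    """Return the required panel titles found on none of the given dashboards."""
    found: set[str] = set()
    for dashboard in dashboards:
        found.update(iter_panel_titles(dashboard.get("panels", [])))
    return REQUIRED_PANELS - found
```

Passing every dashboard JSON file for one system (e.g. all `egi-api-*` files) to `missing_panels` gives the gaps for that system; an empty set means the minimum is met.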

Dashboard Provisioning

  1. Dashboards are provisioned as code using Grafana provisioning or the Grafana API. Manual dashboard creation via the Grafana UI is permitted for prototyping, but the dashboard must be exported and committed to the infrastructure repository before it is considered production-ready.
  2. Dashboard JSON files are stored in the infrastructure repository under grafana/dashboards/{client}/.
  3. Dashboard changes follow the standard Production Change Policy.
  4. Template variables are used to make dashboards reusable. Common variables include $host, $service, and $interval.
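Since dashboards live in the repository under grafana/dashboards/{client}/, a pre-merge check can confirm that each committed file follows the naming convention and sits in its client's directory. A sketch, assuming the dashboard name is the JSON `title` field, client directories use the lowercase client identifier, and cross-client dashboards live in a hypothetical `shared/` directory (the policy does not specify where they are stored).

```python
import json
import re
from pathlib import Path

# Same assumption as the naming convention: three hyphen-free lowercase tokens.
DASHBOARD_NAME = re.compile(r"[a-z0-9]+-[a-z0-9]+-[a-z0-9]+")
# Hypothetical directory for cross-client dashboards.
SHARED_DIR = "shared"


def check_dashboard_repo(repo_root: Path) -> list[str]:
    """Return problems found in dashboard JSON under grafana/dashboards/{client}/."""
    base = Path(repo_root) / "grafana" / "dashboards"
    problems: list[str] = []
    for path in sorted(base.glob("*/*.json")):
        client_dir = path.parent.name
        title = json.loads(path.read_text(encoding="utf-8")).get("title", "")
        if not DASHBOARD_NAME.fullmatch(title):
            problems.append(f"{path}: title {title!r} does not match {{client}}-{{system}}-{{category}}")
        elif client_dir != SHARED_DIR and title.split("-", 1)[0] != client_dir:
            problems.append(f"{path}: client prefix of {title!r} does not match directory {client_dir!r}")
    return problems
```

Run in CI as part of the change process, a non-empty result would block the merge until the file is renamed or moved.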

Alert Integration

  1. Every dashboard with alerting panels must link to the relevant Alertmanager silences and alert history.
  2. Alert annotations on dashboards (vertical lines marking alert start/end) are enabled for all time-series panels.
  3. Links to related runbooks are included in the dashboard description or as panel links.
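When dashboards are generated as code, the alert annotations and links can be added to the dashboard JSON before it is committed. This sketch enables Grafana's built-in "Annotations & Alerts" annotation and appends absolute-URL dashboard links. It assumes alert rules are Grafana-managed and linked to the dashboard (alerts evaluated only in Prometheus would need a separate annotation query); the link titles and URLs are supplied by the caller, and any in the usage note are hypothetical.

```python
import copy

# Built-in annotation that Grafana adds to new dashboards; with Grafana-managed
# alert rules linked to the dashboard, it marks alert state changes on time series.
BUILTIN_ALERT_ANNOTATION = {
    "builtIn": 1,
    "datasource": {"type": "grafana", "uid": "-- Grafana --"},
    "enable": True,
    "hide": True,
    "iconColor": "rgba(0, 211, 255, 1)",
    "name": "Annotations & Alerts",
    "type": "dashboard",
}


def add_alert_integration(dashboard: dict, links: dict[str, str]) -> dict:
    """Enable alert annotations and add dashboard links (mutates and returns dashboard)."""
    annotations = dashboard.setdefault("annotations", {}).setdefault("list", [])
    builtin = next((a for a in annotations if a.get("builtIn") == 1), None)
    if builtin is None:
        annotations.insert(0, copy.deepcopy(BUILTIN_ALERT_ANNOTATION))
    else:
        builtin["enable"] = True

    dashboard_links = dashboard.setdefault("links", [])
    existing_titles = {link.get("title") for link in dashboard_links}
    for title, url in links.items():
        if title not in existing_titles:
            # "type": "link" is a plain absolute-URL link shown in the dashboard header.
            dashboard_links.append({"title": title, "url": url, "type": "link", "targetBlank": True})
    return dashboard
```

For example, `add_alert_integration(dash, {"Runbook": "https://runbooks.example.invalid/egi-api", "Silences": "https://alertmanager.example.invalid/#/silences"})` adds both links once; calling it again does not duplicate them.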

Dashboard Access

  1. All Anchor operators have Editor access to dashboards in client folders they manage.
  2. Client teams (EGI, Mast) receive Viewer access to their own dashboards: they can view but not modify them.
  3. The Shared folder is read-only for all non-Anchor users.
  4. Dashboard access is reviewed as part of the quarterly Access Control review.
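Folder-level access can be applied through Grafana's folder permissions HTTP API (`POST /api/folders/:uid/permissions`), where permission `1` is View and `2` is Edit. The sketch below only builds the request; the team IDs, folder UID and Grafana URL are hypothetical. Two cautions: this call replaces the folder's entire permission list, and newer Grafana releases also offer role-based access control endpoints, so confirm which API your version expects.

```python
import json
import urllib.request

# Grafana folder permission levels used by the folder permissions API.
VIEW, EDIT = 1, 2


def folder_permission_items(operator_team_ids: list[int], client_team_ids: list[int]) -> dict:
    """Anchor operator teams get Edit; client teams get View (read-only)."""
    items = [{"teamId": team, "permission": EDIT} for team in operator_team_ids]
    items += [{"teamId": team, "permission": VIEW} for team in client_team_ids]
    return {"items": items}


def build_permission_request(grafana_url: str, folder_uid: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the request that replaces a folder's permissions."""
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/folders/{folder_uid}/permissions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Because the body is the complete permission list, generating it from one source (e.g. the client registry) makes the quarterly review a comparison of that source against what Grafana reports.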

Dashboard Review

  1. Dashboards are reviewed quarterly alongside the access review.
  2. Review checks:
    • Are all panels showing data? (No broken queries or "No data" panels.)
    • Are thresholds and alert conditions still appropriate?
    • Are there unused dashboards that should be archived or removed?
  3. Dashboard review findings are documented and action items assigned.

Exceptions

Temporary dashboards for incident investigation do not need to follow naming or provisioning standards. They must be deleted or formalized within 7 days of the incident's closure.
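The 7-day rule is simple date arithmetic. This sketch interprets "within 7 days" as the closure date plus seven calendar days; the mapping of temporary dashboard names to incident closure dates is a hypothetical input, since the policy does not say how temporary dashboards are tracked.

```python
from datetime import date, timedelta

# Grace period for temporary incident dashboards, per the exception above.
GRACE_PERIOD = timedelta(days=7)


def cleanup_deadline(incident_closed: date) -> date:
    """Last day on which a temporary dashboard may remain unformalized."""
    return incident_closed + GRACE_PERIOD


def overdue_dashboards(temp_dashboards: dict[str, date], today: date) -> list[str]:
    """Return names of temporary dashboards past their cleanup deadline.

    temp_dashboards maps a (hypothetical) temporary dashboard name to the
    closure date of the incident it was created for.
    """
    return sorted(
        name for name, closed in temp_dashboards.items() if today > cleanup_deadline(closed)
    )
```

Under this reading, a dashboard for an incident closed on 2026-05-01 may exist through 2026-05-08 and is overdue from 2026-05-09.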