Alert Severity Matrix
Owner: Anchor MSP Operations Lead
Last reviewed: 2026-04-04
Purpose
Define alert severity levels, response expectations, and escalation rules. All Alertmanager rules must use these severity levels.
Scope
All alerts generated by any monitored system under Anchor management.
Severity Levels
| Severity | Alertmanager Label | Criteria | Slack Channel | Response Time | Escalation |
|---|---|---|---|---|---|
| Critical | severity: critical | Service down. Data loss risk. Complete outage. | #alerts-critical + SMS | 15 minutes | Immediate — SMS to on-call, escalation chain starts |
| High | severity: high | Degraded service. Partial outage. Major performance issue. | #alerts-high | 1 hour | After 1 hour unacknowledged — escalate to secondary |
| Medium | severity: medium | Anomaly detected. Threshold breach. Non-urgent degradation. | #alerts-medium | Next business day | None |
| Low | severity: low | Informational. Trend changes. Capacity warnings. | #alerts-low | Weekly review | None |
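As a rough illustration, the matrix above could be encoded as a lookup table so tooling can resolve an alert's channel and response expectation from its severity label. This is a minimal sketch, not Anchor's actual tooling: the names `SeverityPolicy`, `SEVERITY_MATRIX`, and `policy_for` are hypothetical, and the values are copied from the table.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SeverityPolicy:
    """One row of the severity matrix (hypothetical structure)."""
    channel: str                          # Slack channel from the table
    sms: bool                             # True if SMS is sent to on-call
    response: str                         # response expectation, as worded in the table
    escalate_after: Optional[timedelta]   # None means no escalation


# Values transcribed from the Severity Levels table.
SEVERITY_MATRIX = {
    "critical": SeverityPolicy("#alerts-critical", True, "15 minutes", timedelta(0)),
    "high": SeverityPolicy("#alerts-high", False, "1 hour", timedelta(hours=1)),
    "medium": SeverityPolicy("#alerts-medium", False, "Next business day", None),
    "low": SeverityPolicy("#alerts-low", False, "Weekly review", None),
}


def policy_for(labels: dict) -> SeverityPolicy:
    """Look up the policy for an alert's label set; reject unclassified alerts."""
    severity = labels.get("severity")
    if severity not in SEVERITY_MATRIX:
        raise ValueError(f"unclassified or unknown severity: {severity!r}")
    return SEVERITY_MATRIX[severity]
```

For example, `policy_for({"severity": "high"}).channel` returns `"#alerts-high"`, while a label set with no `severity` key raises `ValueError`, which matches the rule that unclassified alerts are not allowed.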
Rules
- Every Alertmanager rule must include a severity label with one of the four values above.
- A new managed system must have at minimum: one critical rule (health check failure), one high rule (error rate spike), and one medium rule (resource threshold).
- Do not create alerts without a defined severity. Unclassified alerts are a policy violation.
- Severity levels are reviewed quarterly. If an alert consistently fires without prompting any action, downgrade or remove it.
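The label and minimum-coverage rules above lend themselves to an automated check. The following is a minimal sketch under stated assumptions: rules are assumed to be already loaded as dictionaries with `alert` and `labels` keys (the shape of a Prometheus-style alerting rule), and `find_violations` is a hypothetical helper, not an existing tool. It only checks that each required severity is present for a system; it does not verify that the critical rule is actually a health check, and so on.

```python
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low"}
# Minimum coverage for a new managed system, per the rules above.
REQUIRED_FOR_NEW_SYSTEM = {"critical", "high", "medium"}


def find_violations(rules):
    """Return human-readable policy violations for one managed system's rules."""
    violations = []
    seen = set()
    for rule in rules:
        name = rule.get("alert", "<unnamed>")
        # Treat a missing or empty labels block the same way.
        severity = (rule.get("labels") or {}).get("severity")
        if severity is None:
            violations.append(f"{name}: missing severity label")
        elif severity not in ALLOWED_SEVERITIES:
            violations.append(f"{name}: unknown severity {severity!r}")
        else:
            seen.add(severity)
    # Report each required severity that no rule covers.
    for missing in sorted(REQUIRED_FOR_NEW_SYSTEM - seen):
        violations.append(f"system has no {missing} rule")
    return violations


# Example with hypothetical rule names:
rules = [
    {"alert": "HealthCheckFailed", "labels": {"severity": "critical"}},
    {"alert": "ErrorRateSpike", "labels": {"severity": "high"}},
    {"alert": "DiskUsage", "labels": {}},
]
print(find_violations(rules))
# ['DiskUsage: missing severity label', 'system has no medium rule']
```

A check like this could run before a rule change is merged, so unclassified alerts are caught before they reach production.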
Exceptions
Severity overrides require approval from the Operations Lead. Document the override reason in the alert rule comments.