Alert Severity Matrix

Owner: Anchor MSP Operations Lead
Last reviewed: 2026-04-04

Purpose

Define alert severity levels, response expectations, and escalation rules. All Alertmanager rules must use these severity levels.

Scope

All alerts generated by any monitored system under Anchor management.

Severity Levels

| Severity | Alertmanager Label | Criteria | Slack Channel | Response Time | Escalation |
| --- | --- | --- | --- | --- | --- |
| Critical | severity: critical | Service down. Data loss risk. Complete outage. | #alerts-critical + SMS | 15 minutes | Immediate: SMS to on-call, escalation chain starts |
| High | severity: high | Degraded service. Partial outage. Major performance issue. | #alerts-high | 1 hour | If unacknowledged after 1 hour, escalate to secondary |
| Medium | severity: medium | Anomaly detected. Threshold breach. Non-urgent degradation. | #alerts-medium | Next business day | None |
| Low | severity: low | Informational. Trend changes. Capacity warnings. | #alerts-low | Weekly review | None |
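
As an illustration only, the matrix can be expressed as a lookup table that tooling could consult when deciding whether an unacknowledged alert should escalate. This is a minimal sketch, not part of the policy: the names `SeverityPolicy`, `SEVERITY_MATRIX`, and `should_escalate` are hypothetical, and modeling "Immediate" as a zero-length delay is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SeverityPolicy:
    """One row of the severity matrix (hypothetical representation)."""
    channel: str
    sms: bool
    response_time: str
    # Delay before escalation starts; None means the level never escalates.
    escalate_after: Optional[timedelta]


# Values copied from the matrix. "Immediate" is modeled as timedelta(0) (assumption).
SEVERITY_MATRIX = {
    "critical": SeverityPolicy("#alerts-critical", True, "15 minutes", timedelta(0)),
    "high": SeverityPolicy("#alerts-high", False, "1 hour", timedelta(hours=1)),
    "medium": SeverityPolicy("#alerts-medium", False, "Next business day", None),
    "low": SeverityPolicy("#alerts-low", False, "Weekly review", None),
}


def should_escalate(severity: str, unacknowledged_for: timedelta) -> bool:
    """Return True if an alert unacknowledged for this long should escalate."""
    policy = SEVERITY_MATRIX[severity]  # KeyError for an undefined severity
    if policy.escalate_after is None:
        return False
    return unacknowledged_for >= policy.escalate_after
```

Under these assumptions, `should_escalate("high", timedelta(minutes=30))` is false and `should_escalate("critical", timedelta(0))` is true. An unknown severity raises `KeyError` rather than silently defaulting, which matches the rule that unclassified alerts are not allowed.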

Rules

  1. Every Alertmanager rule must include a severity label with one of the four values above.
  2. A new managed system must have at minimum: one critical rule (health check failure), one high rule (error rate spike), and one medium rule (resource threshold).
  3. Do not create alerts without a defined severity. Unclassified alerts are a policy violation.
  4. Severity levels are reviewed quarterly. If an alert repeatedly fires and nobody acts on it, downgrade or remove it.
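
Rules 1 to 3 lend themselves to an automated check. The following is a rough sketch under stated assumptions: each rule has already been parsed into a dictionary with an `alert` name and a `labels` mapping (the shape of a Prometheus alerting rule), and the function names are hypothetical. It checks only that the required severity values are present for a system, not that each rule covers the specific condition named in rule 2 (health check failure, error rate spike, resource threshold).

```python
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low"}
# Rule 2: every managed system needs at least one rule at each of these levels.
REQUIRED_PER_SYSTEM = {"critical", "high", "medium"}


def unclassified_alerts(rules):
    """Return names of rules whose severity label is missing or not allowed."""
    bad = []
    for rule in rules:
        labels = rule.get("labels") or {}
        if labels.get("severity") not in ALLOWED_SEVERITIES:
            bad.append(rule.get("alert", "<unnamed>"))
    return bad


def missing_required_severities(rules):
    """Return the required severities that no rule for this system provides."""
    present = {(rule.get("labels") or {}).get("severity") for rule in rules}
    return REQUIRED_PER_SYSTEM - present


# Hypothetical rule set for one managed system.
example_rules = [
    {"alert": "HealthCheckFailed", "labels": {"severity": "critical"}},
    {"alert": "ErrorRateSpike", "labels": {"severity": "high"}},
    {"alert": "DiskLatency", "labels": {"severity": "warning"}},  # not an allowed value
    {"alert": "NoLabels"},  # severity missing entirely
]

print(unclassified_alerts(example_rules))                  # ['DiskLatency', 'NoLabels']
print(sorted(missing_required_severities(example_rules)))  # ['medium']
```

A check like this could run in review or CI before rules are deployed, so that policy violations are caught before an alert ever fires.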

Exceptions

Severity overrides require approval from the Operations Lead. Document the override reason in the alert rule comments.