Incident Response Procedure
Owner: Anchor MSP Operations Lead
Last reviewed: 2026-05-24
Purpose
Define the standard incident response procedure for all production systems managed by Anchor. This procedure ensures that incidents are detected, contained, resolved, and documented in a consistent, repeatable manner.
Scope
All incidents affecting systems, infrastructure, and services managed by Anchor MSP. This includes security incidents, outages, performance degradation, and data integrity issues.
Procedure
Anchor incident response follows six phases. Each phase has defined actions, communication requirements, and escalation paths.
Phase 1: Detection
Incidents are detected through three channels:
- Monitoring alerts — Alertmanager fires based on Prometheus rules, Wazuh security events, or CrowdSec threat intelligence. Alerts route to Slack channels per the Alert Severity Matrix.
- Client reports — A client or development team reports an issue via Slack or email. The operator receiving the report creates an incident record.
- Operator observation — An Anchor operator observes anomalous behavior during routine work or monitoring review.
Communication: The detecting operator posts an initial notice in #anchor-incidents with the system name, observed symptoms, and time of detection.
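As an illustration of the initial notice, the following is a minimal sketch of a helper that renders the three required fields (system name, observed symptoms, time of detection) into one message. The class name `InitialNotice`, the message layout, and the use of UTC are assumptions made for this example; the procedure does not prescribe a format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class InitialNotice:
    """Hypothetical container for the three fields required in Phase 1."""
    system: str
    symptoms: str
    detected_at: datetime  # expected to be timezone-aware

    def render(self) -> str:
        # Assumption: timestamps are reported in UTC so that operators in
        # different time zones read the same value.
        ts = self.detected_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        # One field per line keeps the notice easy to scan in a chat channel.
        return (
            "Incident detected\n"
            f"System: {self.system}\n"
            f"Symptoms: {self.symptoms}\n"
            f"Detected: {ts}"
        )
```

The rendered text can be pasted into #anchor-incidents by hand or posted by whatever tooling the team already uses; the sketch deliberately stops short of calling any chat API.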
Phase 2: Triage
- Classify the incident severity per the Severity Classification matrix.
- Assign an Incident Commander (IC). For Critical and High severity, the IC is the Operations Lead or the senior operator on call. For Medium and Low, the detecting operator serves as IC.
- The IC confirms the classification and opens a dedicated incident thread in #anchor-incidents (Critical/High) or documents the incident in the existing thread (Medium/Low).
Communication: IC posts the confirmed severity, affected systems, and initial assessment. For Critical severity, the IC sends an SMS escalation within 15 minutes.
Escalation path:
- Critical: Operations Lead notified immediately. If unreachable within 10 minutes, escalate to the designated backup.
- High: Operations Lead notified within 30 minutes.
- Medium: Raised at next standup.
- Low: Tracked in the incident log.
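The triage rules above can be expressed as a small lookup table, which may be useful if the escalation timings are ever encoded in tooling. This is a sketch only: the names `Escalation`, `ESCALATION`, and `ic_role` are hypothetical, and the convention that `None` means "no timed notification" is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Escalation:
    notify_lead_within_min: Optional[int]  # None: no timed notification
    backup_after_min: Optional[int]        # escalate to backup if lead unreachable
    sms_within_min: Optional[int]          # SMS escalation deadline
    action: str


# Timings copied from the Phase 2 communication and escalation rules.
ESCALATION = {
    "Critical": Escalation(0, 10, 15, "Notify Operations Lead immediately"),
    "High": Escalation(30, None, None, "Notify Operations Lead within 30 minutes"),
    "Medium": Escalation(None, None, None, "Raise at next standup"),
    "Low": Escalation(None, None, None, "Track in the incident log"),
}


def ic_role(severity: str) -> str:
    """Return who serves as Incident Commander for a given severity."""
    if severity not in ESCALATION:
        raise ValueError(f"unknown severity: {severity!r}")
    if severity in ("Critical", "High"):
        return "Operations Lead or senior operator on call"
    return "Detecting operator"
```

Keeping the timings in one table means a change to the escalation path is made in a single place rather than scattered across alerting scripts.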
Phase 3: Containment
- Isolate the affected system to prevent the incident from spreading. Actions depend on the incident type:
  - Network isolation via firewall rules for security incidents.
  - Traffic rerouting or service shutdown for application incidents.
  - For confirmed active threats, consider BRB Protocol deployment.
- Preserve evidence. Do not reboot, wipe logs, or redeploy before forensic data is collected. For security incidents, collect forensic snapshots before any remediation.
- Prevent lateral spread. Review adjacent systems for indicators of compromise or cascading failure.
Communication: IC updates the incident thread with containment actions taken and current system status.
Phase 4: Eradication
- Identify and remove the root cause. This may involve patching a vulnerability, reverting a configuration change, removing malicious artifacts, or fixing a code defect.
- Patch vulnerabilities discovered during investigation. If an immediate patch is not available, document the compensating control.
- Verify the system is clean. Run integrity checks, review logs for residual indicators, and confirm no persistence mechanisms remain (for security incidents).
Communication: IC documents the root cause identified and eradication actions taken in the incident thread.
Phase 5: Recovery
- Restore services in a controlled manner. Follow the relevant runbook for the affected system. If restoring from backup, follow the Backup Policy restore procedures.
- Verify functionality. Confirm health checks pass, monitoring is green, and the application is serving requests correctly.
- Monitor closely for 24 to 48 hours after recovery. Temporarily lower the alert thresholds for the affected system to catch any recurrence quickly.
Communication: IC announces recovery in the incident thread and notifies the client that services are restored. For Critical incidents, the client receives a written recovery notification.
Phase 6: Post-Incident
- Schedule a postmortem within 5 business days of incident resolution. Use the Postmortem Template.
- Assign action items from the postmortem with owners and due dates.
- Document lessons learned and update runbooks, monitoring rules, or procedures as needed.
- Close the incident in the incident log with a link to the completed postmortem.
Communication: Postmortem summary is shared in #anchor-incidents and with the affected client (for Critical and High incidents).
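The "within 5 business days" window for scheduling the postmortem can be computed as shown below. This is a rough sketch under stated assumptions: business days are taken to be Monday through Friday, public holidays are ignored, and the count starts on the day after resolution. The function name `postmortem_deadline` is hypothetical.

```python
from datetime import date, timedelta


def postmortem_deadline(resolved_on: date, business_days: int = 5) -> date:
    """Return the last date on which the postmortem may be scheduled.

    Assumptions: Monday-Friday are business days, holidays are not
    considered, and counting begins the day after resolution.
    """
    current = resolved_on
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current
```

For example, an incident resolved on Wednesday 2026-06-03 would have a deadline of Wednesday 2026-06-10 under these assumptions. A team that observes holidays would need to extend the weekday check with its own calendar.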
Incident Log
All incidents are logged with the following fields:
| Field | Description |
|---|---|
| Incident ID | Sequential identifier |
| Date/Time | Detection timestamp |
| Severity | Critical, High, Medium, Low |
| System | Affected system name |
| IC | Incident Commander |
| Status | Open, Contained, Resolved, Closed |
| Postmortem | Link to postmortem (if applicable) |
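One way to model a log entry with these fields is sketched below. The class and attribute names (`IncidentRecord`, `postmortem_url`, and so on) are hypothetical, and the rule in `close()` that Critical and High incidents need a postmortem link before closing is this example's reading of Phase 6 and the Exceptions section, not a stated requirement of the log itself.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


class Status(Enum):
    OPEN = "Open"
    CONTAINED = "Contained"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


@dataclass
class IncidentRecord:
    incident_id: int        # sequential identifier
    detected_at: datetime   # detection timestamp
    severity: Severity
    system: str             # affected system name
    ic: str                 # Incident Commander
    status: Status = Status.OPEN
    postmortem_url: Optional[str] = None

    def close(self, postmortem_url: Optional[str] = None) -> None:
        """Mark the incident Closed, recording the postmortem link if given."""
        if postmortem_url is not None:
            self.postmortem_url = postmortem_url
        # Assumption: Critical/High incidents always complete Phase 6,
        # so they need a postmortem link before they can be closed.
        needs_link = self.severity in (Severity.CRITICAL, Severity.HIGH)
        if needs_link and not self.postmortem_url:
            raise ValueError("Critical/High incidents need a postmortem link to close")
        self.status = Status.CLOSED
```

Using enumerations for Severity and Status restricts each field to the values listed in the table, which guards against free-text variants creeping into the log.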
Exceptions
There are no exceptions to this procedure for Critical and High severity incidents. Medium and Low severity incidents may use an abbreviated process (Phases 1, 2, 4, and 5 only) at the IC's discretion, but they must still be logged.
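The exception rule can be checked mechanically. The sketch below returns the phases an incident must go through and rejects the abbreviated process for Critical and High severity. The names `PHASES`, `ABBREVIATED`, and `required_phases` are hypothetical and exist only for this illustration.

```python
from typing import List

# Phase numbers and names as defined in the Procedure section.
PHASES = {
    1: "Detection",
    2: "Triage",
    3: "Containment",
    4: "Eradication",
    5: "Recovery",
    6: "Post-Incident",
}

# The abbreviated process permitted for Medium and Low severity.
ABBREVIATED = (1, 2, 4, 5)


def required_phases(severity: str, abbreviated: bool = False) -> List[str]:
    """Return the phase names an incident of this severity must complete."""
    if severity not in ("Critical", "High", "Medium", "Low"):
        raise ValueError(f"unknown severity: {severity!r}")
    if abbreviated and severity in ("Critical", "High"):
        raise ValueError("abbreviated process is not allowed for Critical/High")
    numbers = ABBREVIATED if abbreviated else tuple(PHASES)
    return [PHASES[n] for n in numbers]
```

Logging is not modelled here because it applies to every incident regardless of which process is used.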