BRB Recovery Procedures
Owner: Anchor MSP Operations Lead
Last reviewed: 2026-05-24
Purpose
Define the staged recovery process for systems that have been locked down by the BRB Protocol. Recovery is deliberate and staged, and it requires dual-operator approval to prevent premature restoration of a compromised system.
Scope
All Anchor-managed production systems that are currently in a BRB lockdown state.
Staged Recovery
BRB recovery is performed in stages. Each stage restores a specific set of capabilities while allowing operators to verify system integrity before proceeding to the next stage. Stages must be executed in order.
Recovery Stages
| Stage | What Is Restored | What Remains Locked | Verification Before Proceeding |
|---|---|---|---|
| 1. Network | Outbound and inbound network access restored. Firewall rules reverted to pre-lockdown state. | Services remain stopped. User accounts remain locked. | Verify no malicious outbound connections. Check DNS resolution. Confirm monitoring server can reach the host. |
| 2. Services | All configured services started (Docker, database, web server, application). | User accounts remain locked (except emergency user). | Verify services start cleanly. Check application health endpoints. Review service logs for errors. Confirm no unexpected processes. |
| 3. User Accounts | All user accounts unlocked. Full system access restored. | Nothing. System is fully recovered. | Verify user logins work. Confirm no unauthorized accounts were created during the incident. Review account list against the access control register. |
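The stage ordering in the table can be expressed as a small lookup, which may be useful in wrapper scripts that should refuse to request a stage out of order. This is a minimal sketch, not part of the BRB tooling: the function name `next_stage` and the state names `lockdown`, `services`, `user_accounts`, and `recovered` are assumptions for illustration; only `network` appears as a stage value in this document's API example.

```shell
# Hypothetical helper: given the last verified state, print the only
# stage that may be requested next. State names other than "network"
# are assumed, not confirmed by the BRB controller API.
next_stage() {
  case "$1" in
    lockdown)      echo "network" ;;
    network)       echo "services" ;;
    services)      echo "user_accounts" ;;
    user_accounts) echo "recovered" ;;
    *)
      echo "unknown state: $1" >&2
      return 1
      ;;
  esac
}
```

A wrapper could call `next_stage` with the last stage that operators verified and request only the stage it prints.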
Recovery Workflow
Lockdown State
│
▼
Operator A submits recovery request (Stage 1: Network)
│
▼
Operator B approves recovery request (different user required)
│
▼
Agent restores network access
│
▼
Operators verify network integrity
│
▼
Operator A submits recovery request (Stage 2: Services)
│
▼
Operator B approves recovery request
│
▼
Agent starts services
│
▼
Operators verify service health
│
▼
Operator A submits recovery request (Stage 3: User Accounts)
│
▼
Operator B approves recovery request
│
▼
Agent unlocks user accounts
│
▼
Operators verify full system health
│
▼
System fully recovered
Dual-Approval Requirement
- Every recovery stage requires two different operators: one submits the recovery request and another approves it. The same operator cannot submit and approve the same recovery request.
- This requirement is enforced by the BRB controller. Duplicate approvals from the same operator are rejected.
- The dual-approval requirement ensures that an attacker who has compromised a single operator's credentials cannot unlock a contained system.
- Both operators must authenticate to the BRB controller before submitting or approving a recovery request.
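A client-side pre-check along these lines can catch an obvious violation before a request reaches the controller. It is only a convenience sketch under assumed names (`check_dual_approval` and the operator IDs passed to it are hypothetical); the BRB controller remains the enforcement point.

```shell
# Hypothetical pre-check: check_dual_approval SUBMITTER_ID APPROVER_ID
# Returns 0 when two distinct, non-empty operator IDs are given.
check_dual_approval() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "both operator IDs are required" >&2
    return 2
  fi
  if [ "$1" = "$2" ]; then
    echo "rejected: submitter and approver must be different operators" >&2
    return 1
  fi
  return 0
}
```

For example, `check_dual_approval alice bob` succeeds, while `check_dual_approval alice alice` returns a non-zero status.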
Submitting Recovery Requests
Via API
# Operator A submits the recovery request
curl -X POST https://brb-controller.anchor.internal/api/v1/recover \
  -H "Authorization: Bearer $OPERATOR_A_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "system_id": "SYSTEM_ID",
    "stage": "network",
    "justification": "Forensic review complete. No active threat indicators found."
  }'

# Operator B approves the recovery request
curl -X POST https://brb-controller.anchor.internal/api/v1/recover/approve \
  -H "Authorization: Bearer $OPERATOR_B_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "system_id": "SYSTEM_ID",
    "stage": "network",
    "approval_note": "Confirmed forensic review. Network recovery approved."
  }'
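When these calls are scripted, two details are easy to miss: by default `curl` exits with status 0 even when the server returns an HTTP error, and free-text fields such as the justification may contain quotes that break hand-written JSON. The sketch below addresses both. The helper names (`escape_json`, `recovery_payload`, `submit_recovery`) and the `BRB_URL` variable are hypothetical; the endpoint, headers, and field names are taken from the example above. The escaping covers only backslashes and double quotes, so multi-line justifications are out of scope here.

```shell
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Assumption: values are single-line text; control characters are not handled.
escape_json() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the request body: recovery_payload SYSTEM_ID STAGE JUSTIFICATION
recovery_payload() {
  printf '{"system_id":"%s","stage":"%s","justification":"%s"}' \
    "$(escape_json "$1")" "$(escape_json "$2")" "$(escape_json "$3")"
}

# Submit as Operator A. --fail makes curl exit non-zero on HTTP 4xx/5xx,
# so a rejected request does not pass silently in a script.
submit_recovery() {
  curl --fail --silent --show-error -X POST \
    "${BRB_URL:-https://brb-controller.anchor.internal}/api/v1/recover" \
    -H "Authorization: Bearer $OPERATOR_A_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(recovery_payload "$1" "$2" "$3")"
}
```

A call such as `submit_recovery "$SYSTEM_ID" network "Forensic review complete."` would then fail loudly if the controller rejects the request.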
Via Dashboard (Glance)
- Navigate to the locked-down system in Glance.
- Select the recovery stage to initiate.
- Enter the justification and submit.
- The second operator logs in separately and approves the pending recovery request.
Post-Recovery Monitoring
After full recovery (all three stages complete), the system enters a post-recovery monitoring period of 48 hours.
During this period:
- Alert thresholds are lowered temporarily to catch subtle indicators of recurrence. Thresholds are restored to normal after the 48-hour period.
- Security monitoring is heightened. Wazuh file integrity monitoring runs on an accelerated schedule. CrowdSec block lists are verified current.
- Log review. Operators review application and system logs twice daily (morning and evening) for the 48-hour period.
- Operator availability. At least one operator must be reachable within 15 minutes during the post-recovery monitoring period.
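The 48-hour window is simple epoch arithmetic, which makes it easy to check from a scheduled job whether the lowered thresholds should still apply. A minimal sketch with hypothetical function names, taking timestamps as Unix epoch seconds; how the recovery completion time is recorded is not specified here and is left to the caller.

```shell
# Length of the post-recovery monitoring period, in hours.
MONITORING_HOURS=48

# monitoring_end_epoch RECOVERED_AT
# Print the epoch second at which the monitoring period ends.
monitoring_end_epoch() {
  echo $(( $1 + $MONITORING_HOURS * 3600 ))
}

# in_monitoring_window RECOVERED_AT NOW
# Succeed while NOW falls inside the window [RECOVERED_AT, end).
in_monitoring_window() {
  [ "$2" -ge "$1" ] && [ "$2" -lt "$(monitoring_end_epoch "$1")" ]
}
```

On systems where `date +%s` is available, `in_monitoring_window "$RECOVERED_AT" "$(date +%s)"` reports whether the period is still active.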
Post-Recovery Checklist
| Check | Description | Timing |
|---|---|---|
| Application health | All health endpoints returning 200 | Immediately after recovery |
| Monitoring coverage | All Prometheus targets up, Uptime Kuma checks green | Within 1 hour |
| Log flow | Promtail shipping logs to Loki, no gaps | Within 1 hour |
| Security agents | Wazuh and CrowdSec running and reporting | Within 1 hour |
| Backup verification | Backup job runs successfully after recovery | Next scheduled backup |
| BRB agent health | BRB agent health check passing | Immediately after recovery |
| Client notification | Client informed of recovery and current status | Within 2 hours |
| Postmortem scheduled | Postmortem meeting booked, to be held within 5 business days | Within 24 hours |
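The first checklist row lends itself to a scripted check. The sketch below is an illustration under stated assumptions: the function names are hypothetical and the health endpoint URLs are not defined in this document, so the caller supplies them. `health_status` prints the HTTP status code for one URL (curl prints `000` when no response is received), and `all_ok` succeeds only if it reads at least one code and every code is 200.

```shell
# Print the HTTP status code returned by one health endpoint.
health_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' --max-time 5 "$1"
}

# Read status codes from stdin, one per line.
# Succeed only if at least one code was read and all of them are 200.
all_ok() {
  seen=0
  while read -r code; do
    seen=1
    [ "$code" = "200" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# Example wiring (URLs are supplied by the caller):
#   for url in "$@"; do health_status "$url"; echo; done | all_ok
```

Requiring at least one code guards against an empty endpoint list being reported as healthy.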
Recovery Failure
If a recovery stage fails (e.g., services fail to start, network rules fail to revert):
- The agent reports the failure to the controller.
- The controller alerts #anchor-incidents-critical.
- The operator investigates via the emergency SSH session.
- The operator may retry the failed stage or perform manual recovery actions.
- If manual recovery is performed, document all actions taken for the postmortem.
- Do not proceed to the next recovery stage until the current stage is confirmed successful.
Exceptions
None. The dual-approval requirement cannot be bypassed. If only one operator is available, the system remains in lockdown until a second operator is available. This is a deliberate security control.