BRB Recovery Procedures

Owner: Anchor MSP Operations Lead
Last reviewed: 2026-05-24

Purpose

Define the staged recovery process for systems that have been locked down by the BRB Protocol. Recovery is deliberate, staged, and requires dual-operator approval to prevent premature restoration of a compromised system.

Scope

All production systems managed by Anchor that are currently in a BRB lockdown state.

Staged Recovery

BRB recovery is performed in stages. Each stage restores a specific set of capabilities while allowing operators to verify system integrity before proceeding to the next stage. Stages must be executed in order.

Recovery Stages

Stage	What Is Restored	What Remains Locked	Verification Before Proceeding
1. Network	Outbound and inbound network access restored. Firewall rules reverted to pre-lockdown state.	Services remain stopped. User accounts remain locked.	Verify no malicious outbound connections. Check DNS resolution. Confirm monitoring server can reach the host.
2. Services	All configured services started (Docker, database, web server, application).	User accounts remain locked (except emergency user).	Verify services start cleanly. Check application health endpoints. Review service logs for errors. Confirm no unexpected processes.
3. User Accounts	All user accounts unlocked. Full system access restored.	Nothing. System is fully recovered.	Verify user logins work. Confirm no unauthorized accounts were created during the incident. Review account list against the access control register.
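
The ordering rule in the table can be sketched as a small shell helper. This is a minimal illustration, not the BRB controller's logic. The function names next_stage and stage_allowed are hypothetical, as are the state names lockdown and recovered and the stage identifiers services and user_accounts (only network appears in this document's API example).

```shell
#!/bin/sh
# Hypothetical helper: print the stage that follows the last verified stage.
# All identifiers except "network" are assumptions made for illustration.
next_stage() {
  case "$1" in
    lockdown)      echo "network" ;;
    network)       echo "services" ;;
    services)      echo "user_accounts" ;;
    user_accounts) echo "recovered" ;;
    *) echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Succeed only if the requested stage is the one that immediately follows
# the last verified stage, so stages cannot be skipped or reordered.
stage_allowed() {
  last_verified=$1
  requested=$2
  expected=$(next_stage "$last_verified") || return 1
  [ "$requested" = "$expected" ]
}
```

For example, stage_allowed network services succeeds, while stage_allowed lockdown services fails because Stage 1 has not been verified yet.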

Recovery Workflow

Lockdown State
     │
     ▼
Operator A submits recovery request (Stage 1: Network)
     │
     ▼
Operator B approves recovery request (different user required)
     │
     ▼
Agent restores network access
     │
     ▼
Operators verify network integrity
     │
     ▼
Operator A submits recovery request (Stage 2: Services)
     │
     ▼
Operator B approves recovery request
     │
     ▼
Agent starts services
     │
     ▼
Operators verify service health
     │
     ▼
Operator A submits recovery request (Stage 3: User Accounts)
     │
     ▼
Operator B approves recovery request
     │
     ▼
Agent unlocks user accounts
     │
     ▼
Operators verify full system health
     │
     ▼
System fully recovered

Dual-Approval Requirement

Every recovery stage requires approval from two different operators. The same operator cannot submit and approve the same recovery request.
This requirement is enforced by the BRB controller. Duplicate approvals from the same operator are rejected.
The requirement ensures that an attacker who has compromised a single operator's credentials cannot unlock a contained system on their own.
Both operators must authenticate to the BRB controller before submitting or approving a recovery request.
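
The rule above amounts to a simple identity comparison. The following is a minimal sketch of that check, assuming operators are identified by plain strings. It is not the BRB controller's implementation, and the function name approval_allowed is hypothetical.

```shell
#!/bin/sh
# Hypothetical check mirroring the dual-approval rule described above:
# both operator identities must be present, and they must differ.
approval_allowed() {
  submitter=$1
  approver=$2
  if [ -z "$submitter" ] || [ -z "$approver" ]; then
    echo "rejected: both operators must be authenticated" >&2
    return 1
  fi
  if [ "$submitter" = "$approver" ]; then
    echo "rejected: approver must differ from submitter" >&2
    return 1
  fi
  return 0
}
```

The comparison here is an exact, case-sensitive string match. A real implementation would compare authenticated identities rather than names supplied by the caller.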

Submitting Recovery Requests

Via API

# Operator A submits the recovery request
curl -X POST https://brb-controller.anchor.internal/api/v1/recover \
  -H "Authorization: Bearer $OPERATOR_A_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "system_id": "SYSTEM_ID",
    "stage": "network",
    "justification": "Forensic review complete. No active threat indicators found."
  }'

# Operator B approves the recovery request
curl -X POST https://brb-controller.anchor.internal/api/v1/recover/approve \
  -H "Authorization: Bearer $OPERATOR_B_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "system_id": "SYSTEM_ID",
    "stage": "network",
    "approval_note": "Confirmed forensic review. Network recovery approved."
  }'
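
Because the same request body is sent for each stage, it can help to build it from parameters. The sketch below assumes the field names shown in the examples above. The helpers json_escape and recover_payload are hypothetical, and the escaping only handles backslashes and double quotes, not newlines or other control characters.

```shell
#!/bin/sh
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Limitation: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a recovery request body: recover_payload SYSTEM_ID STAGE JUSTIFICATION
recover_payload() {
  printf '{"system_id":"%s","stage":"%s","justification":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

The output can be passed to the submit call with -d "$(recover_payload "$SYSTEM_ID" network "Forensic review complete.")", where SYSTEM_ID is an assumed environment variable.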

Via Dashboard (Glance)

Navigate to the locked-down system in Glance.
Select the recovery stage to initiate.
Enter the justification and submit.
The second operator logs in separately and approves the pending recovery request.

Post-Recovery Monitoring

After full recovery (all three stages complete), the system enters a post-recovery monitoring period of 48 hours.

During this period:

Alert thresholds are lowered temporarily to catch subtle indicators of recurrence. Thresholds are restored to normal after the 48-hour period.
Security monitoring is heightened. Wazuh file integrity monitoring runs on an accelerated schedule. CrowdSec block lists are verified current.
Log review: Operators review application and system logs twice daily (morning and evening) for the full 48-hour period.
Operator availability: At least one operator must be reachable within 15 minutes throughout the post-recovery monitoring period.
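
The 48-hour window can be computed from the time Stage 3 was verified. This is a minimal sketch using epoch seconds. The names MONITORING_HOURS, monitoring_end, and in_monitoring_window are hypothetical and are not part of the BRB tooling.

```shell
#!/bin/sh
# Length of the post-recovery monitoring period, taken from this procedure.
MONITORING_HOURS=48

# Print the epoch second at which monitoring ends, given the epoch second
# at which full recovery was confirmed.
monitoring_end() {
  echo $(( $1 + $MONITORING_HOURS * 3600 ))
}

# Succeed while "now" is still inside the monitoring window.
# Usage: in_monitoring_window RECOVERED_AT_EPOCH NOW_EPOCH
in_monitoring_window() {
  recovered_at=$1
  now=$2
  [ "$now" -lt "$(monitoring_end "$recovered_at")" ]
}
```

On systems where date supports the %s format, in_monitoring_window "$RECOVERED_AT" "$(date +%s)" indicates whether lowered thresholds and twice-daily log reviews should still be in effect.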

Post-Recovery Checklist

Check	Description	Timing
Application health	All health endpoints returning 200	Immediately after recovery
Monitoring coverage	All Prometheus targets up, Uptime Kuma checks green	Within 1 hour
Log flow	Promtail shipping logs to Loki, no gaps	Within 1 hour
Security agents	Wazuh and CrowdSec running and reporting	Within 1 hour
Backup verification	Backup job runs successfully after recovery	Next scheduled backup
BRB agent health	BRB agent health check passing	Immediately after recovery
Client notification	Client informed of recovery and current status	Within 2 hours
Postmortem scheduled	Postmortem meeting booked to take place within 5 business days of recovery	Within 24 hours

Recovery Failure

If a recovery stage fails (e.g., services fail to start, network rules fail to revert):

The agent reports the failure to the controller.
The controller alerts #anchor-incidents-critical.
The operator investigates via the emergency SSH session.
The operator may retry the failed stage or perform manual recovery actions.
If manual recovery is performed, document all actions taken for the postmortem.
Do not proceed to the next recovery stage until the current stage is confirmed successful.
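
When an operator decides to retry a failed stage, a bounded retry wrapper keeps the rule above intact: it reports each failure and returns a non-zero status instead of moving on. This is a sketch only. The function retry_stage is hypothetical, and the command it runs stands in for whatever the operator uses to trigger the stage.

```shell
#!/bin/sh
# Run a command up to MAX_ATTEMPTS times and stop at the first success.
# Usage: retry_stage MAX_ATTEMPTS COMMAND [ARGS...]
# Returns non-zero if every attempt fails, so callers must not advance.
retry_stage() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    attempt=$((attempt + 1))
  done
  return 1
}
```

A caller would chain the next stage only on success, for example retry_stage 3 some_stage_command && echo "stage confirmed", where some_stage_command is a placeholder. Retries do not replace the investigation step, and any manual actions still need to be documented for the postmortem.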

Exceptions

None. The dual-approval requirement cannot be bypassed. If only one operator is available, the system remains in lockdown until a second operator is available. This is a deliberate security control.
