BRB Recovery Procedures

Owner: Anchor MSP Operations Lead
Last reviewed: 2026-05-24

Purpose

Define the staged recovery process for systems that have been locked down by the BRB Protocol. Recovery is deliberate and staged, and each stage requires dual-operator approval to prevent premature restoration of a compromised system.

Scope

All systems under Anchor managed production that are currently in a BRB lockdown state.

Staged Recovery

BRB recovery is performed in stages. Each stage restores a specific set of capabilities while allowing operators to verify system integrity before proceeding to the next stage. Stages must be executed in order.
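The ordering rule can be sketched as a small check. This is an illustrative sketch, not the BRB controller's implementation: the stage identifier `network` appears in the API examples in this document, while `services` and `user_accounts` are assumed identifiers for stages 2 and 3, and the function names are hypothetical.

```python
from typing import Optional

# "network" is taken from the API examples; "services" and "user_accounts"
# are assumed identifiers for stages 2 and 3.
STAGES = ("network", "services", "user_accounts")


def next_stage(verified: list[str]) -> Optional[str]:
    """Return the only stage that may be requested next, or None when done.

    `verified` lists the stages already restored and verified, in order.
    """
    if tuple(verified) != STAGES[: len(verified)]:
        raise ValueError(f"stages out of order: {verified}")
    if len(verified) == len(STAGES):
        return None  # all three stages complete
    return STAGES[len(verified)]


def may_request(stage: str, verified: list[str]) -> bool:
    """True only if `stage` is the next stage in sequence."""
    return stage == next_stage(verified)
```

For example, `may_request("services", [])` is false because the network stage has not yet been verified.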

Recovery Stages

| Stage | What Is Restored | What Remains Locked | Verification Before Proceeding |
|---|---|---|---|
| 1. Network | Outbound and inbound network access restored. Firewall rules reverted to pre-lockdown state. | Services remain stopped. User accounts remain locked. | Verify no malicious outbound connections. Check DNS resolution. Confirm monitoring server can reach the host. |
| 2. Services | All configured services started (Docker, database, web server, application). | User accounts remain locked (except emergency user). | Verify services start cleanly. Check application health endpoints. Review service logs for errors. Confirm no unexpected processes. |
| 3. User Accounts | All user accounts unlocked. Full system access restored. | Nothing. System is fully recovered. | Verify user logins work. Confirm no unauthorized accounts were created during the incident. Review account list against the access control register. |
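The Stage 3 review of the account list against the access control register amounts to a set comparison. A minimal sketch follows, assuming both lists are available as sets of account names; how they are exported from the host and from the register is not covered here, and the function name and sample account names are hypothetical.

```python
def audit_accounts(current: set[str], register: set[str]) -> dict[str, set[str]]:
    """Compare accounts present on the host with the access control register."""
    return {
        # On the host but not in the register: possibly created during the incident.
        "unauthorized": current - register,
        # In the register but absent from the host: possibly deleted or renamed.
        "missing": register - current,
    }


# Hypothetical account names, for illustration only.
result = audit_accounts(
    current={"alice", "bob", "svc-backup", "tmpadmin"},
    register={"alice", "bob", "svc-backup", "carol"},
)
print(result["unauthorized"])
print(result["missing"])
```

Any entry under `unauthorized` should be investigated before the system is treated as fully recovered.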

Recovery Workflow

  1. Lockdown state.
  2. Operator A submits recovery request (Stage 1: Network).
  3. Operator B approves recovery request (different user required).
  4. Agent restores network access.
  5. Operators verify network integrity.
  6. Operator A submits recovery request (Stage 2: Services).
  7. Operator B approves recovery request.
  8. Agent starts services.
  9. Operators verify service health.
  10. Operator A submits recovery request (Stage 3: User Accounts).
  11. Operator B approves recovery request.
  12. Agent unlocks user accounts.
  13. Operators verify full system health.
  14. System fully recovered.

Dual-Approval Requirement

  1. Every recovery stage requires two different operators: one submits the recovery request and the other approves it. The same operator cannot submit and approve the same recovery request.
  2. This requirement is enforced by the BRB controller, which rejects an approval from the operator who submitted the request.
  3. The dual-approval requirement ensures that an attacker who has compromised one operator's credentials cannot unlock a contained system.
  4. Both operators must authenticate to the BRB controller before submitting or approving a recovery request.
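The rules above can be illustrated with a small model of a recovery request. This is a sketch of the policy only; the source does not describe how the BRB controller implements the check, and the `RecoveryRequest` and `ApprovalError` names and fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalError(Exception):
    """Raised when an approval violates the dual-approval policy."""


@dataclass
class RecoveryRequest:
    # Hypothetical record; field names are illustrative only.
    system_id: str
    stage: str
    submitted_by: str
    approved_by: Optional[str] = None


def approve(request: RecoveryRequest, approver: str) -> RecoveryRequest:
    """Record an approval, rejecting self-approval and repeat approvals."""
    if request.approved_by is not None:
        raise ApprovalError("request is already approved")
    if approver == request.submitted_by:
        raise ApprovalError("submitter cannot approve their own request")
    request.approved_by = approver
    return request
```

Here `approver` and `submitted_by` are assumed to be authenticated operator identities, consistent with the requirement that both operators authenticate before acting.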

Submitting Recovery Requests

Via API

# Operator A submits the recovery request
curl -X POST https://brb-controller.anchor.internal/api/v1/recover \
  -H "Authorization: Bearer $OPERATOR_A_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "system_id": "SYSTEM_ID",
    "stage": "network",
    "justification": "Forensic review complete. No active threat indicators found."
  }'

# Operator B approves the recovery request
curl -X POST https://brb-controller.anchor.internal/api/v1/recover/approve \
  -H "Authorization: Bearer $OPERATOR_B_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "system_id": "SYSTEM_ID",
    "stage": "network",
    "approval_note": "Confirmed forensic review. Network recovery approved."
  }'
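For scripted use, the same two calls can be built with Python's standard library. This is a sketch that mirrors the curl examples above: the URL, paths, headers, and JSON fields are taken from them, while the helper names are hypothetical. The functions only construct the requests; nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "https://brb-controller.anchor.internal/api/v1"


def build_request(path: str, token: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON POST request without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_request(token: str, system_id: str, stage: str,
                   justification: str) -> urllib.request.Request:
    """Operator A: request recovery of one stage."""
    return build_request("/recover", token, {
        "system_id": system_id,
        "stage": stage,
        "justification": justification,
    })


def approval_request(token: str, system_id: str, stage: str,
                     approval_note: str) -> urllib.request.Request:
    """Operator B: approve the pending request for the same stage."""
    return build_request("/recover/approve", token, {
        "system_id": system_id,
        "stage": stage,
        "approval_note": approval_note,
    })
```

To send a request, call `urllib.request.urlopen(req, timeout=30)`. The response format is not documented here, so the sketch does not attempt to parse it. Each operator should run their call with their own token, from their own session.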

Via Dashboard (Glance)

  1. Navigate to the locked-down system in Glance.
  2. Select the recovery stage to initiate.
  3. Enter the justification and submit.
  4. The second operator logs in separately and approves the pending recovery request.

Post-Recovery Monitoring

After full recovery (all three stages complete), the system enters a post-recovery monitoring period of 48 hours.

During this period:

  1. Alert thresholds are lowered temporarily to catch subtle indicators of recurrence. Thresholds are restored to normal after the 48-hour period.
  2. Security monitoring is heightened. Wazuh file integrity monitoring runs on an accelerated schedule. CrowdSec block lists are verified current.
  3. Log review. Operators review application and system logs twice daily (morning and evening) for the 48-hour period.
  4. Operator availability. At least one operator must be reachable within 15 minutes during the post-recovery monitoring period.
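The timing of the monitoring period can be worked out from the time at which Stage 3 completed. The sketch below assumes that timestamp is recorded in UTC; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MONITORING_PERIOD = timedelta(hours=48)


def monitoring_ends(recovered_at: datetime) -> datetime:
    """Time at which alert thresholds return to normal."""
    return recovered_at + MONITORING_PERIOD


def in_monitoring(now: datetime, recovered_at: datetime) -> bool:
    """True while the post-recovery monitoring period is still active."""
    return recovered_at <= now < monitoring_ends(recovered_at)


# Example with an arbitrary recovery time.
recovered = datetime(2026, 5, 1, 9, 0, tzinfo=timezone.utc)
print(monitoring_ends(recovered).isoformat())
```

With reviews twice daily, a 48-hour period covers roughly four log reviews, depending on the time of day at which recovery completes.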

Post-Recovery Checklist

| Check | Description | Timing |
|---|---|---|
| Application health | All health endpoints returning 200 | Immediately after recovery |
| Monitoring coverage | All Prometheus targets up, Uptime Kuma checks green | Within 1 hour |
| Log flow | Promtail shipping logs to Loki, no gaps | Within 1 hour |
| Security agents | Wazuh and CrowdSec running and reporting | Within 1 hour |
| Backup verification | Backup job runs successfully after recovery | Next scheduled backup |
| BRB agent health | BRB agent health check passing | Immediately after recovery |
| Client notification | Client informed of recovery and current status | Within 2 hours |
| Postmortem scheduled | Postmortem meeting scheduled within 5 business days | Within 24 hours |
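The checklist timings can be tracked as deadlines relative to the recovery time. A minimal sketch follows, with two stated assumptions: "Immediately after recovery" is modelled as a zero-length deadline, and backup verification is left out because it depends on the backup schedule rather than a fixed interval. The names `CHECK_DEADLINES` and `overdue` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Deadlines from the checklist. "Immediately" is treated as timedelta(0),
# which is an assumption. "Backup verification" is omitted because its
# timing is the next scheduled backup, not a fixed interval.
CHECK_DEADLINES = {
    "Application health": timedelta(0),
    "BRB agent health": timedelta(0),
    "Monitoring coverage": timedelta(hours=1),
    "Log flow": timedelta(hours=1),
    "Security agents": timedelta(hours=1),
    "Client notification": timedelta(hours=2),
    "Postmortem scheduled": timedelta(hours=24),
}


def overdue(recovered_at: datetime, now: datetime, done: set[str]) -> list[str]:
    """Return the checks that are past their deadline and not yet completed."""
    return sorted(
        name
        for name, limit in CHECK_DEADLINES.items()
        if name not in done and now > recovered_at + limit
    )
```

For example, 90 minutes after recovery, any of the one-hour checks not yet marked done would be reported, while client notification would not be reported until the two-hour mark has passed.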

Recovery Failure

If a recovery stage fails (e.g., services fail to start, network rules fail to revert):

  1. The agent reports the failure to the controller.
  2. The controller alerts #anchor-incidents-critical.
  3. The operator investigates via the emergency SSH session.
  4. The operator may retry the failed stage or perform manual recovery actions.
  5. If manual recovery is performed, document all actions taken for the postmortem.
  6. Do not proceed to the next recovery stage until the current stage is confirmed successful.
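The last two rules, documenting manual actions and blocking the next stage until the current one is confirmed, can be sketched as a small record kept per stage attempt. This is illustrative only; the `StageAttempt` class and its fields are hypothetical and do not describe the controller's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StageAttempt:
    stage: str
    confirmed: bool = False
    # (timestamp, operator, action) entries kept for the postmortem.
    manual_actions: list[tuple[datetime, str, str]] = field(default_factory=list)

    def log_manual_action(self, operator: str, action: str) -> None:
        """Record a manual recovery action with a UTC timestamp."""
        self.manual_actions.append((datetime.now(timezone.utc), operator, action))

    def confirm(self) -> None:
        """Mark the stage as verified successful."""
        self.confirmed = True

    def may_proceed(self) -> bool:
        """The next stage may be requested only after confirmation."""
        return self.confirmed
```

A retry or a manual fix leaves `may_proceed()` false until an operator explicitly confirms the stage.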

Exceptions

None. The dual-approval requirement cannot be bypassed. If only one operator is available, the system remains in lockdown until a second operator is available. This is a deliberate security control.