New Managed System Onboarding Checklist
Owner: Anchor MSP Operations Lead
Last reviewed: 2026-04-04
Purpose
A step-by-step checklist for Anchor operators bringing a new system into managed production. Use it after the development team has submitted a handoff request.
Checklist
Complete these steps in order:
- 1. Receive handoff request. Development team (EGI or Mast) submits a request to hand off a system to Anchor.
- 2. Validate acceptance criteria. Review the system against the Managed Production Acceptance Criteria. If any criteria are not met, return a gap list to the development team. Do not proceed until all criteria are satisfied.
- 3. Provision uptime monitoring. Add the system's health check endpoint to Uptime Kuma. Confirm checks are returning healthy.
- 4. Configure metrics scraping. Add the system's Prometheus metrics endpoint to the scrape configuration. Verify metrics are appearing in Prometheus.
- 5. Set up dashboards. Create a Grafana dashboard for the system covering the key metrics: uptime, response time, error rate, and resource usage.
- 6. Configure log aggregation. Ensure the system's stdout logs are being collected by Loki. Verify logs appear in Grafana Explore.
- 7. Set up alerting rules. Create alerting rules for the system, evaluated by Prometheus and routed through Alertmanager. At minimum: health check failure (critical), high error rate (high), resource threshold breach (medium).
- 8. Configure SMS escalation. Verify that critical alerts for this system trigger Twilio SMS delivery to the on-call operator.
- 9. Configure backups. Set up Restic backup jobs for the system's identified data stores. Run the first backup manually and confirm completion. Add backup success/failure metrics to Prometheus.
- 10. Onboard secrets to Vault. Migrate all secrets listed in the acceptance criteria to Vault. Configure the application to read from Vault at runtime. Verify the application starts and functions correctly with Vault-sourced secrets.
- 11. Register in PostHog. Set up PostHog tracking for the system if applicable. Confirm events are flowing.
- 12. Add to CrowdSec protection. Register the system's public-facing endpoints with CrowdSec. Verify the bouncer is active.
- 13. Configure Wazuh monitoring. Enroll the system's host(s) in Wazuh. Confirm the agent is reporting and file integrity monitoring is active.
- 14. Create system runbook. Write a runbook for this system using the Runbook Template. Include architecture, dependencies, common operations, troubleshooting steps, and escalation contacts.
- 15. Complete handoff acceptance. Walk through the Handoff Acceptance Checklist with the development team. Get sign-off from both sides.
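The ordering rule and the step 2 gate above can be tracked in code. The following is a minimal sketch, not an existing Anchor tool: the OnboardingTracker class, its method names, and the idea of recording gaps as a list of strings are all assumptions made for illustration. Only the step titles come from the checklist.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Step titles copied from the checklist above, in order.
STEPS = [
    "Receive handoff request",
    "Validate acceptance criteria",
    "Provision uptime monitoring",
    "Configure metrics scraping",
    "Set up dashboards",
    "Configure log aggregation",
    "Set up alerting rules",
    "Configure SMS escalation",
    "Configure backups",
    "Onboard secrets to Vault",
    "Register in PostHog",
    "Add to CrowdSec protection",
    "Configure Wazuh monitoring",
    "Create system runbook",
    "Complete handoff acceptance",
]


@dataclass
class OnboardingTracker:
    """Hypothetical helper that enforces in-order completion of the checklist."""

    system: str
    completed: int = 0  # number of steps finished so far
    open_gaps: List[str] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        """Return the title of the next step, or None when all steps are done."""
        return STEPS[self.completed] if self.completed < len(STEPS) else None

    def report_gaps(self, gaps: List[str]) -> None:
        """Record the current gap list from acceptance review (empty means none)."""
        self.open_gaps = list(gaps)

    def complete(self, step_number: int) -> None:
        """Mark a step (1-based) as done, refusing skipped steps and open gaps."""
        if step_number != self.completed + 1:
            raise ValueError(
                f"step {step_number} is out of order; next is {self.completed + 1}"
            )
        if step_number == 2 and self.open_gaps:
            raise ValueError(f"acceptance criteria not met: {self.open_gaps}")
        self.completed = step_number

    @property
    def done(self) -> bool:
        return self.completed == len(STEPS)
```

A step that does not apply to a system (for example step 11 when PostHog tracking is not needed) would still be marked complete in this sketch, ideally with a note recorded elsewhere explaining why.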
After Completion
The system is now under Anchor management. All future production operations follow Anchor SOPs.
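As a spot check that metrics scraping (step 4) is working for a newly managed system, an operator can ask Prometheus for the built-in `up` metric through its HTTP query API. This is a sketch under assumptions: the Prometheus base URL and the job name are placeholders, and the function names are hypothetical. The response shape follows the documented instant-query format of `/api/v1/query`.

```python
import json
import urllib.parse
import urllib.request


def targets_up(response: dict) -> dict:
    """Map each instance label to True/False from an instant-query result for `up`."""
    if response.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {response.get('error')}")
    result = {}
    for sample in response["data"]["result"]:
        instance = sample["metric"].get("instance", "<unknown>")
        # Each sample value is [unix_timestamp, "value as a string"].
        result[instance] = sample["value"][1] == "1"
    return result


def query_up(prometheus_url: str, job: str) -> dict:
    """Run `up{job="<job>"}` against Prometheus and summarize which targets are up.

    prometheus_url and job are placeholders supplied by the operator.
    """
    query = f'up{{job="{job}"}}'
    url = f"{prometheus_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return targets_up(json.load(resp))
```

An empty result from `query_up` means Prometheus has no target for that job at all, which usually points at the scrape configuration rather than at the system itself.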