Production Change Policy
Owner: Anchor MSP Operations Lead
Last reviewed: 2026-04-04
Purpose
Define how changes to production systems are proposed, approved, executed, and recorded.
Scope
This policy covers all changes to production infrastructure, configuration, monitoring, alerting, backups, secrets, DNS, and security settings for systems under Anchor management. Application deploys initiated by development teams are out of scope, although Anchor can halt them under the Production Ownership Policy.
Change Types
Standard Changes
Pre-approved, low-risk, routine changes. Examples: adding a new Grafana dashboard, updating an alert threshold, adding a new Uptime Kuma check.
- Documented before execution.
- Executed by any Anchor operator.
- Logged after completion.
- No approval step required.
Normal Changes
Changes with meaningful impact that require review. Examples: modifying Alertmanager routing, changing backup schedules, updating Cloudflare WAF rules, rotating Vault credentials.
- Submitted: Operator describes the change, reason, and rollback plan.
- Reviewed: A second Anchor operator reviews the change.
- Approved: The Operations Lead approves the change. For routine normal changes, the reviewer may approve instead.
- Scheduled: Change is assigned a time window.
- Executed: Change is made.
- Verified: The operator confirms the change works as expected and checks monitoring for regressions.
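The normal change steps above form a strict sequence, which can be sketched as a small ordered state model. This is a minimal illustration in Python, not a description of any tooling Anchor actually uses. The names Stage, next_stage, and can_advance are hypothetical, and the rule that stages cannot be skipped is an assumption drawn from the order in which the steps are listed.

```python
from enum import Enum


class Stage(Enum):
    """Stages of a normal change, in the order the policy lists them."""
    SUBMITTED = 1
    REVIEWED = 2
    APPROVED = 3
    SCHEDULED = 4
    EXECUTED = 5
    VERIFIED = 6


def next_stage(current: Stage) -> Stage:
    """Return the stage that follows `current`. VERIFIED is terminal."""
    if current is Stage.VERIFIED:
        raise ValueError("VERIFIED is the final stage")
    return Stage(current.value + 1)


def can_advance(current: Stage, target: Stage) -> bool:
    """Assumption: a change may only move to the immediately following stage."""
    return target.value == current.value + 1
```

Under this sketch, can_advance(Stage.SUBMITTED, Stage.EXECUTED) is False, which reflects that a normal change is not executed before it has been reviewed, approved, and scheduled.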
Emergency Changes
Changes required immediately to restore service or prevent data loss. Examples: DNS failover during outage, emergency secret rotation after a leak, blocking a malicious IP.
- Executed immediately by the on-call operator.
- Documented retroactively within 24 hours.
- Reviewed by the Operations Lead after the fact.
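The 24-hour documentation requirement for emergency changes can be checked mechanically. The sketch below is illustrative only. It assumes the 24-hour window starts at the time the change was executed, and the function names documentation_deadline and is_documentation_overdue are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 24-hour window comes from the policy. Measuring it from the
# execution time is an assumption made for this sketch.
DOCUMENTATION_WINDOW = timedelta(hours=24)


def documentation_deadline(executed_at: datetime) -> datetime:
    """Latest time by which an emergency change should be documented."""
    return executed_at + DOCUMENTATION_WINDOW


def is_documentation_overdue(
    executed_at: datetime,
    now: datetime,
    documented_at: Optional[datetime] = None,
) -> bool:
    """True if the change was documented late, or is still undocumented past the deadline."""
    deadline = documentation_deadline(executed_at)
    if documented_at is not None:
        return documented_at > deadline
    return now > deadline
```

Using timezone-aware datetimes (for example datetime.now(timezone.utc)) for every argument avoids ambiguity when operators work in different time zones.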
Change Records
Every change record includes:
- Who: Operator who made the change.
- What: Specific change made.
- When: Date and time of execution.
- Why: Reason for the change.
- Rollback plan: How to undo the change if it causes problems.
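The fields above map naturally onto a small record type. The following is a minimal sketch in Python, assuming records are kept as structured data. The class name ChangeRecord, the helper missing_fields, and all example values are hypothetical and for illustration only.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One change record with the five fields the policy requires."""
    who: str            # operator who made the change
    what: str           # specific change made
    when: datetime      # date and time of execution
    why: str            # reason for the change
    rollback_plan: str  # how to undo the change if it causes problems


def missing_fields(record: ChangeRecord) -> list[str]:
    """Return the names of text fields that are empty or whitespace-only."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if isinstance(value, str) and not value.strip():
            missing.append(f.name)
    return missing


# Hypothetical example values, for illustration only.
record = ChangeRecord(
    who="example-operator",
    what="Raise disk-usage alert threshold from 80% to 85%",
    when=datetime(2026, 1, 15, 14, 30, tzinfo=timezone.utc),
    why="Reduce noisy alerts on a volume that routinely sits near 80%",
    rollback_plan="",
)
print(missing_fields(record))  # prints ['rollback_plan']
```

A check like missing_fields could run before a record is accepted, so that a change without a rollback plan is caught when it is logged instead of when the rollback is needed.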
Failed Changes
If a change causes unexpected problems:
- Execute the rollback plan immediately.
- Alert the team in Slack.
- Treat the failed change as an incident: investigate the root cause and document the findings.
Exceptions
No changes bypass this policy. If a change cannot wait for the normal process, use the emergency change process. Emergency changes still get documented.