Production Change Policy

Owner: Anchor MSP Operations Lead
Last reviewed: 2026-04-04

Purpose

Define how changes to production systems are proposed, approved, executed, and recorded.

Scope

This policy covers all changes to production infrastructure, configuration, monitoring, alerting, backups, secrets, DNS, and security settings for systems under Anchor management. Application deploys initiated by development teams are out of scope, although Anchor can halt them under the Production Ownership Policy.

Change Types

Standard Changes

Pre-approved, low-risk, routine changes. Examples: adding a new Grafana dashboard, updating an alert threshold, adding a new Uptime Kuma check.

  • Documented before execution.
  • Executed by any Anchor operator.
  • Logged after completion.
  • No approval step required.

Normal Changes

Changes with meaningful impact that require review. Examples: modifying Alertmanager routing, changing backup schedules, updating Cloudflare WAF rules, rotating Vault credentials.

  1. Submitted: Operator describes the change, reason, and rollback plan.
  2. Reviewed: A second Anchor operator reviews the change.
  3. Approved: The Operations Lead approves the change. For routine normal changes, the reviewer may approve instead.
  4. Scheduled: Change is assigned a time window.
  5. Executed: Change is made.
  6. Verified: Operator confirms the change works as expected. Monitoring is checked.
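
The six steps above form a strict sequence, which can be sketched as a small state machine. This is an illustrative sketch only, not Anchor's tooling: the names `Stage`, `NormalChange`, and `advance` are hypothetical, and the rule that the reviewer must differ from the submitter is an interpretation of "a second Anchor operator".

```python
from enum import Enum


class Stage(Enum):
    """The six stages of a normal change, in policy order."""
    SUBMITTED = 1
    REVIEWED = 2
    APPROVED = 3
    SCHEDULED = 4
    EXECUTED = 5
    VERIFIED = 6


class NormalChange:
    """Hypothetical tracker that moves one change through the stages in order."""

    def __init__(self, description, reason, rollback_plan, submitter):
        # Submission requires the change, the reason, and a rollback plan.
        if not (description and reason and rollback_plan):
            raise ValueError("description, reason, and rollback plan are required")
        self.description = description
        self.reason = reason
        self.rollback_plan = rollback_plan
        self.submitter = submitter
        self.reviewer = None
        self.stage = Stage.SUBMITTED

    def advance(self, to, actor):
        # Stages may not be skipped or repeated.
        if to.value != self.stage.value + 1:
            raise ValueError(f"cannot move from {self.stage.name} to {to.name}")
        # Assumption: "a second operator" means someone other than the submitter.
        if to is Stage.REVIEWED:
            if actor == self.submitter:
                raise ValueError("reviewer must be a second operator")
            self.reviewer = actor
        self.stage = to


change = NormalChange(
    description="Update Cloudflare WAF rule",
    reason="Block scanner traffic",
    rollback_plan="Restore the previous rule",
    submitter="alice",
)
change.advance(Stage.REVIEWED, actor="bob")
```

Encoding the order in one place means a change cannot reach Executed without passing through review, approval, and scheduling. Who is allowed to approve is deliberately left out of this sketch.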

Emergency Changes

Changes required immediately to restore service or prevent data loss. Examples: DNS failover during outage, emergency secret rotation after a leak, blocking a malicious IP.

  • Executed immediately by the on-call operator.
  • Documented retroactively within 24 hours.
  • Reviewed by the Operations Lead after the fact.
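
The 24-hour documentation window lends itself to an automated check. The sketch below is hypothetical: the function names are illustrative, and it assumes the window starts at the time of execution, which the policy does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# Taken from the policy: emergency changes are documented within 24 hours.
DOCUMENTATION_WINDOW = timedelta(hours=24)


def documentation_deadline(executed_at):
    """Assumption: the 24-hour window is measured from the execution time."""
    return executed_at + DOCUMENTATION_WINDOW


def is_overdue(executed_at, documented_at=None, now=None):
    """Return True if documentation was late, or is still missing past the deadline."""
    if now is None:
        now = datetime.now(timezone.utc)
    deadline = documentation_deadline(executed_at)
    if documented_at is not None:
        return documented_at > deadline
    return now > deadline


executed = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(documentation_deadline(executed).isoformat())
```

Timestamps are timezone-aware so that records written by operators in different regions compare correctly.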

Change Records

Every change record includes:

  • Who: Operator who made the change.
  • What: Specific change made.
  • When: Date and time of execution.
  • Why: Reason for the change.
  • Rollback plan: How to undo the change if it causes problems.
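
As one possible representation, the five fields above map onto a small immutable record type. The class name `ChangeRecord`, the field names, and the sample values are illustrative assumptions, not an existing Anchor schema.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical change record with the five fields the policy requires."""
    who: str            # operator who made the change
    what: str           # specific change made
    when: datetime      # date and time of execution
    why: str            # reason for the change
    rollback_plan: str  # how to undo the change

    def __post_init__(self):
        # Reject records with blank text fields so no required item is skipped.
        for field in fields(self):
            value = getattr(self, field.name)
            if isinstance(value, str) and not value.strip():
                raise ValueError(f"{field.name} must not be empty")


record = ChangeRecord(
    who="alice",
    what="Raised disk usage alert threshold",
    when=datetime(2026, 1, 15, 14, 30, tzinfo=timezone.utc),
    why="Reduce alert noise on archive hosts",
    rollback_plan="Restore the previous threshold",
)
```

Making the record frozen reflects that a change log is an audit trail: entries are appended, not edited after the fact.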

Failed Changes

If a change causes unexpected problems:

  1. Execute the rollback plan immediately.
  2. Alert the team in Slack.
  3. Treat the failed change as an incident: investigate the root cause and document the findings.

Exceptions

No change may bypass this policy. If a change cannot wait for the normal process, use the emergency change process. Emergency changes must still be documented.