Rolling back to a previous deploy
Rollback in Ployz restores the cluster to a previous deploy point: services return to their prior revision, and persistent volumes are reverted to the ZFS snapshot taken at that deploy’s commit. Because ZFS snapshot revert is an in-place atomic operation, rollback does not need to re-transfer data — it is instant regardless of volume size.
Rollback is itself a deploy. It goes through the same plan → apply → commit phases, acquires the namespace deploy lock, and produces a new durable commit that records the rollback as an explicit fact. The cluster does not silently rewind — the history of what happened remains intact.
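The idea that a rollback appends to history rather than rewinding it can be illustrated with a minimal Python sketch. The `Commit` and `DeployLog` names and their fields are hypothetical and are not part of Ployz; the sketch only models the append-only behaviour described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    """One durable deploy commit (hypothetical model, not the Ployz schema)."""
    commit_id: int
    revision: str                       # service revision made live by this commit
    rollback_of: Optional[int] = None   # set when this commit records a rollback


@dataclass
class DeployLog:
    """Append-only history: commits are added, never rewritten or removed."""
    commits: List[Commit] = field(default_factory=list)

    def deploy(self, revision: str) -> Commit:
        commit = Commit(commit_id=len(self.commits) + 1, revision=revision)
        self.commits.append(commit)
        return commit

    def rollback(self, target_id: int) -> Commit:
        # Look up the commit to restore; earlier entries are left untouched.
        target = next(c for c in self.commits if c.commit_id == target_id)
        commit = Commit(
            commit_id=len(self.commits) + 1,
            revision=target.revision,
            rollback_of=target.commit_id,
        )
        self.commits.append(commit)
        return commit

    def live_revision(self) -> str:
        return self.commits[-1].revision


log = DeployLog()
log.deploy("v1")
log.deploy("v2")
rolled_back = log.rollback(target_id=1)
```

After the rollback, the log holds three commits: the two original deploys plus a third that makes `v1` live again and names commit 1 as its target.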
Why rollback is safe
Ployz takes a ZFS snapshot at the commit point of every deploy. That snapshot captures the exact on-disk state at the moment traffic switched to the new version. Rolling back replaces the live dataset with the snapshot, returning the volume to the state it was in when that deploy became live.
Because the snapshot is taken atomically at the commit boundary — after the new containers started and before old instances were removed — the snapshot always corresponds to a known-good deployment state.
The relationship between rollback and the deploy commit model
Every deploy commits routing and volume ownership facts as an immutable record. Rollback works against that record: you identify the deploy commit you want to restore, and Ployz uses it to reconstruct the prior service spec, placement, and volume state.
This means:
- You can roll back to any committed deploy, not just the most recent one.
- You cannot roll back past a volume move — a moved volume’s prior location is no longer authoritative after the move commit.
- Cleanup failures after a commit (FailedAfterCheckpoint status) do not prevent rollback. The commit point itself is always valid.
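The three rules above can be sketched as a small eligibility check. This is a simplified model under stated assumptions: `CommitRecord`, its fields, and the placeholder status `"Committed"` are hypothetical, and a single volume is assumed. Only the `FailedAfterCheckpoint` status name comes from Ployz.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CommitRecord:
    """Hypothetical view of a deploy commit; field names are illustrative."""
    commit_id: int
    status: str                 # "Committed" is a placeholder status name
    moved_volume: bool = False  # True if this commit recorded a volume move


def eligible_targets(history: List[CommitRecord]) -> List[int]:
    """Return the commit IDs that remain valid rollback targets.

    `history` is ordered oldest to newest. Commits older than the most
    recent volume move are excluded, because the volume's prior location
    is no longer authoritative after that move commit. Status is not used
    as a filter: a FailedAfterCheckpoint commit is still a valid target.
    """
    last_move = max(
        (i for i, c in enumerate(history) if c.moved_volume),
        default=0,  # no volume move recorded: every commit is eligible
    )
    return [c.commit_id for c in history[last_move:]]


history = [
    CommitRecord(1, "Committed"),
    CommitRecord(2, "Committed", moved_volume=True),
    CommitRecord(3, "FailedAfterCheckpoint"),
    CommitRecord(4, "Committed"),
]
targets = eligible_targets(history)
```

In this example, commit 1 predates the volume move and is excluded, while commit 3 stays eligible despite its cleanup failure.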
The FailedAfterCheckpoint status
If a deploy’s cleanup phase fails after the final commit, Ployz records the status as FailedAfterCheckpoint. This status means:
- The new version is live. Traffic is already routing to it.
- Some old instances may not have been cleaned up.
- The deploy commit is durable and rollback-eligible.
FailedAfterCheckpoint is not a failed deploy from a traffic perspective. It is an operational signal that cleanup needs attention, but it does not erase the fact that the new version is running.
Identifying the deploy to roll back to
Use the deploy history to find the commit ID for the version you want to restore. Commits are listed in chronological order with their deploy ID, namespace, and status:
# List recent deploy commits for a namespace (JSON output for scripting)
ployzctl deploy preview -f current.toml --json
Performing a rollback
Rollback is expressed as a deploy against the prior manifest. The simplest path is to re-deploy the previous manifest version:
Identify the prior manifest
Retrieve the manifest that produced the deploy you want to restore. If you store manifests in version control, check out the version corresponding to the target deploy commit.
git show HEAD~1:deploy.toml > rollback.toml
Preview the rollback deploy
Verify the rollback plan before applying. The preview will show which services and volumes will revert.
ployzctl deploy preview -f rollback.toml
Confirm that the participating machines are reachable and that the target volumes have a valid snapshot to restore from.
Apply the rollback
Run the deploy. Ployz will stop the current instances, revert the ZFS snapshots, start the prior-revision containers, and commit the rollback as a new deploy fact.
ployzctl deploy -f rollback.toml
The command blocks until the rollback completes or fails. Traffic switches back at the commit point.
Verify the result
Confirm that the prior service revision is running and that routing is correct.
ployzctl machine ls
ployzctl deploy preview -f rollback.toml
A clean preview showing no changes confirms that the cluster matches the rollback manifest.
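For scripted rollbacks, the preview, apply, and verify steps above could be chained as in the following sketch. It uses only the `ployzctl` invocations shown in this guide. The helper names are hypothetical, and the sketch assumes that `ployzctl` exits with a non-zero code on failure, which this page does not state.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def rollback_commands(manifest: str) -> List[List[str]]:
    """Commands from the steps above, in the order they are run."""
    return [
        ["ployzctl", "deploy", "preview", "-f", manifest],  # preview the plan
        ["ployzctl", "deploy", "-f", manifest],             # apply; blocks until done
        ["ployzctl", "machine", "ls"],                      # inspect the machines
        ["ployzctl", "deploy", "preview", "-f", manifest],  # should show no changes
    ]


def run_checked(argv: Sequence[str]) -> None:
    """Default runner: raises CalledProcessError on a non-zero exit code."""
    subprocess.run(list(argv), check=True)


def run_rollback(manifest: str, runner: Runner = run_checked) -> int:
    """Run each step in order, stopping at the first failure.

    Returns the number of steps that completed.
    """
    completed = 0
    for argv in rollback_commands(manifest):
        runner(argv)  # an exception here aborts the remaining steps
        completed += 1
    return completed
```

Calling `run_rollback("rollback.toml")` executes the four commands in sequence. Passing a custom `runner` makes a dry run possible, for example one that only logs each command. A production script would also pause after the first preview so that an operator can review the plan before it is applied.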
Atomicity guarantees
Rollback follows the same commit model as any other deploy:
- Before the rollback commit, failure leaves the cluster in its current state. Nothing has been reverted.
- At the rollback commit, routing flips to the prior revision and volume ownership transfers atomically.
- After the rollback commit, the prior version is live. Cleanup of the reverted instances follows the same cleanup-failure rules as any other deploy.
There is no intermediate state where some services are on the old version and others are on the new version within the same phase. The commit is all-or-nothing for the facts it contains.
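The all-or-nothing behaviour can be illustrated with a toy model in which new facts are staged separately and become visible only through a single assignment at the commit point. The `Cluster` class and the shape of `Facts` are hypothetical; this is a sketch of the commit pattern, not of how Ployz implements it.

```python
from typing import Callable, Dict

Facts = Dict[str, str]  # service name -> live revision (hypothetical shape)


class Cluster:
    def __init__(self, facts: Facts) -> None:
        self.facts = dict(facts)

    def apply(self, staged: Facts, prepare: Callable[[], None]) -> None:
        """Stage new facts, run pre-commit work, then commit in one step."""
        candidate = {**self.facts, **staged}  # built off to the side
        prepare()               # pre-commit phase; may raise
        self.facts = candidate  # commit point: a single reference swap


def failing_prepare() -> None:
    raise RuntimeError("machine unreachable")


cluster = Cluster({"web": "v2", "worker": "v2"})

# A failure before the commit leaves the live facts untouched.
try:
    cluster.apply({"web": "v1", "worker": "v1"}, failing_prepare)
except RuntimeError:
    pass
state_after_failure = dict(cluster.facts)

# A successful run flips every staged fact at once.
cluster.apply({"web": "v1", "worker": "v1"}, lambda: None)
state_after_commit = dict(cluster.facts)
```

Because the live facts are replaced in one assignment, no reader of `cluster.facts` sees `web` on `v1` while `worker` is still on `v2`.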
Next steps
Understand the commit boundary that makes rollback safe and predictable.
Use branching to test changes in isolation before promoting, reducing the need for rollback.