Prompts are code you forgot to version
A prompt change is a deploy. Act like it.
I have seen a single comma “fix” conversion and silently break compliance in the same afternoon. No diff. No owner. No rollback story. We would never ship a payment service that way. Somehow prompts live in Notion pages with “final_v7_REALLY_FINAL” in the title.
PromptOps is the unglamorous practice of making prompts testable:
- Check them into git (or a registry with semver).
- Run evals on every change—golden sets, red-team cases, regression on the metrics you already report upstairs.
- Stage rollouts: shadow mode, 5% traffic, full promote—same as any other risky change.
- Log inputs and outputs with retention policy, not with panic after legal asks.
The objection I hear is velocity. “We need to iterate fast.” You do. That is why you automate evals, not skip them. Fast iteration without measurement is just fast guessing.
A useful rule: if two engineers cannot reproduce the same output from the same input, you do not have a prompt problem. You have a systems problem. Temperature, tool schemas, retrieval chunks, and model routing all belong in the reproducibility boundary.
The lowkey genius move is to make prompts boring. Name them. Review them. Retire them. When leadership asks what changed last quarter, you show a changelog—not a shrug.
Your model will keep improving. Your prompts will keep drifting. Only process compounds.