Every learned artifact in Cadenza — a LoRA adapter, a residual policy, a distilled
student, a GRD-adapted VLA — passes through the same governance gate before it
can be promoted. Governance is what turns “the loss went down” into a deployment
decision you can trust.
The verdict
Each governed command measures the artifact locally, sends the raw metrics to the
Cadenza API, and receives one of three verdicts:
| Verdict | Meaning | What happens |
|---|---|---|
| DEPLOY | Passes the gate. | Promoted as the new baseline (with --promote / --gate). |
| BLOCK | Fails a safety or regression check. | Rolled back to the previous baseline; never promoted. |
| NEEDS_DATA | Insufficient coverage to decide. | Kept but not promoted; collect more examples and re-run. |
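If you script around the CLI, the table reduces to a small decision function. This is a minimal sketch under assumptions: `Verdict`, `consequence`, and `parse_verdict` are hypothetical names (not part of the Cadenza client), and `promote_requested` stands in for passing --promote or --gate.

```python
from enum import Enum


class Verdict(Enum):
    """The three verdicts from the table (hypothetical client-side enum)."""
    DEPLOY = "DEPLOY"
    BLOCK = "BLOCK"
    NEEDS_DATA = "NEEDS_DATA"


def consequence(verdict: Verdict, promote_requested: bool) -> str:
    """Describe what happens to a candidate for a given verdict.

    `promote_requested` models passing --promote or --gate (assumption).
    """
    if verdict is Verdict.BLOCK:
        return "rolled back"           # never promoted; previous baseline stays
    if verdict is Verdict.NEEDS_DATA:
        return "kept, not promoted"    # collect more examples and re-run
    # DEPLOY changes the baseline only when promotion was requested.
    return "promoted" if promote_requested else "passes, not promoted"


def parse_verdict(raw: str) -> Verdict:
    """Parse a verdict string; anything outside the three values raises ValueError."""
    return Verdict(raw.strip().upper())
```

Rejecting unknown strings in `parse_verdict` keeps a script from silently treating an unexpected response as a pass.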
Verdicts are computed server-side. The client only measures; it never decides.
That is why the governed commands (lora eval, lora finetune --gate, residual
train/eval/bench, distill eval, vla grd/eval) require sign-in: the API needs to
authenticate the run and attribute the baseline to your account.
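The measure/decide split can be pictured with a minimal client-side sketch. Everything here is hypothetical: the `RunReport` payload, the `submit` function, and the caller-supplied `api` callable are illustrations, not the real wire format or endpoint, and `stub_api` with its thresholds is invented only so the example runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class RunReport:
    """Raw measurements the client sends; note there is no verdict field."""
    artifact_kind: str                     # e.g. "lora", "residual", "distill", "vla"
    metrics: Dict[str, float] = field(default_factory=dict)


def submit(report: RunReport, token: Optional[str],
           api: Callable[[str, RunReport], str]) -> str:
    """Client side: attach credentials and forward raw metrics.

    No thresholds live here; the client returns whatever verdict the API decides.
    """
    if not token:
        # Governed commands require sign-in so the run can be attributed.
        raise PermissionError("sign in before running a governed command")
    return api(token, report)


def stub_api(token: str, report: RunReport) -> str:
    """Stand-in for the server; these thresholds are invented for the demo."""
    if report.metrics.get("safety", 0.0) < 0.9:
        return "BLOCK"
    if report.metrics.get("coverage", 0.0) < 0.5:
        return "NEEDS_DATA"
    return "DEPLOY"
```

Because `submit` contains no scoring logic, swapping `stub_api` for the real service changes the verdicts without touching the client.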
What gets scored
The scorecard dimensions depend on the artifact kind, but the shape is the same:
a mix of dimensions such as fidelity, safety, coverage, stability, and regression.
| Stage | Scored on |
|---|---|
| LoRA adapter (env lora) | fidelity · safety · coverage · stability · regression |
| Residual policy (env residual) | success · collision · residual-sanity · regression |
| Distilled student (env distill) | teacher↔student gap · success · regression |
| GRD / VLA adapter (env vla) | governed by the λ / RL-budget / change-cap loop |
Governance is stateful. Each project keeps a promoted baseline per artifact
kind. When a new artifact earns DEPLOY, it replaces the baseline and the old one
is snapshotted. When a candidate earns BLOCK, the gate restores the previous
baseline automatically — so a bad run can never leave you worse off than before.
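The per-kind baseline bookkeeping can be modeled in a few lines. This sketch assumes, for illustration only, that a candidate is staged in place while it is scored and then either committed or rolled back; `BaselineStore`, `stage`, and `resolve` are hypothetical names, not part of the Cadenza client.

```python
from typing import Dict, List, Optional, Tuple


class BaselineStore:
    """Hypothetical model: one promoted baseline per artifact kind."""

    def __init__(self) -> None:
        self.baseline: Dict[str, str] = {}         # kind -> promoted artifact id
        self.snapshots: Dict[str, List[str]] = {}  # kind -> replaced baselines, oldest first
        self.held: Dict[str, List[str]] = {}       # kind -> NEEDS_DATA candidates
        self._staged: Dict[str, Tuple[str, Optional[str]]] = {}

    def stage(self, kind: str, candidate: str) -> None:
        """Install a candidate tentatively, remembering what it replaced."""
        self._staged[kind] = (candidate, self.baseline.get(kind))
        self.baseline[kind] = candidate

    def resolve(self, kind: str, verdict: str) -> Optional[str]:
        """Commit or roll back the staged candidate; return the baseline afterwards."""
        candidate, previous = self._staged.pop(kind)
        if verdict == "DEPLOY":
            if previous is not None:
                self.snapshots.setdefault(kind, []).append(previous)  # snapshot old baseline
            return candidate
        if verdict == "NEEDS_DATA":
            self.held.setdefault(kind, []).append(candidate)  # kept, not promoted
        # Any non-DEPLOY verdict: restore exactly what was there before staging.
        if previous is None:
            del self.baseline[kind]
        else:
            self.baseline[kind] = previous
        return previous
```

Since a rollback restores the exact previous value, the baseline after any run is either the old one or a candidate that earned DEPLOY, which is the "never worse off" property in miniature.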
Two flags apply the gate:

- --gate (on env lora finetune) runs the scorecard inline and promotes or
  rolls back automatically as part of the training command.
- --promote (on the eval commands) deploys the candidate only if it earns
  DEPLOY; a BLOCK is refused and rolled back.
Steering the next round
For the closed-loop stages (env residual train, env vla grd), the API does
more than judge — it steers. Each round it returns the hyperparameters and
dials for the next round (the residual’s PPO schedule; GRD’s λ / RL-budget /
change-cap), so the loop converges toward a deployable artifact under the gate
rather than just optimizing a raw objective.
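The shape of that closed loop can be sketched as follows. Everything here is hypothetical: the `lambda` dial, the `demo_train` and `demo_api` stubs, and the stopping rule (stop at the first DEPLOY or when the round budget runs out) are assumptions for illustration. The real schedule is chosen server-side and is not visible to the client.

```python
from typing import Callable, Dict, Tuple

Dials = Dict[str, float]
Metrics = Dict[str, float]


def steered_loop(train_round: Callable[[Dials], Metrics],
                 api_round: Callable[[Metrics], Tuple[str, Dials]],
                 first_dials: Dials,
                 max_rounds: int = 10) -> Tuple[str, int]:
    """Run rounds until the API returns DEPLOY or the round budget is spent."""
    dials = dict(first_dials)
    verdict = "NEEDS_DATA"
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        metrics = train_round(dials)         # client: train and measure with these dials
        verdict, dials = api_round(metrics)  # API: verdict plus next round's dials
        print(f"round {rounds}: {verdict}")  # opaque progress; dials are not shown
        if verdict == "DEPLOY":
            break
    return verdict, rounds


# Illustrative stubs (hypothetical): the "API" raises lambda by 0.25 each round
# and deploys once the measured score reaches 0.5.
def demo_train(dials: Dials) -> Metrics:
    return {"score": dials["lambda"]}


def demo_api(metrics: Metrics) -> Tuple[str, Dials]:
    score = metrics["score"]
    verdict = "DEPLOY" if score >= 0.5 else "NEEDS_DATA"
    return verdict, {"lambda": score + 0.25}


if __name__ == "__main__":
    print(steered_loop(demo_train, demo_api, {"lambda": 0.0}))
```

Using the verdict, not a raw loss, as the stopping signal is what lets the loop converge under the gate.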
You don’t need to read or tune the schedule — the client surfaces opaque
per-round progress and the final verdict. Governance is the product: a verdict you
can act on, not a curve you have to interpret.