The residual stack is Cadenza’s answer to a hard deployment problem: you have a capable but frozen base policy (or VLA), the real robot’s dynamics don’t quite match what it was trained on, and you can’t afford to re-train the whole thing on-device. Instead you learn a small residual that nudges the base’s actions, govern it, and then distill the base+residual pair into a compact student that runs onboard without the base at all.
Residual RL and distillation need the rl extra: pip install -e ".[rl]" (installs torch). Without cadenza-lab they run on a deterministic proxy task (labelled as such); --real targets the gated cadenza-lab sim seam. The governed commands (train, eval, bench) require sign-in.

The control law

The residual never replaces the base — it corrects it:
a = clamp(a_base + α · gate · Δa)
The frozen base/VLA picks a_base; the small residual head emits Δa, scaled by α (action scale) and an optional gate. The base never receives a gradient.
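
As a rough illustration of the control law above, here is a minimal per-dimension sketch in plain Python. The clamp bounds of [-1, 1] and the function name corrected_action are assumptions made for the example; the source does not state the action range or how Cadenza implements this internally.

```python
from typing import List, Sequence


def clamp(x: float, lo: float, hi: float) -> float:
    """Clip x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def corrected_action(
    a_base: Sequence[float],
    delta_a: Sequence[float],
    alpha: float = 0.15,   # action scale, matching the documented --alpha default
    gate: float = 1.0,     # optional gate; 1.0 means "no gating"
    lo: float = -1.0,      # assumed action bounds, not stated in the source
    hi: float = 1.0,
) -> List[float]:
    """a = clamp(a_base + alpha * gate * delta_a), applied per action dimension."""
    return [clamp(b + alpha * gate * d, lo, hi) for b, d in zip(a_base, delta_a)]


# The second dimension would overshoot the assumed upper bound and is clamped.
a = corrected_action(a_base=[0.5, 0.95, -0.2], delta_a=[1.0, 1.0, -2.0])

# With the gate closed, the base action passes through unchanged.
a_closed = corrected_action(a_base=[0.5, 0.95, -0.2], delta_a=[1.0, 1.0, -2.0], gate=0.0)
```

Because the residual only enters through the additive term, closing the gate or setting α to zero recovers the base policy's behaviour exactly.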

env residual init: scaffold + profile

Sets up the residual architecture and profiles the head in a dry run (parameter count, per-step latency, peak memory) before you commit to a training run. No training happens here.
cadenza env residual init rescue-dog --alpha 0.15 --hidden-dim 256
Flag                   Default       Purpose
--hidden-dim <h>       256           Residual MLP hidden width.
--alpha <a>            0.15          Action scale: how hard the residual is allowed to nudge.
--obs-dim <n>          base-derived  Observation width (auto-probed from the project’s encoder when available).
--rate-limit <r>       0.5           Per-step change limit on the residual output.
--no-gate              off           Drop the gate term (a = clamp(a_base + α·Δa)).
--device cuda|mps|cpu  auto          Profiling device (auto-selects cuda → mps → cpu).
Writes rescue-dog/residual/residual.json.
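
To make the --rate-limit flag concrete, the sketch below shows one plausible reading of a "per-step change limit": each dimension of the residual output may move by at most r between consecutive control steps. The function name rate_limit and the exact clipping rule are assumptions for illustration, not Cadenza's documented implementation.

```python
from typing import List, Sequence


def rate_limit(prev: Sequence[float], target: Sequence[float], r: float = 0.5) -> List[float]:
    """Move from prev toward target, changing each dimension by at most r."""
    out = []
    for p, t in zip(prev, target):
        step = max(-r, min(r, t - p))  # clip the per-step change to [-r, r]
        out.append(p + step)
    return out


# A residual head that suddenly asks for a large correction gets there gradually.
delta = [0.0]
history = []
for _ in range(3):
    delta = rate_limit(delta, target=[1.2], r=0.5)
    history.append(delta[0])
```

Under this reading, a jump from 0.0 to 1.2 takes three steps instead of one, which keeps a misbehaving residual from jerking the actuators.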

env residual train: governed PPO on the frozen base

PPO-trains the residual head against the frozen base under a perturbation curriculum (sparse reward). The Cadenza API governs the run: it picks the hyperparameters, decides when to stop, and returns the verdict — the client only collects rollouts and runs the raw PPO step each round.
cadenza env residual train rescue-dog
On DEPLOY the policy is saved and promoted as the residual baseline; on BLOCK it rolls back to the previous residual. The trained head lands at rescue-dog/residual/residual_policy.pt.
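
The promote-or-roll-back behaviour can be pictured as simple bookkeeping around the verdict. The sketch below is purely hypothetical: BaselineRegistry and its methods are invented for illustration and are not part of cadenza-cli, and treating NEEDS_DATA as "leave the baseline unchanged" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BaselineRegistry:
    """Hypothetical stand-in for the promoted-residual bookkeeping."""

    promoted: List[str] = field(default_factory=list)

    @property
    def active(self) -> Optional[str]:
        """The residual currently serving as the baseline, if any."""
        return self.promoted[-1] if self.promoted else None

    def apply(self, verdict: str, candidate: str) -> Optional[str]:
        """Return the residual that is active after the verdict is applied."""
        if verdict == "DEPLOY":
            self.promoted.append(candidate)  # candidate becomes the new baseline
        elif verdict in ("BLOCK", "NEEDS_DATA"):
            pass  # candidate is not promoted; the previous baseline stays active
        else:
            raise ValueError(f"unknown verdict: {verdict!r}")
        return self.active


reg = BaselineRegistry()
first = reg.apply("DEPLOY", "residual_v1")   # promoted
second = reg.apply("BLOCK", "residual_v2")   # rolled back to residual_v1
```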

env residual eval: govern the residual

Re-scores the trained residual on its own — success / collision / residual-sanity / regression — and returns DEPLOY | BLOCK | NEEDS_DATA. --promote sets it as the baseline if it passes. See Governance.
cadenza env residual eval rescue-dog --promote

env residual bench: residual vs full RL

Runs a head-to-head benchmark of the current full-RL stack against cadenza-cli’s residual RL. It reports compute, dollar cost, and accuracy for each arm, then gives a verdict on whether the residual arm wins.
cadenza env residual bench rescue-dog --steps 20000 --cost-per-gpu-hour 2.0
Flag                     Default  Purpose
--steps <n>              config   Env steps per arm.
--cost-per-gpu-hour <x>  2.0      Dollar rate used to price each arm.
--device <d>             auto     Training device.
--real                   off      Target the cadenza-lab sim instead of the proxy.
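
For intuition on how a dollar rate turns into a per-arm cost, here is a small arithmetic sketch. It assumes cost is simply GPU-hours multiplied by the rate; the function name arm_cost_usd and the wall-clock timings are made up for illustration and are not benchmark results.

```python
def arm_cost_usd(wall_clock_s: float, n_gpus: int = 1, cost_per_gpu_hour: float = 2.0) -> float:
    """Price one benchmark arm as GPU-hours times the hourly rate."""
    gpu_hours = wall_clock_s / 3600.0 * n_gpus
    return gpu_hours * cost_per_gpu_hour


# Illustrative timings only, not measured results.
full_rl_cost = arm_cost_usd(wall_clock_s=5400)   # 1.5 GPU-hours at $2.0/h
residual_cost = arm_cost_usd(wall_clock_s=900)   # 0.25 GPU-hours at $2.0/h
savings = full_rl_cost - residual_cost
```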

Distillation: a base-free onboard student

Once a residual is deployed, env distill collects teacher (base + residual) rollouts and trains a compact student that reproduces the teacher’s behaviour without loading the base — so it runs on CPU/MPS at the control-loop rate, optionally quantized to int8.
cadenza env distill rescue-dog --epochs 60 --quantize
Flag          Default  Purpose
--epochs <n>  60       Distillation epochs.
--quantize    off      Export an int8 student for tighter onboard latency.
--device <d>  auto     Training device.
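
The core idea of distillation, a student regressing onto the teacher's actions so the base is no longer needed at inference time, can be sketched with a toy one-dimensional example. Everything here is a stand-in: the teacher function, the linear student, and the plain gradient-descent loop are assumptions chosen to keep the example dependency-free, and they say nothing about the student architecture or loss Cadenza actually uses.

```python
from typing import List, Tuple


def teacher(obs: float, alpha: float = 0.15) -> float:
    """Hypothetical teacher: frozen base action plus the scaled residual."""
    a_base = 0.8 * obs   # stand-in for the frozen base policy
    delta_a = 0.5        # stand-in for the trained residual head
    return max(-1.0, min(1.0, a_base + alpha * delta_a))


def distill(observations: List[float], epochs: int = 60, lr: float = 0.3) -> Tuple[float, float]:
    """Fit a linear student a = w * obs + b to teacher actions (MSE, full-batch GD)."""
    targets = [teacher(o) for o in observations]  # teacher rollouts, collected once
    w, b = 0.0, 0.0
    n = len(observations)
    for _ in range(epochs):
        errors = [(w * o + b) - t for o, t in zip(observations, targets)]
        grad_w = 2.0 * sum(e * o for e, o in zip(errors, observations)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


obs = [-1.0, -0.5, 0.0, 0.5, 1.0]
w, b = distill(obs)

# Worst-case action mismatch between student and teacher on the training data.
action_gap = max(abs((w * o + b) - teacher(o)) for o in obs)
```

After training, only w and b are needed to act, which is the sense in which the student is base-free. Note that action_gap here measures action mismatch, which is a different quantity from a task-level success gap.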
The report prints the teacher→student success gap, student param count, and the onboard latency against the 50 Hz control budget (it tells you whether the student meets or MISSES the floor). Artifacts land in rescue-dog/student/.
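
A 50 Hz control loop leaves 1000 / 50 = 20 ms per step, so the latency check reduces to comparing a measured per-step time against that budget. The helper names below are hypothetical, and the timing loop is a generic sketch that measures mean latency only; a real onboard check would likely also look at worst-case latency.

```python
import time
from typing import Callable, List


def step_budget_ms(control_hz: float = 50.0) -> float:
    """Time available per control step, in milliseconds."""
    return 1000.0 / control_hz


def measure_latency_ms(
    policy: Callable[[List[float]], List[float]],
    obs: List[float],
    warmup: int = 10,
    iters: int = 200,
) -> float:
    """Mean wall-clock time of one policy call, in milliseconds."""
    for _ in range(warmup):  # discard first calls (caches, lazy initialisation)
        policy(obs)
    start = time.perf_counter()
    for _ in range(iters):
        policy(obs)
    return (time.perf_counter() - start) * 1000.0 / iters


def budget_status(latency_ms: float, control_hz: float = 50.0) -> str:
    """Report whether a latency fits inside the control budget."""
    return "meets" if latency_ms <= step_budget_ms(control_hz) else "MISSES"


# Trivial stand-in policy; a real student would be the distilled network.
latency = measure_latency_ms(lambda o: [0.5 * x for x in o], [0.1, -0.2, 0.3])
status = budget_status(latency)
```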

Govern the student

cadenza env distill eval rescue-dog --promote
Scores the student on gap / success / regression and returns the same DEPLOY | BLOCK | NEEDS_DATA verdict, with rollback on BLOCK.

Full loop

# 1. scaffold + profile the residual head
cadenza env residual init rescue-dog --alpha 0.15

# 2. governed PPO on the frozen base (DEPLOY → promoted)
cadenza env residual train rescue-dog

# 3. prove it beats full RL on compute/cost/accuracy
cadenza env residual bench rescue-dog

# 4. distill into a base-free, int8 onboard student
cadenza env distill rescue-dog --quantize
cadenza env distill eval rescue-dog --promote
Every torch path here runs through Cadenza’s transparent acceleration layer — bf16, thread tuning, matmul precision, and torch.compile are applied automatically per hardware target.