Training & fine-tuning

Cadenza gives you three complementary ways to improve a project after you’ve run it.

`env finetune`: export VLA training data

Convert a run log into (prompt, action, reward) records for your own vision-language-action SFT or offline-RL pipeline.

cadenza env finetune rescue-dog .cadenza-env/<run-id>.log.jsonl -o train.jsonl

Arg / flag	Description
`<project>`	Project directory.
`<log>`	A `.log.jsonl` produced by `env run`.
`-o <file>`	Output path for the rendered records.

Prompts are rendered with the project’s `vla_finetune.prompt_template` (see the schema).
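If you feed `train.jsonl` into an SFT pipeline, a common first step is to keep only the higher-reward records. The sketch below is a minimal reader. It assumes each line is a JSON object with `prompt`, `action`, and `reward` keys. Those key names are an assumption, so check them against a line of your own export. `iter_records`, `split_for_sft`, and the `min_reward` threshold are hypothetical helpers, not part of Cadenza.

```python
import json
from typing import Iterable, Iterator, List


def iter_records(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def split_for_sft(records: Iterable[dict], min_reward: float = 0.0) -> List[dict]:
    """Keep (prompt, action) pairs whose reward is at least min_reward.

    ASSUMPTION: each record has "prompt", "action" and "reward" keys;
    rename them here if your export uses different field names.
    """
    pairs = []
    for rec in records:
        if rec["reward"] >= min_reward:
            # Rename "action" to "completion", the shape many SFT trainers expect.
            pairs.append({"prompt": rec["prompt"], "completion": rec["action"]})
    return pairs
```

For example, `split_for_sft(iter_records("train.jsonl"), min_reward=0.5)` returns prompt/completion pairs for a supervised trainer. An offline-RL pipeline would instead keep every record together with its reward.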

`env train`: rewrite the system prompt

Runs a Groq LLM-as-Judge over the project’s cached runs and rewrites the project’s `SYSTEM_PROMPT` to fix the failure modes it finds.

export GROQ_API_KEY="gsk_..."
cadenza env train rescue-dog

Requires `GROQ_API_KEY` to be set (see Configuration). No key, no training.
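When `env train` runs from a script or CI job, you may want it to stop immediately if the key is missing. The minimal Python wrapper below does that. It assumes `cadenza` is on your `PATH`. `run_env_train` is a hypothetical helper that adds no behavior of its own. It performs the checks and then shells out to the documented `cadenza env train <project>` command.

```python
import os
import shutil
import subprocess


def run_env_train(project: str) -> int:
    """Check preconditions, then run `cadenza env train <project>`.

    Hypothetical helper: returns the CLI's exit code unchanged.
    """
    if not os.environ.get("GROQ_API_KEY"):
        # env train needs a Groq key for the LLM-as-Judge; stop before calling the CLI.
        raise RuntimeError("GROQ_API_KEY is not set; export it before running env train")
    if shutil.which("cadenza") is None:
        raise RuntimeError("cadenza CLI not found on PATH")
    # The child process inherits os.environ, so the key reaches the CLI.
    result = subprocess.run(["cadenza", "env", "train", project], check=False)
    return result.returncode
```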

`env lora`: fine-tune and govern the action head

Fine-tunes the cadenza-lab LoRA action head for a project (on the project’s own base/VLA if it ships a `lora_encoder.py`), then governs it with a scorecard. Once trained, drive a mission with it via `env run --policy lora`.

Requires the `lora` extra: `pip install -e ".[lora]"` (installs torch).
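The following sketch gives intuition for what `--rank` controls. In the standard LoRA formulation, a frozen weight W is augmented by a low-rank product of two small matrices A and B, scaled by alpha / r: y = xW + (alpha / r)·xAB. Only A and B are trained. B starts at zero, so an untrained adapter leaves the base output unchanged. The plain-Python code illustrates that textbook formulation. It is not Cadenza's implementation, and whether cadenza-lab applies an `alpha` scale this way is an assumption.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def init_lora(d_in: int, d_out: int, rank: int, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """A (d_in x r) gets small random values; B (r x d_out) starts at zero."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
    b = [[0.0] * d_out for _ in range(rank)]
    return a, b


def lora_forward(x: Matrix, w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """y = x W + (alpha / r) * x A B; W stays frozen, only A and B would be trained."""
    rank = len(b)
    base = matmul(x, w)
    delta = matmul(matmul(x, a), b)
    scale = alpha / rank
    return [
        [base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
        for i in range(len(base))
    ]


def trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Adapter size r * (d_in + d_out), versus d_in * d_out for a full update."""
    return rank * (d_in + d_out)
```

Consider a hypothetical 512×512 layer with `--rank 8`. `trainable_params(512, 512, 8)` gives 8,192 adapter weights, compared with 262,144 for a full update, which is why a small rank keeps fine-tuning cheap.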

Subcommand	What it does
`env lora add <project> "<goal>" --steps '<...>' [--image PATH]`	Add a goal→action training example (optionally with visual context).
`env lora data <project> [--finetune PATH]`	Show the current training dataset.
`env lora finetune <project> [--epochs N] [--lr LR] [--rank R] [--gate]`	Generate goal→action data and fine-tune the adapter. `--gate` runs the governance scorecard and promotes or rolls back automatically.
`env lora eval <project> [--promote]`	Run the governance scorecard on the trained adapter. `--promote` deploys it if it passes.
`env lora decode <project> "<goal>"`	Decode a goal into actions using the trained adapter.

Example

cadenza env lora add rescue-dog "enter the debris field" \
  --steps 'walk_forward 1.0, crawl_forward 0.5'
cadenza env lora finetune rescue-dog --epochs 5 --rank 8 --gate
cadenza env lora decode rescue-dog "search for the victim"
cadenza env run rescue-dog --headless --policy lora
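To seed the dataset with several examples before `env lora finetune`, you can loop over `env lora add`. The sketch below only builds and runs the documented command line. The comma-separated `--steps` format is copied from the example above. `lora_add_argv` and `add_examples` are hypothetical helpers.

```python
import shutil
import subprocess
from typing import Dict, List, Optional


def lora_add_argv(project: str, goal: str, steps: List[str],
                  image: Optional[str] = None) -> List[str]:
    """Build argv for: cadenza env lora add <project> "<goal>" --steps '...' [--image PATH]."""
    argv = ["cadenza", "env", "lora", "add", project, goal, "--steps", ", ".join(steps)]
    if image is not None:
        argv += ["--image", image]
    return argv


def add_examples(project: str, examples: Dict[str, List[str]]) -> None:
    """Add one goal->action example per dict entry, stopping at the first failure."""
    if shutil.which("cadenza") is None:
        raise RuntimeError("cadenza CLI not found on PATH")
    for goal, steps in examples.items():
        # Passing a list (no shell) avoids quoting problems with spaces in goals.
        subprocess.run(lora_add_argv(project, goal, steps), check=True)
```

For example, `add_examples("rescue-dog", {"enter the debris field": ["walk_forward 1.0", "crawl_forward 0.5"]})` reproduces the first command above.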

Governance scorecard

`env lora eval` (and `env lora finetune --gate`) scores the adapter on fidelity, safety, coverage, stability, and regression, producing a verdict with next-step guidance:

Verdict	Meaning
`DEPLOY`	Passes the gate. Safe to promote.
`BLOCK`	Fails a safety/regression check. Rolled back, not promoted.
`NEEDS_DATA`	Insufficient coverage. Collect more examples (`env lora add`).
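The promote-or-rollback behavior of `--gate` amounts to a small decision rule over the verdict. The sketch below restates the table as code. It assumes you already have the verdict string, since reading it from CLI output is not covered here. It also assumes that only `DEPLOY` promotes the candidate. `apply_verdict` and the adapter names are hypothetical.

```python
from enum import Enum
from typing import Tuple


class Verdict(Enum):
    DEPLOY = "DEPLOY"
    BLOCK = "BLOCK"
    NEEDS_DATA = "NEEDS_DATA"


def apply_verdict(active: str, candidate: str, verdict: str) -> Tuple[str, str]:
    """Return (adapter that should serve missions, next step) for a gate verdict.

    Hypothetical helper mirroring the verdict table; not Cadenza's code.
    """
    v = Verdict(verdict)  # raises ValueError for an unknown verdict string
    if v is Verdict.DEPLOY:
        return candidate, "promote the candidate adapter"
    if v is Verdict.BLOCK:
        return active, "roll back: fix the failing safety/regression check"
    # NEEDS_DATA: keep the current adapter and grow the dataset first.
    return active, "collect more examples with env lora add, then re-run finetune"
```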

The verdict is computed server-side by the Cadenza API, which is why `env lora eval` and `env lora finetune --gate` require sign-in. The full gate model, shared by residual, distillation, and VLA, is described in Governance & scorecards.

Beyond the action head

LoRA fine-tuning adapts the action head inside a mission. Two related stages go further:

Residual RL & distillation

Learn a tiny residual on the frozen base, then distill it into a base-free student that runs onboard.
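The following toy 1-D example makes the two steps concrete. It has three parts: a frozen linear base policy, a residual that learns only the correction on top of it, and a student fitted to the combined output so it no longer needs the base at runtime. To keep it short, the residual is fit by ordinary least squares against target actions instead of by RL. Every function here is hypothetical, and the code illustrates the idea rather than Cadenza's training code.

```python
from typing import Callable, List, Tuple

Policy = Callable[[float], float]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = m * x + c (needs at least two distinct xs)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def base_policy(obs: float) -> float:
    """Frozen base: never updated below."""
    return 2.0 * obs


def learn_residual(observations: List[float], targets: List[float]) -> Policy:
    """Fit only the correction (target - base); the base itself stays frozen."""
    gaps = [t - base_policy(o) for o, t in zip(observations, targets)]
    m, c = fit_line(observations, gaps)
    return lambda o: m * o + c


def make_teacher(residual: Policy) -> Policy:
    """Policy that still needs the base: base action plus the learned residual."""
    return lambda o: base_policy(o) + residual(o)


def distill(observations: List[float], teacher: Policy) -> Policy:
    """Fit a base-free student to the teacher's actions on the same observations."""
    m, c = fit_line(observations, [teacher(o) for o in observations])
    return lambda o: m * o + c
```

On this data the student reproduces base plus residual, so onboard inference evaluates one small model instead of two.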

VLA mode & GRD

Adapt a standalone VLA’s LoRA adapter with one governed fine-tune + RL loop: no `env.json`, no prompts.
