VLA mode & GRD

VLA-only mode adapts a vision-language-action (VLA) model without the full Cadenza mission stack. A VLA project consists of two things declared in a `vla.json`: a model and an action library. The GRD loop then trains only the model's LoRA adapter, leaving the VLA itself frozen, in a single governed pass that mixes imitation learning and reinforcement learning (RL).

The GRD loop needs the `lora` extra (`pip install -e ".[lora]"`). Loading a real downloaded model (`smolvla`, `hf`, or `local`) additionally needs the `vla` extra (`pip install -e ".[vla]"`, which adds transformers and lerobot). The governed commands require sign-in.

`env vla init`: scaffold a model + action library

cadenza env vla init reach-arm --robot franka --model default

| Flag | Default | Purpose |
| --- | --- | --- |
| `--robot <name>` | `go1` | Robot whose action library to attach. |
| `--model default\|smolvla\|hf\|local` | `default` | Which backbone to adapt. `default` uses the built-in stable encoder; the others load a real VLA. |
| `--model-id <repo\|path>` | none | Hugging Face repo id (`hf`) or local path (`local`/`smolvla`). |
| `--force` | off | Overwrite an existing `vla.json`. |

This writes `reach-arm/vla.json`. Inspect it at any time with `cadenza env vla show reach-arm`.

`env vla data`: author goal→action pairs

GRD’s imitation signal comes from goal→action examples you author against the model’s action library.

# add one pair
cadenza env vla data reach-arm add "reach the red block" \
  --steps 'move_forward 1.0, lower_arm 0.5, close_gripper'

# list the whole dataset
cadenza env vla data reach-arm
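
The `--steps` value above is a comma-separated list of action names, each optionally followed by a numeric argument. To make that shape concrete, here is a minimal sketch of how such a string could be split into pairs. It is not Cadenza's actual parser: `parse_steps` is a hypothetical helper, and the real CLI may accept other forms.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, Optional[float]]


def parse_steps(spec: str) -> List[Step]:
    """Split 'name [value], name [value], ...' into (name, value) pairs.

    Hypothetical helper for illustration only; the real CLI may parse
    its --steps argument differently.
    """
    steps: List[Step] = []
    for chunk in spec.split(","):
        parts = chunk.split()
        if not parts:
            continue  # skip empty entries such as a trailing comma
        name = parts[0]
        # An action without an argument (e.g. close_gripper) gets None.
        value = float(parts[1]) if len(parts) > 1 else None
        steps.append((name, value))
    return steps


print(parse_steps("move_forward 1.0, lower_arm 0.5, close_gripper"))
# [('move_forward', 1.0), ('lower_arm', 0.5), ('close_gripper', None)]
```

Each action name should exist in the action library attached at `init` time, since the examples are authored against that library.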

`env vla grd`: the governed GRD loop

GRD = Govern · Refine · Decide. In one loop it fine-tunes the LoRA adapter on your goal→action data (imitation), RL-tunes it, and then asks the Cadenza API for a verdict. The API steers the loop: it returns a `DEPLOY | BLOCK | NEEDS_DATA` verdict and sets three dials:

| Dial | What it controls |
| --- | --- |
| `--lambda <l>` | The imitation ↔ RL mix. |
| `--rl-budget <n>` | How much RL the round is allowed. |
| `--change-cap <c>` | How far the adapter may move from the frozen base in one round. |
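
One plausible reading of the three dials is sketched below. Everything here is an assumption made for illustration: the source does not say that lambda is a weight in [0, 1] with 1.0 meaning pure imitation, that the RL budget counts steps, or that the change cap is an L2-norm bound. The function names are hypothetical and are not part of Cadenza.

```python
import math
from typing import List


def mixed_loss(imitation_loss: float, rl_loss: float, lam: float) -> float:
    """Blend the two objectives; lam=1.0 is pure imitation (assumed convention)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * imitation_loss + (1.0 - lam) * rl_loss


def rl_steps_allowed(requested: int, rl_budget: int) -> int:
    """Never run more RL steps in a round than the budget permits (assumed unit: steps)."""
    return max(0, min(requested, rl_budget))


def cap_change(delta: List[float], change_cap: float) -> List[float]:
    """Rescale an adapter update so its L2 norm does not exceed change_cap."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0.0 or norm <= change_cap:
        return list(delta)  # already within the cap
    scale = change_cap / norm
    return [d * scale for d in delta]


print(mixed_loss(imitation_loss=2.0, rl_loss=4.0, lam=0.5))  # 3.0
print(rl_steps_allowed(requested=10, rl_budget=4))           # 4
```

Under this reading, a small change cap keeps each round's adapter close to the frozen base no matter how aggressive the imitation/RL mix is.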

cadenza env vla grd reach-arm --rounds 4

| Flag | Default | Purpose |
| --- | --- | --- |
| `--lambda <l>` | API-steered | Imitation/RL mix (the API overrides per round). |
| `--rl-budget <n>` | API-steered | RL budget per round. |
| `--change-cap <c>` | API-steered | Per-round adapter change cap. |
| `--rounds <r>` | none | Max GRD rounds before stopping. |
| `--device <d>` | auto | Training device. |

Only the LoRA adapter is trained; the VLA stays frozen throughout.
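
To show what "adapter trained, base frozen" means mechanically, here is a minimal pure-Python sketch of a LoRA-style linear layer. It illustrates the general LoRA idea (output = frozen weight plus a low-rank correction) and is not Cadenza's implementation; the class name `LoRALinear`, the initial values, and the update method are assumptions.

```python
from typing import List, Sequence

Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Multiply a matrix (sequence of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + B (A x), with W frozen and only A and B trainable."""

    def __init__(self, weight: Sequence[Sequence[float]], rank: int) -> None:
        out_dim, in_dim = len(weight), len(weight[0])
        # Stored as nested tuples so the base weights cannot be mutated.
        self.weight = tuple(tuple(row) for row in weight)
        self.A = [[0.01] * in_dim for _ in range(rank)]
        # Zero-initialised B means the layer starts out identical to the base.
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x: Sequence[float]) -> Vector:
        base = matvec(self.weight, x)
        low_rank = matvec(self.B, matvec(self.A, x))
        return [b + l for b, l in zip(base, low_rank)]

    def sgd_step(self, grad_A, grad_B, lr: float) -> None:
        """Apply a gradient step to the adapter only; W is never touched."""
        self.A = [[a - lr * g for a, g in zip(ra, rg)] for ra, rg in zip(self.A, grad_A)]
        self.B = [[b - lr * g for b, g in zip(rb, rg)] for rb, rg in zip(self.B, grad_B)]


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1)
print(layer.forward([2.0, 3.0]))  # [2.0, 3.0], same as the frozen base
```

Because only `A` and `B` change, discarding the adapter restores the original model exactly.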

`env vla eval`: one-shot governance

Score the current adapter without running a full loop:

cadenza env vla eval reach-arm --promote

This returns the same `DEPLOY | BLOCK | NEEDS_DATA` verdict (see Governance); `--promote` deploys the adapter if it passes.
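
The three verdicts lend themselves to a small decision table. The sketch below is hypothetical: it assumes "passes" means a `DEPLOY` verdict and that `NEEDS_DATA` calls for more goal→action pairs, neither of which the text above states outright. The `Verdict` enum and `next_action` function are illustrative names, not part of the Cadenza API.

```python
from enum import Enum


class Verdict(Enum):
    DEPLOY = "DEPLOY"
    BLOCK = "BLOCK"
    NEEDS_DATA = "NEEDS_DATA"


def next_action(verdict: Verdict, promote: bool) -> str:
    """Map a verdict to a suggested follow-up (assumed semantics)."""
    if verdict is Verdict.DEPLOY:
        # Assumption: only a DEPLOY verdict counts as passing.
        return "promote adapter" if promote else "adapter passed; not promoted"
    if verdict is Verdict.NEEDS_DATA:
        return "author more goal->action pairs, then rerun"
    return "adapter blocked; do not deploy"


print(next_action(Verdict("DEPLOY"), promote=True))  # promote adapter
```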

Full loop

cadenza env vla init reach-arm --robot franka --model default
cadenza env vla data reach-arm add "reach the red block" \
  --steps 'move_forward 1.0, lower_arm 0.5, close_gripper'
cadenza env vla grd reach-arm --rounds 4
cadenza env vla eval reach-arm --promote
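
If you want to drive the same four commands from a script, a thin wrapper along these lines is one option. It is a sketch: the helper names `full_loop_commands` and `run_all` are hypothetical, only flags shown above are used, and it defaults to a dry run that prints the commands rather than executing them.

```python
import shlex
import subprocess
from typing import List


def full_loop_commands(project: str, robot: str, goal: str,
                       steps: str, rounds: int) -> List[List[str]]:
    """Build argv lists for init, data add, grd, and eval, in order."""
    base = ["cadenza", "env", "vla"]
    return [
        base + ["init", project, "--robot", robot, "--model", "default"],
        base + ["data", project, "add", goal, "--steps", steps],
        base + ["grd", project, "--rounds", str(rounds)],
        base + ["eval", project, "--promote"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or run it and stop at the first failure."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


commands = full_loop_commands(
    project="reach-arm",
    robot="franka",
    goal="reach the red block",
    steps="move_forward 1.0, lower_arm 0.5, close_gripper",
    rounds=4,
)
run_all(commands)  # dry run: prints the four commands
```

Passing `dry_run=False` executes the commands for real, which requires the `cadenza` CLI to be installed and signed in.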

VLA mode and the LoRA action head both fine-tune a LoRA adapter, but they are different entry points: `env lora` adapts the action head inside a full `env.json` mission, while `env vla` adapts a standalone model with no mission spec, just a model and an action library.
