The stack runs a goal-directed perceive → reason → act loop that turns a
natural-language goal into robot actions using a pluggable world model (your
VLA / policy). It lives under cadenza.stack.
The world model never touches motors. It reads an observation and proposes
named actions from the action vocabulary. The stack validates,
times, and executes them, then feeds the next observation back.
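The division of labour described above can be sketched in plain Python. This is a minimal illustration and not the cadenza implementation: `Reply`, `toy_model`, `run_loop`, and the one-dimensional toy world are hypothetical stand-ins, chosen only to show that the model proposes named actions while the loop validates and executes them.

```python
from dataclasses import dataclass, field


@dataclass
class Reply:
    """Hypothetical stand-in for AdapterReply: action names plus a done flag."""
    actions: list = field(default_factory=list)
    done: bool = False


def toy_model(observation, goal, vocabulary):
    """Reason: step toward x == 0, then sit once there."""
    if observation["x"] <= 0:
        return Reply(actions=["sit"], done=True)
    return Reply(actions=["walk_forward"])


def run_loop(model, goal, vocabulary, max_iterations=20):
    """Perceive -> reason -> act until the model reports done."""
    state = {"x": 3}  # toy world: steps left to walk
    executed = []
    for _ in range(max_iterations):
        observation = dict(state)                     # perceive
        reply = model(observation, goal, vocabulary)  # reason
        for name in reply.actions:                    # act
            if name not in vocabulary:
                continue  # the loop, not the model, rejects unknown actions
            if name == "walk_forward":
                state["x"] -= 1
            executed.append(name)
        if reply.done:
            break
    return executed


executed = run_loop(toy_model, "walk to the marker and sit", {"walk_forward", "sit"})
print(executed)
```

In this toy run the model only ever returns names; all state changes happen inside `run_loop`.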
Run with a world model
import cadenza_lab as cadenza

result = cadenza.stack.run(
    robot="go1",
    goal="walk to the marker and sit",
    target=(-3.0, 0.0),   # optional (x, y), exposed as obs['target_xy']
    world_model=MyVLA,    # adapter class, instance, name, or handle
    headless=True,
    render_camera=False,
    max_iterations=20,
)
print(result.done, result.total_actions)
stack.run(robot="go1", goal="", *, target=None, world_model=None,
modalities=None, root=".", max_iterations=250, headless=False,
render_camera=True, xml_path=None, verbose=True) -> StackResult
| Param | Description |
|---|---|
| goal | Natural-language objective, passed to the model each tick. |
| target | Optional (x, y) location. Surfaces as observation['target_xy']. |
| world_model | Adapter class, instance, registered name, or WorldModelHandle. None = auto-detect. |
| modalities | List of Modality instances/classes/names (see below). |
| xml_path | Scene XML to load. Compile a Scene to get one. |
| max_iterations | Max reasoning ticks. |
| headless / render_camera | Viewer + camera control. |
StackResult carries .done, .total_actions, .executed, .notes, and
.final_observation.
Choosing the model
| You have… | Pass |
|---|---|
| An adapter class | world_model=MyVLA |
| A built adapter instance | world_model=MyVLA(model=policy) |
| A model on disk / HF cache | leave world_model=None (auto-detect at root) |
| A registered name | register_world_model(...) then world_model=None |
register_world_model(adapter, checkpoint=None, model=None, **metadata) -> WorldModelHandle
register_world_model pins the model that auto-detection resolves to, so after
calling it you run with world_model=None. To address an adapter by name string
instead, first register the class with
cadenza.stack.adapters.base.register_adapter.
Implement a world model
Subclass WorldModelAdapter and implement propose_actions.
import math
from pathlib import Path

from cadenza_lab import WorldModelAdapter, AdapterReply, ProposedAction


def is_arrived(observation, tolerance_m=0.6):
    # Illustrative helper, not part of cadenza_lab. Assumes run(..., target=...)
    # was given, so observation['target_xy'] is present.
    tx, ty = observation["target_xy"]
    x, y = observation["pos"][:2]
    return math.hypot(tx - x, ty - y) < tolerance_m


class MyVLA(WorldModelAdapter):
    name = "my-vla"
    description = "Example heuristic policy."

    @classmethod
    def detect(cls, root: Path):
        return None  # never auto-detected; pass it explicitly

    def _load_impl(self):
        self.model = ...  # load weights here (called by load())

    def propose_actions(self, observation, goal, vocabulary, history=None):
        # observation is a dict; propose named actions from `vocabulary`
        if observation.get("body_height", 0) and is_arrived(observation):
            return AdapterReply(actions=[ProposedAction("sit")], done=True)
        return AdapterReply(
            actions=[ProposedAction("walk_forward", {"distance_m": 1.0})],
            note="advancing",
        )
propose_actions is called with keyword arguments, so its first parameter must
be named exactly observation (followed by goal, vocabulary, history). The
method must return an AdapterReply, not a bare list. Set done=True to end the
loop.
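The naming requirement follows directly from keyword-only calling, which the sketch below demonstrates with plain Python classes. `Reply`, `GoodAdapter`, `MisnamedAdapter`, and `call_adapter` are hypothetical stand-ins, not cadenza classes, and the return-type check is an assumption about how a caller might enforce the contract.

```python
from dataclasses import dataclass, field


@dataclass
class Reply:
    """Hypothetical stand-in for AdapterReply."""
    actions: list = field(default_factory=list)
    done: bool = False


class GoodAdapter:
    def propose_actions(self, observation, goal, vocabulary, history=None):
        return Reply(actions=["sit"], done=True)


class MisnamedAdapter:
    # First parameter is `obs`, so a call with observation=... cannot bind.
    def propose_actions(self, obs, goal, vocabulary, history=None):
        return Reply(actions=["sit"], done=True)


def call_adapter(adapter):
    """Call the adapter with keyword arguments only, then check the return type."""
    reply = adapter.propose_actions(
        observation={"pos": (0.0, 0.0)}, goal="sit", vocabulary=["sit"], history=[],
    )
    if not isinstance(reply, Reply):
        raise TypeError("propose_actions must return a reply object, not a bare list")
    return reply


good = call_adapter(GoodAdapter())

try:
    call_adapter(MisnamedAdapter())
    misnamed_error = None
except TypeError as exc:
    misnamed_error = str(exc)  # Python rejects the unknown keyword

print(good.done, misnamed_error)
```

Renaming the parameter is enough to break the adapter, even though its body is otherwise identical.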
| Member | Purpose |
|---|---|
| name, description | Identify the adapter. |
| propose_actions(observation, goal, vocabulary, history=None) | Return an AdapterReply of ProposedActions. |
| detect(root) | Classmethod auto-detection hook (filesystem only). Return a path or None. |
| load() / _load_impl() | load() is idempotent and calls your _load_impl(). |
| is_loaded | Whether the model is ready. |
In ProposedAction(name, params={}, rationale=""), name must be in the
vocabulary, and params carries action parameters such as distance_m,
rotation_rad, and speed. Wrap the proposed actions in
AdapterReply(actions=[], done=False, note="").
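Since every proposed name must come from the vocabulary, a validation step along the following lines is one plausible way to picture what happens to a reply before execution. This is a sketch under assumptions: `Action` and `validate` are hypothetical, and the stack's real validation rules are not shown in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical stand-in for ProposedAction(name, params, rationale)."""
    name: str
    params: dict = field(default_factory=dict)
    rationale: str = ""


def validate(actions, vocabulary):
    """Split proposals into accepted actions and rejection notes."""
    accepted, rejected = [], []
    for action in actions:
        if action.name in vocabulary:
            accepted.append(action)
        else:
            rejected.append(f"unknown action: {action.name}")
    return accepted, rejected


vocabulary = {"walk_forward", "turn_left", "turn_right", "sit"}
proposals = [
    Action("walk_forward", {"distance_m": 1.0}),
    Action("moonwalk"),  # not in the vocabulary, so it is rejected
    Action("sit", rationale="arrived"),
]
accepted, rejected = validate(proposals, vocabulary)
print([a.name for a in accepted], rejected)
```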
Modalities
A Modality computes extra observation keys each tick (depth, vision, distance to
target, …). It’s how the model “sees” more than raw proprioception.
import math

from cadenza_lab import Modality, ModalityResult


class Proximity(Modality):
    name = "proximity"

    def compute(self, observation) -> ModalityResult:
        tx, ty = (-3.0, 0.0)  # target (x, y), hard-coded for the example
        d = math.hypot(tx - observation.pos[0], ty - observation.pos[1])
        return ModalityResult(keys={"target_dist": d}, summary=f"target_dist={d:.2f}m")
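The keys merge into the dict your adapter receives. summary is printed each tick
when verbose=True. compute takes an Observation object, so it reads fields as
attributes (observation.pos), whereas propose_actions receives a dict.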
cadenza.stack.list_modalities() # registered Modality types
cadenza.stack.get_modality("vision")
cadenza.stack.register_modality(MyModality)
Pass instances/classes/names to run(..., modalities=[Proximity()]).
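The key merge can be pictured with a small stand-alone sketch. It is illustrative only: `Result`, `proximity`, and `merge` are hypothetical names, the modality here is a plain function rather than a Modality subclass, and the observation is a dict rather than an Observation object.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Result:
    """Hypothetical stand-in for ModalityResult(keys, summary)."""
    keys: dict = field(default_factory=dict)
    summary: str = ""


def proximity(observation, target=(-3.0, 0.0)):
    """Compute the planar distance from the robot to the target."""
    x, y = observation["pos"][:2]
    d = math.hypot(target[0] - x, target[1] - y)
    return Result(keys={"target_dist": d}, summary=f"target_dist={d:.2f}m")


def merge(observation, modalities):
    """Return a copy of the observation extended with every modality's keys."""
    merged = dict(observation)
    summaries = []
    for modality in modalities:
        result = modality(observation)
        merged.update(result.keys)
        summaries.append(result.summary)
    return merged, summaries


base = {"pos": (0.0, 4.0, 0.3)}
merged, summaries = merge(base, [proximity])
print(merged["target_dist"], summaries)
```

The adapter would then read `target_dist` from the merged dict exactly as it reads any built-in key.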
Demo: drive a goal headless
This runs with no model on disk and no display. It pairs a self-contained
heuristic adapter with a modality, the same pattern the full project uses.
"""demo_stack.py: reach a target and sit, fully headless (CI-safe)."""
import math
from pathlib import Path

import cadenza_lab as cadenza
from cadenza_lab import (WorldModelAdapter, AdapterReply, ProposedAction,
                         Modality, ModalityResult)

TARGET = (-3.0, 0.0)


class Proximity(Modality):
    name = "proximity"

    def compute(self, observation) -> ModalityResult:
        # Planar distance from the robot to TARGET, exposed as 'target_dist'.
        d = math.hypot(TARGET[0] - observation.pos[0], TARGET[1] - observation.pos[1])
        return ModalityResult(keys={"target_dist": d}, summary=f"d={d:.2f}m")


class HeadingPolicy(WorldModelAdapter):
    name = "heading"

    @classmethod
    def detect(cls, root: Path):
        return None

    def propose_actions(self, observation, goal, vocabulary, history=None):
        d = observation.get("target_dist", 9.0)
        pos, yaw = observation["pos"], observation["rpy"][2]
        if d < 0.6:
            return AdapterReply(actions=[ProposedAction("sit")], done=True, note="arrived")
        # Bearing to the target, then the heading error wrapped into [-pi, pi).
        desired = math.atan2(TARGET[1] - pos[1], TARGET[0] - pos[0])
        err = (desired - (yaw + math.pi) + math.pi) % (2 * math.pi) - math.pi
        if abs(err) > 0.4:
            # Turn toward the target, capped at 0.8 rad per action.
            name = "turn_left" if err > 0 else "turn_right"
            return AdapterReply(actions=[ProposedAction(name, {"rotation_rad": min(abs(err), 0.8)})])
        return AdapterReply(actions=[ProposedAction("walk_forward", {"distance_m": min(d, 1.0)})])


result = cadenza.stack.run(
    robot="go1", goal="reach the target and sit", target=TARGET,
    world_model=HeadingPolicy, modalities=[Proximity()],
    headless=True, render_camera=False, max_iterations=15, verbose=True,
)
print("done:", result.done, "actions:", result.total_actions,
      "final pos:", result.final_observation.pos.round(2))
python demo_stack.py
# ...
# done: True actions: 4 final pos: [-2.78 0.22 0.25]
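The turn-or-walk decision in HeadingPolicy depends on wrapping an angle difference into [-pi, pi). The sketch below isolates that arithmetic so it can be checked without a simulator. `wrap_angle`, `heading_error`, and `choose_action` are hypothetical helper names; the 0.4 rad threshold and the 0.8 rad and 1.0 m caps are copied from the demo, which passes `yaw + math.pi` as the heading.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def heading_error(pos, heading, target):
    """Signed angle from the current heading to the bearing of the target."""
    desired = math.atan2(target[1] - pos[1], target[0] - pos[0])
    return wrap_angle(desired - heading)


def choose_action(err, dist):
    """Turn when misaligned by more than 0.4 rad, otherwise walk forward."""
    if abs(err) > 0.4:
        return ("turn_left" if err > 0 else "turn_right", min(abs(err), 0.8))
    return ("walk_forward", min(dist, 1.0))


err = heading_error(pos=(0.0, 0.0), heading=0.0, target=(0.0, 1.0))
print(err, choose_action(err, dist=1.0))
```

With the target directly to the left of the heading, the error is a quarter turn, so the policy turns left by the capped 0.8 rad instead of walking.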