The stack runs a goal-directed perceive → reason → act loop that turns a natural-language goal into robot actions using a pluggable world model (your VLA / policy). It lives under cadenza.stack. The world model never touches motors. It reads an observation and proposes named actions from the action vocabulary. The stack validates, times, and executes them, then feeds the next observation back.
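
To make the division of labor concrete, here is a minimal, library-free sketch of a perceive → reason → act loop. It is not cadenza's implementation: `Action`, `Reply`, `toy_model`, `step`, and `run_loop` are hypothetical stand-ins on a one-dimensional toy state. They only show that the model proposes named actions while the loop validates them against a vocabulary and executes them.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration; these are not cadenza_lab classes.
@dataclass
class Action:
    name: str
    params: dict = field(default_factory=dict)

@dataclass
class Reply:
    actions: list
    done: bool = False

VOCABULARY = {"walk_forward", "sit"}

def toy_model(observation, goal, vocabulary, history=None):
    """Reason: propose named actions only; never touches the toy 'motors'."""
    remaining = observation["target_x"] - observation["x"]
    if remaining <= 0.1:
        return Reply(actions=[Action("sit")], done=True)
    return Reply(actions=[Action("walk_forward", {"distance_m": min(remaining, 1.0)})])

def step(state, action):
    """Act: the loop, not the model, applies the action to the toy state."""
    if action.name == "walk_forward":
        state["x"] += action.params["distance_m"]
    elif action.name == "sit":
        state["sitting"] = True

def run_loop(model, goal, state, max_iterations=20):
    executed = []
    for _ in range(max_iterations):
        observation = dict(state)                       # perceive
        reply = model(observation=observation, goal=goal,
                      vocabulary=VOCABULARY, history=executed)  # reason
        for action in reply.actions:
            if action.name not in VOCABULARY:           # validate
                continue
            step(state, action)                         # act
            executed.append(action.name)
        if reply.done:
            return True, executed
    return False, executed

done, executed = run_loop(toy_model, "walk to the marker and sit",
                          {"x": 0.0, "target_x": 2.5, "sitting": False})
print(done, executed)
# True ['walk_forward', 'walk_forward', 'walk_forward', 'sit']
```

The model function only ever sees a copy of the state and returns names, so an out-of-vocabulary proposal is dropped before anything moves.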

Run with a world model

import cadenza_lab as cadenza

result = cadenza.stack.run(
    robot="go1",
    goal="walk to the marker and sit",
    target=(-3.0, 0.0),         # optional (x, y), exposed as obs['target_xy']
    world_model=MyVLA,          # adapter class, instance, name, or handle
    headless=True,
    render_camera=False,
    max_iterations=20,
)
print(result.done, result.total_actions)
stack.run(robot="go1", goal="", *, target=None, world_model=None,
          modalities=None, root=".", max_iterations=250, headless=False,
          render_camera=True, xml_path=None, verbose=True) -> StackResult
| Param | Description |
| --- | --- |
| goal | Natural-language objective, passed to the model each tick. |
| target | Optional (x, y) location. Surfaces as observation['target_xy']. |
| world_model | Adapter class, instance, registered name, or WorldModelHandle. None = auto-detect. |
| modalities | List of Modality instances/classes/names (see below). |
| xml_path | Scene XML to load. Compile a Scene to get one. |
| max_iterations | Max reasoning ticks. |
| headless / render_camera | Viewer + camera control. |
StackResult carries .done, .total_actions, .executed, .notes, and .final_observation.

Choosing the model

| You have… | Pass |
| --- | --- |
| An adapter class | world_model=MyVLA |
| A built adapter instance | world_model=MyVLA(model=policy) |
| A model on disk / HF cache | leave world_model=None (auto-detect at root) |
| A registered name | register_world_model(...) then world_model=None |
register_world_model(adapter, checkpoint=None, model=None, **metadata) -> WorldModelHandle
register_world_model pins the auto-detected model. After calling it, run with world_model=None. To address an adapter by name string, register the class with cadenza.stack.adapters.base.register_adapter first.
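
The accepted forms of world_model can be pictured as a small dispatch. The sketch below is an assumption about the general pattern, not cadenza's resolver: `register_adapter_sketch`, `resolve_world_model`, `EchoAdapter`, and the `echo.ckpt` marker file are all hypothetical names invented for illustration.

```python
from pathlib import Path

# Hypothetical registry and resolver; not part of cadenza_lab.
_ADAPTERS = {}

def register_adapter_sketch(cls):
    """Index an adapter class by its `name` attribute."""
    _ADAPTERS[cls.name] = cls
    return cls

class EchoAdapter:
    name = "echo"

    @classmethod
    def detect(cls, root: Path):
        # Filesystem-only check: claim the root if a (made-up) marker file exists.
        marker = root / "echo.ckpt"
        return marker if marker.exists() else None

def resolve_world_model(world_model, root="."):
    """Turn a class, instance, name, or None into an adapter instance."""
    if world_model is None:                      # auto-detect under root
        for cls in _ADAPTERS.values():
            if cls.detect(Path(root)) is not None:
                return cls()
        raise LookupError(f"no adapter detected under {root!r}")
    if isinstance(world_model, str):             # registered name
        return _ADAPTERS[world_model]()
    if isinstance(world_model, type):            # adapter class
        return world_model()
    return world_model                           # already an instance

register_adapter_sketch(EchoAdapter)
by_name = resolve_world_model("echo")
by_class = resolve_world_model(EchoAdapter)
instance = EchoAdapter()
print(type(by_name).__name__, type(by_class).__name__,
      resolve_world_model(instance) is instance)
# EchoAdapter EchoAdapter True
```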

Implement a world model

Subclass WorldModelAdapter and implement propose_actions.
from cadenza_lab import WorldModelAdapter, AdapterReply, ProposedAction
from pathlib import Path

class MyVLA(WorldModelAdapter):
    name = "my-vla"
    description = "Example heuristic policy."

    @classmethod
    def detect(cls, root: Path):
        return None                      # never auto-detected, pass it explicitly

    def _load_impl(self):
        self.model = ...                 # load weights here (called by load())

    def propose_actions(self, observation, goal, vocabulary, history=None):
        # observation is a dict; propose named actions from `vocabulary`.
        # is_arrived is your own arrival check (not defined in this snippet).
        if observation.get("body_height", 0) and is_arrived(observation):
            return AdapterReply(actions=[ProposedAction("sit")], done=True)
        return AdapterReply(
            actions=[ProposedAction("walk_forward", {"distance_m": 1.0})],
            note="advancing",
        )
propose_actions is called with keyword arguments. The first parameter must be named exactly observation (then goal, vocabulary, history), and it must return an AdapterReply, not a bare list. Set done=True to end the loop.
| Member | Purpose |
| --- | --- |
| name, description | Identify the adapter. |
| propose_actions(observation, goal, vocabulary, history=None) | Return an AdapterReply of ProposedActions. |
| detect(root) | Classmethod auto-detection hook (filesystem only). Return a path or None. |
| load() / _load_impl() | load() is idempotent and calls your _load_impl(). |
| is_loaded | Whether the model is ready. |
In ProposedAction(name, params={}, rationale=""), name must be in the vocabulary, and params accepts keys such as distance_m, rotation_rad, and speed. Wrap the proposed actions in AdapterReply(actions=[], done=False, note="").
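
The MyVLA example calls is_arrived without defining it. One possible version is sketched below. It assumes the observation dict carries pos (indexable as x, y, ...) and target_xy, as used elsewhere on this page; the 0.5 m tolerance is an arbitrary illustrative value.

```python
import math

def is_arrived(observation, tolerance_m=0.5):
    """Return True when the robot is within `tolerance_m` of the target.

    Assumes observation["pos"] is indexable as (x, y, ...) and
    observation["target_xy"] is (x, y). Returns False if either is missing.
    """
    pos = observation.get("pos")
    target = observation.get("target_xy")
    if pos is None or target is None:
        return False
    return math.hypot(target[0] - pos[0], target[1] - pos[1]) <= tolerance_m

near = is_arrived({"pos": (-2.8, 0.1, 0.25), "target_xy": (-3.0, 0.0)})
far = is_arrived({"pos": (0.0, 0.0, 0.3), "target_xy": (-3.0, 0.0)})
print(near, far)
# True False
```

Returning False when a key is missing keeps the adapter walking rather than declaring done=True on an incomplete observation.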

Modalities

A Modality computes extra observation keys each tick (depth, vision, distance to target, …). It’s how the model “sees” more than raw proprioception.
from cadenza_lab import Modality, ModalityResult
import math

class Proximity(Modality):
    name = "proximity"
    def compute(self, observation) -> ModalityResult:
        tx, ty = (-3.0, 0.0)
        d = math.hypot(tx - observation.pos[0], ty - observation.pos[1])
        return ModalityResult(keys={"target_dist": d}, summary=f"target_dist={d:.2f}m")
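The keys merge into the dict your adapter receives, and summary is printed each tick when verbose=True. Note that compute takes an Observation object (attribute access, as in observation.pos), whereas the adapter receives a dict.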
cadenza.stack.list_modalities()          # registered Modality types
cadenza.stack.get_modality("vision")
cadenza.stack.register_modality(MyModality)
Pass instances/classes/names to run(..., modalities=[Proximity()]).
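
The merge step can be sketched without the library. The code below is an illustration under assumptions, not cadenza's code: `Result`, `ProximitySketch`, and `merge_modalities` are hypothetical stand-ins, and a `SimpleNamespace` plays the role of the Observation object.

```python
import math
from dataclasses import dataclass, field
from types import SimpleNamespace

# Hypothetical stand-ins; cadenza_lab's real classes are not used here.
@dataclass
class Result:
    keys: dict = field(default_factory=dict)
    summary: str = ""

class ProximitySketch:
    name = "proximity"
    target = (-3.0, 0.0)

    def compute(self, observation):
        # Attribute access, mirroring the Observation passed to compute().
        d = math.hypot(self.target[0] - observation.pos[0],
                       self.target[1] - observation.pos[1])
        return Result(keys={"target_dist": d}, summary=f"target_dist={d:.2f}m")

def merge_modalities(raw, modalities, verbose=False):
    """Build the adapter-facing dict: base keys plus every modality's keys."""
    merged = {"pos": raw.pos}
    for modality in modalities:
        result = modality.compute(raw)
        merged.update(result.keys)
        if verbose:
            print(f"[{modality.name}] {result.summary}")
    return merged

raw = SimpleNamespace(pos=(0.0, 0.0, 0.3))
obs = merge_modalities(raw, [ProximitySketch()], verbose=True)
print(obs["target_dist"])
# [proximity] target_dist=3.00m
# 3.0
```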

Demo: drive a goal headless

This demo runs with no model on disk and no display. It pairs a self-contained heuristic adapter with a modality, the same pattern used in the full project.
"""demo_stack.py: reach a target and sit, fully headless (CI-safe)."""
import math
from pathlib import Path
import cadenza_lab as cadenza
from cadenza_lab import (WorldModelAdapter, AdapterReply, ProposedAction,
                         Modality, ModalityResult)

TARGET = (-3.0, 0.0)

class Proximity(Modality):
    name = "proximity"
    def compute(self, observation) -> ModalityResult:
        d = math.hypot(TARGET[0] - observation.pos[0], TARGET[1] - observation.pos[1])
        return ModalityResult(keys={"target_dist": d}, summary=f"d={d:.2f}m")

class HeadingPolicy(WorldModelAdapter):
    name = "heading"
    @classmethod
    def detect(cls, root: Path): return None
    def propose_actions(self, observation, goal, vocabulary, history=None):
        d = observation.get("target_dist", 9.0)
        pos, yaw = observation["pos"], observation["rpy"][2]
        if d < 0.6:
            return AdapterReply(actions=[ProposedAction("sit")], done=True, note="arrived")
        desired = math.atan2(TARGET[1] - pos[1], TARGET[0] - pos[0])
        err = (desired - (yaw + math.pi) + math.pi) % (2 * math.pi) - math.pi
        if abs(err) > 0.4:
            name = "turn_left" if err > 0 else "turn_right"
            return AdapterReply(actions=[ProposedAction(name, {"rotation_rad": min(abs(err), 0.8)})])
        return AdapterReply(actions=[ProposedAction("walk_forward", {"distance_m": min(d, 1.0)})])

result = cadenza.stack.run(
    robot="go1", goal="reach the target and sit", target=TARGET,
    world_model=HeadingPolicy, modalities=[Proximity()],
    headless=True, render_camera=False, max_iterations=15, verbose=True,
)
print("done:", result.done, "actions:", result.total_actions,
      "final pos:", result.final_observation.pos.round(2))
python demo_stack.py
# ...
# done: True actions: 4 final pos: [-2.78  0.22  0.25]
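
The err line in HeadingPolicy relies on a standard angle-wrapping identity: adding π, taking the remainder modulo 2π, and subtracting π maps any angle into [-π, π), so the robot always turns the short way. The standalone sketch below isolates that arithmetic; `wrap_to_pi` and `heading_error` are illustrative names, and the demo passes yaw + math.pi as the heading.

```python
import math

def wrap_to_pi(angle):
    """Map any angle in radians to the equivalent value in [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def heading_error(desired, heading):
    """Signed shortest rotation from `heading` to `desired`."""
    return wrap_to_pi(desired - heading)

# Facing 170 degrees and wanting -170 degrees: the raw difference is -340,
# but the short way round is +20.
err = heading_error(math.radians(-170), math.radians(170))
print(round(math.degrees(err), 1))
# 20.0
```

Without the wrap, a raw difference of -340° would make the policy pick turn_right and rotate almost a full circle.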