Inference stack

The stack runs a goal-directed perceive → reason → act loop that turns a natural-language goal into robot actions using a pluggable world model (your VLA / policy). It lives under cadenza.stack. The world model never touches motors: it reads an observation and proposes named actions from the action vocabulary. The stack validates, times, and executes those actions, then feeds the next observation back to the model.
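To make that division of labour concrete, here is a toy, self-contained sketch of the loop. Every name in it (`ToyWorld`, `toy_policy`, `run_loop`, `VOCABULARY`) is hypothetical and stands in for cadenza internals. It only illustrates the contract: the policy returns named actions, and the loop validates them against the vocabulary, executes them, and observes again.

```python
from dataclasses import dataclass

VOCABULARY = {"walk_forward", "sit"}  # hypothetical action vocabulary


@dataclass
class ToyWorld:
    """Stand-in simulator: a robot on a line that can walk forward or sit."""
    x: float = 0.0
    sitting: bool = False

    def observe(self) -> dict:
        return {"pos": (self.x, 0.0), "sitting": self.sitting}

    def execute(self, name: str, params: dict) -> None:
        # Only the loop calls this; the policy never touches the "motors".
        if name == "walk_forward":
            self.x += params.get("distance_m", 0.0)
        elif name == "sit":
            self.sitting = True


def toy_policy(observation: dict, goal: str) -> tuple:
    """Reason step: read the observation, return ([(name, params)], done)."""
    if observation["pos"][0] >= 3.0:
        return [("sit", {})], True
    return [("walk_forward", {"distance_m": 1.0})], False


def run_loop(world: ToyWorld, goal: str, max_iterations: int = 20) -> int:
    executed = 0
    for _ in range(max_iterations):
        obs = world.observe()                    # perceive
        actions, done = toy_policy(obs, goal)    # reason
        for name, params in actions:             # act
            if name not in VOCABULARY:
                continue                         # validate: drop unknown actions
            world.execute(name, params)
            executed += 1
        if done:
            break
    return executed
```

Running `run_loop(ToyWorld(), "walk 3 m and sit")` executes three `walk_forward` steps and one `sit`, then stops because the policy reported `done`.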

Run with a world model

import cadenza_lab as cadenza

result = cadenza.stack.run(
    robot="go1",
    goal="walk to the marker and sit",
    target=(-3.0, 0.0),         # optional (x, y), exposed as obs['target_xy']
    world_model=MyVLA,          # adapter class, instance, name, or handle
    headless=True,
    render_camera=False,
    max_iterations=20,
)
print(result.done, result.total_actions)

stack.run(robot="go1", goal="", *, target=None, world_model=None,
          modalities=None, root=".", max_iterations=250, headless=False,
          render_camera=True, xml_path=None, verbose=True) -> StackResult

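Param	Description
`robot`	Robot to load, e.g. `"go1"`.
`goal`	Natural-language objective, passed to the model each tick.
`target`	Optional `(x, y)` location. Surfaces as `observation['target_xy']`.
`world_model`	Adapter class, instance, registered name, or `WorldModelHandle`. `None` = auto-detect.
`modalities`	List of `Modality` instances, classes, or registered names (see below).
`root`	Directory searched when `world_model=None` triggers auto-detection.
`xml_path`	Scene XML to load. Compile a `Scene` to get one.
`max_iterations`	Maximum number of reasoning ticks before the loop stops.
`headless` / `render_camera`	`headless=True` runs without the viewer; `render_camera=False` disables camera rendering.
`verbose`	Print per-tick output, including each modality's `summary`.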

StackResult carries .done, .total_actions, .executed, .notes, and .final_observation.

Choosing the model

You have…	Pass
An adapter class	`world_model=MyVLA`
A built adapter instance	`world_model=MyVLA(model=policy)`
A model on disk / HF cache	leave `world_model=None` (auto-detect at `root`)
An adapter registered by name	`world_model="my-vla"` (after registering the class with `register_adapter`)
A pinned model or checkpoint	`register_world_model(...)`, then `world_model=None`

register_world_model(adapter, checkpoint=None, model=None, **metadata) -> WorldModelHandle

register_world_model pins the model that auto-detection resolves to, so after calling it you run with world_model=None. To address an adapter by a name string instead, first register the class with cadenza.stack.adapters.base.register_adapter.
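The accepted forms of `world_model` can be read as a simple dispatch on the argument's type. The sketch below is hypothetical: `resolve_world_model`, `ADAPTERS`, `PINNED`, and `DummyAdapter` are illustrative stand-ins rather than cadenza's resolution code, it ignores `WorldModelHandle`, and the precedence of a pinned model over filesystem detection is an assumption.

```python
import inspect

ADAPTERS: dict = {}   # hypothetical name -> adapter class registry (cf. register_adapter)
PINNED: list = []     # hypothetical pinned models (cf. register_world_model)


class DummyAdapter:
    name = "dummy"


def resolve_world_model(world_model=None):
    """Return an adapter instance for each accepted form of `world_model`."""
    if world_model is None:
        # Auto-detect; a pinned registration wins here (assumed order).
        if PINNED:
            return PINNED[-1]
        raise LookupError("no world model found at root")
    if isinstance(world_model, str):
        return ADAPTERS[world_model]()   # registered name -> instantiate its class
    if inspect.isclass(world_model):
        return world_model()             # adapter class -> instantiate
    return world_model                   # already a built instance
```

The order of the checks matters: a string is tested before the class check, and anything that is neither `None`, a string, nor a class is assumed to be a ready instance.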

Implement a world model

Subclass WorldModelAdapter and implement propose_actions.

import math
from pathlib import Path
from cadenza_lab import WorldModelAdapter, AdapterReply, ProposedAction

def is_arrived(observation, tol=0.6):
    # Illustrative helper: True within `tol` metres of obs['target_xy'] (set via run(target=...))
    target = observation.get("target_xy")
    if target is None:
        return False
    pos = observation["pos"]
    return math.hypot(target[0] - pos[0], target[1] - pos[1]) < tol

class MyVLA(WorldModelAdapter):
    name = "my-vla"
    description = "Example heuristic policy."

    @classmethod
    def detect(cls, root: Path):
        return None                      # never auto-detected; pass it explicitly

    def _load_impl(self):
        self.model = ...                 # load weights here (called by load())

    def propose_actions(self, observation, goal, vocabulary, history=None):
        # observation is a dict; propose named actions from `vocabulary`
        if is_arrived(observation):
            return AdapterReply(actions=[ProposedAction("sit")], done=True)
        return AdapterReply(
            actions=[ProposedAction("walk_forward", {"distance_m": 1.0})],
            note="advancing",
        )

propose_actions is called with keyword arguments, so the parameter names must match exactly: observation first, then goal, vocabulary, and history. It must return an AdapterReply, not a bare list. Set done=True to end the loop.
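The keyword-argument rule is easy to trip over when porting an existing policy. This plain-Python sketch (no cadenza imports; `GoodAdapter`, `BadAdapter`, and `call_like_the_stack` are hypothetical) shows why renaming the first parameter to `obs` breaks a keyword call.

```python
class GoodAdapter:
    def propose_actions(self, observation, goal, vocabulary, history=None):
        return ("ok", goal)


class BadAdapter:
    def propose_actions(self, obs, goal, vocabulary, history=None):  # wrong name
        return ("ok", goal)


def call_like_the_stack(adapter):
    # Keyword call, mirroring how the stack is documented to call adapters.
    return adapter.propose_actions(
        observation={"pos": (0.0, 0.0)}, goal="sit", vocabulary=["sit"], history=[]
    )
```

`call_like_the_stack(GoodAdapter())` returns `("ok", "sit")`, while `call_like_the_stack(BadAdapter())` raises a `TypeError` about an unexpected keyword argument `'observation'`, even though a positional call would have worked.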

Member	Purpose
`name`, `description`	Identify the adapter.
`propose_actions(observation, goal, vocabulary, history=None)`	Return an `AdapterReply` of `ProposedAction`s.
`detect(root)`	Classmethod auto-detection hook (filesystem only). Return a path or `None`.
`load()` / `_load_impl()`	`load()` is idempotent and calls your `_load_impl()`.
`is_loaded`	Whether the model is ready.
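The `load()` / `_load_impl()` split is a common idempotent-initialisation pattern. A minimal sketch, assuming `load()` simply guards on a loaded flag; `LazyAdapter` and its `load_calls` counter are hypothetical and this is not cadenza's base class.

```python
class LazyAdapter:
    """Hypothetical stand-in showing the load()/_load_impl() contract."""

    def __init__(self):
        self._loaded = False
        self.load_calls = 0          # counts how often the expensive step runs

    @property
    def is_loaded(self) -> bool:
        return self._loaded

    def load(self):
        # Idempotent: the expensive _load_impl() runs at most once.
        if not self._loaded:
            self._load_impl()
            self._loaded = True
        return self

    def _load_impl(self):
        self.load_calls += 1         # e.g. read weights from disk here
        self.model = object()
```

Because the guard lives in `load()`, subclasses only implement `_load_impl()` and callers can invoke `load()` as often as they like.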

In ProposedAction(name, params={}, rationale=""), name must be in the vocabulary, and params takes keys such as distance_m, rotation_rad, and speed. Return one or more actions wrapped in AdapterReply(actions=[], done=False, note="").
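The reply types can be pictured as small dataclasses that mirror the documented defaults. This sketch is not cadenza's definition: `Action`, `Reply`, and `validate` are hypothetical, and `field(default_factory=...)` stands in for the literal `{}` / `[]` defaults so that each instance gets its own container. `validate` shows a vocabulary check in the spirit of "name must be in the vocabulary".

```python
from dataclasses import dataclass, field


@dataclass
class Action:                      # hypothetical mirror of ProposedAction
    name: str
    params: dict = field(default_factory=dict)
    rationale: str = ""


@dataclass
class Reply:                       # hypothetical mirror of AdapterReply
    actions: list = field(default_factory=list)
    done: bool = False
    note: str = ""


def validate(reply: Reply, vocabulary: set) -> tuple:
    """Split proposed actions into (accepted, rejected) by vocabulary membership."""
    accepted = [a for a in reply.actions if a.name in vocabulary]
    rejected = [a for a in reply.actions if a.name not in vocabulary]
    return accepted, rejected
```

For example, with the vocabulary `{"walk_forward", "turn_left", "sit"}`, a reply proposing `walk_forward` and `fly` keeps `walk_forward` and rejects `fly`.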

Modalities

A Modality computes extra observation keys each tick (depth, vision, distance to target, …). It’s how the model “sees” more than raw proprioception.

from cadenza_lab import Modality, ModalityResult
import math

class Proximity(Modality):
    name = "proximity"
    def compute(self, observation) -> ModalityResult:
        tx, ty = (-3.0, 0.0)
        d = math.hypot(tx - observation.pos[0], ty - observation.pos[1])
        return ModalityResult(keys={"target_dist": d}, summary=f"target_dist={d:.2f}m")

compute receives an Observation object (attribute access such as observation.pos), not the dict your adapter receives. The keys it returns are merged into that dict, and summary is printed each tick when verbose=True.
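One way to picture the merge, as a hypothetical sketch: each modality's `keys` are folded into the base observation dict before the adapter sees it. `ToyObservation`, `ToyResult`, `ToyProximity`, and `merge_modalities` are illustrative stand-ins, and the rule that later modalities overwrite earlier keys is an assumption, not documented behaviour.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyObservation:              # hypothetical stand-in for Observation
    pos: tuple


@dataclass
class ToyResult:                   # hypothetical stand-in for ModalityResult
    keys: dict
    summary: str = ""


class ToyProximity:
    def __init__(self, target):
        self.target = target

    def compute(self, observation: ToyObservation) -> ToyResult:
        d = math.hypot(self.target[0] - observation.pos[0],
                       self.target[1] - observation.pos[1])
        return ToyResult(keys={"target_dist": d}, summary=f"target_dist={d:.2f}m")


def merge_modalities(observation, base: dict, modalities, verbose=False) -> dict:
    merged = dict(base)                         # leave the caller's dict untouched
    for modality in modalities:
        result = modality.compute(observation)  # modalities get the Observation object
        merged.update(result.keys)              # later keys overwrite earlier (assumed)
        if verbose:
            print(result.summary)
    return merged
```

With the robot at the origin and a target at `(-3, 4)`, the merged dict gains `target_dist == 5.0` alongside the original keys.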

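import cadenza_lab as cadenza

cadenza.stack.list_modalities()          # registered Modality types
cadenza.stack.get_modality("vision")     # look up a registered Modality by name
cadenza.stack.register_modality(Proximity)  # register your own Modality class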

To use modalities, pass instances, classes, or registered names to run, e.g. run(..., modalities=[Proximity()]).

Demo: drive a goal headless

This runs with no model on disk and no display. It pairs a self-contained heuristic adapter with a modality, the same pattern the full project uses.

"""demo_stack.py: reach a target and sit, fully headless (CI-safe)."""
import math
from pathlib import Path
import cadenza_lab as cadenza
from cadenza_lab import (WorldModelAdapter, AdapterReply, ProposedAction,
                         Modality, ModalityResult)

TARGET = (-3.0, 0.0)

class Proximity(Modality):
    name = "proximity"
    def compute(self, observation) -> ModalityResult:
        d = math.hypot(TARGET[0] - observation.pos[0], TARGET[1] - observation.pos[1])
        return ModalityResult(keys={"target_dist": d}, summary=f"d={d:.2f}m")

class HeadingPolicy(WorldModelAdapter):
    name = "heading"

    @classmethod
    def detect(cls, root: Path):
        return None                      # never auto-detected; passed explicitly below

    def propose_actions(self, observation, goal, vocabulary, history=None):
        d = observation.get("target_dist", 9.0)      # key added by Proximity
        pos, yaw = observation["pos"], observation["rpy"][2]
        if d < 0.6:                                  # close enough: sit and end the loop
            return AdapterReply(actions=[ProposedAction("sit")], done=True, note="arrived")
        desired = math.atan2(TARGET[1] - pos[1], TARGET[0] - pos[0])
        heading = yaw + math.pi                      # this demo offsets yaw by pi
        err = (desired - heading + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        if abs(err) > 0.4:                           # turn first, at most 0.8 rad per action
            name = "turn_left" if err > 0 else "turn_right"
            return AdapterReply(actions=[ProposedAction(name, {"rotation_rad": min(abs(err), 0.8)})])
        return AdapterReply(actions=[ProposedAction("walk_forward", {"distance_m": min(d, 1.0)})])

result = cadenza.stack.run(
    robot="go1", goal="reach the target and sit", target=TARGET,
    world_model=HeadingPolicy, modalities=[Proximity()],
    headless=True, render_camera=False, max_iterations=15, verbose=True,
)
print("done:", result.done, "actions:", result.total_actions,
      "final pos:", result.final_observation.pos.round(2))

python demo_stack.py
# ...
# done: True actions: 4 final pos: [-2.78  0.22  0.25]
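The heading error in HeadingPolicy relies on the wrap (x + π) mod 2π - π, which maps any angle to [-π, π) so the policy always turns the short way round. Python's `%` returns a non-negative result for a positive modulus, which is what keeps the value in range. A standalone check of that expression; `wrap_angle` and `heading_error` are our own helper names, not part of cadenza, and `heading_error` copies the demo's yaw + π offset as-is.

```python
import math


def wrap_angle(x: float) -> float:
    """Map any angle in radians to the half-open interval [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def heading_error(pos, yaw, target):
    """Heading error computed the same way as HeadingPolicy (yaw offset by pi)."""
    desired = math.atan2(target[1] - pos[1], target[0] - pos[0])
    return wrap_angle(desired - (yaw + math.pi))
```

From the origin with yaw 0 and TARGET = (-3, 0), the error is 0, so the policy walks straight. A target at (0, 3) gives a negative error (-π/2), so the policy would choose turn_right.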
