The layers
Models
The learned components — vision-language-action (VLA) models, world models,
and perception. They turn raw sensory input and intent into a plan.
Policies
The decision layer: a function from observation to action. A policy is what
you actually train and deploy.
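
To make "a function from observation to action" concrete, here is a minimal sketch in plain Python. The type aliases and `proportional_policy` are hypothetical names chosen for illustration and are not part of any Cadenza API; a real policy would typically be a neural network rather than a fixed gain.

```python
from typing import Callable, List, Sequence

# Hypothetical types for illustration only (not a Cadenza API).
Observation = Sequence[float]   # e.g. joint angles read from sensors
Action = List[float]            # e.g. joint commands sent to the robot
Policy = Callable[[Observation], Action]


def proportional_policy(obs: Observation) -> Action:
    """Toy policy: command each joint back toward zero with gain 0.5."""
    return [-0.5 * x for x in obs]


# Any callable with this signature can stand in as a policy.
policy: Policy = proportional_policy
action = policy([0.2, -0.4])
print(action)
```

Treating the policy as a plain callable keeps the model that produces it, and the simulator or hardware that consumes its actions, interchangeable.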
Control & actions
The vocabulary of motion — gaits, grasps, balance, recovery — composed into
the behaviors a policy can call on.
Simulation & physics
The environment the robot acts in. A high-fidelity physics simulator is where
policies are trained and validated before they touch hardware.
Data & training
Rollouts, rewards, and the pipelines that turn experience into better
policies through RL and fine-tuning.
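
As a rough illustration of what a rollout and its reward look like, the sketch below runs a policy in a toy one-dimensional environment and scores the result. Everything here is an assumption made for the example: the `Step` record, the `rollout` and `score` helpers, and the environment dynamics (position plus action, reward equal to negative distance from the origin) are hypothetical and do not reflect Cadenza's logging or scoring.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One logged transition: what was seen, what was done, what it earned."""
    obs: float
    action: float
    reward: float


def rollout(policy: Callable[[float], float], start: float, horizon: int) -> List[Step]:
    """Run the policy for `horizon` steps in a toy 1-D environment."""
    pos = start
    steps: List[Step] = []
    for _ in range(horizon):
        action = policy(pos)
        next_pos = pos + action      # toy dynamics: the action shifts the position
        reward = -abs(next_pos)      # toy reward: closer to the origin is better
        steps.append(Step(obs=pos, action=action, reward=reward))
        pos = next_pos
    return steps


def score(steps: List[Step]) -> float:
    """Total reward of a rollout; higher is better."""
    return sum(s.reward for s in steps)


steps = rollout(lambda x: -0.5 * x, start=1.0, horizon=3)
total = score(steps)
print(len(steps), total)
```

A training pipeline would collect many such rollouts, compare their scores, and update the policy toward the actions that scored well.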
Software infrastructure
The connective tissue — orchestration, logging, evaluation, and deployment
drivers that make the whole loop reproducible.
Models vs the rest of the stack
It’s tempting to think physical AI is “just” a model problem. It isn’t. A state-of-the-art model is useless without:

- a simulator good enough to train and test it safely,
- a control layer that turns its outputs into stable motion,
- a data pipeline that captures and labels what happened,
- and infrastructure to run, evaluate, and deploy it.
How Cadenza maps onto the stack
| Layer | In Cadenza |
|---|---|
| Models | Bring your own VLA / world model, or use the built-in inference stack |
| Policies | Trained and fine-tuned with RL + LoRA |
| Control & actions | The reusable, phase-aware action library |
| Simulation & physics | The MuJoCo-based simulator with Go1 / G1 robots |
| Data & training | Automatic rollout logging, scoring, and fine-tuning |
| Software infrastructure | The SDK foundation + the CLI workflow, plus deploy drivers |
Next: Models & policies
Zoom into the model layer — VLA models, world models, and what a policy is.