env.json is the single source of truth for a mission. This page documents every
field. Coordinates are world-space, and forward is −X.
Top level
| Key | Type | Meaning |
|---|---|---|
name | string | Mission name (used in fine-tune prompts). |
robot | string | go1 (default) or g1. |
spawn | [x, y, z] | Robot start position. |
scene | object | Static world: objects and zones. |
phases | array | Ordered sub-goals, executed in sequence. |
vla_finetune | object | Prompt template for exported training rows. |
scene
| Field | Meaning |
|---|---|
type | box (solid, collidable) or marker (visual-only thin plate). |
tag | Label referenced by rewards/predicates (e.g. rubble, victim). Tags need not be unique. |
position | [x, y, z] center, world coords. |
size | [sx, sy, sz] half-extents. |
rgba | Optional color [r, g, b, a] (markers). |
| Field | Meaning |
|---|---|
name | Zone name, referenced by predicates and distance_to_zone. |
aabb | Axis-aligned box [[min_x,min_y,min_z],[max_x,max_y,max_z]]. |
phases
Phases run in order. The mission advances when a phase’s success_when fires,
and ends early if any fail_when fires or max_ticks is exceeded.
| Field | Meaning |
|---|---|
name | Phase identifier. |
goal_prompt | Natural-language goal. Drives the scripted driver and is the prompt fed to a policy/VLA for this phase. |
success_when | Predicate that completes the phase. |
fail_when | Predicate that fails the phase (usually {"flipped": true}). |
reward | Reward-shaping terms (below). |
max_ticks | Phase timeout. |
Predicates
Used by bothsuccess_when and fail_when:
| Predicate | Fires when |
|---|---|
{"enter_zone": "<name>"} | Robot enters the named zone. |
{"exit_zone": "<name>"} | Robot leaves the named zone. |
{"near_object": {"tag": "<tag>", "max_distance_m": 0.5, "for_ticks": 3}} | Robot stays within max_distance_m of a tagged object for for_ticks. |
{"elapsed_ticks": <int>} | A tick count is reached. |
{"flipped": true} | Robot has flipped/fallen. |
Reward terms
All optional and additive per tick:| Term | Effect |
|---|---|
step_cost | Constant per-tick reward. Set negative as a time penalty. |
distance_to_zone | {zone, weight} → weight × (prev_dist − now_dist) to the zone center. Positive = getting closer. |
distance_to_tag | {tag, weight} → same, but to the first object with that tag. |
forward_distance_gain | scalar × (prev_x − now_x). Rewards moving forward (−X). |
contact_with_rubble_penalty | Per-tick penalty for being within 0.25 m of any rubble-tagged object. |
fall_penalty | Applied each tick roll/pitch exceeds 60°. |
phase_success_bonus | One-shot bonus when success_when fires. |
phase_failure_penalty | One-shot penalty when fail_when fires. |
vla_finetune
| Field | Meaning |
|---|---|
prompt_template | Format string for each exported row. Placeholders: {mission_name}, {phase_idx}, {n_phases}, {phase_name}, {goal_prompt}. |
include_failures | If true, failed-phase transitions are also exported. |
prompt field in <run-id>.finetune.jsonl and in
env finetune output. Feed those rows straight into a VLA SFT or offline-RL
pipeline.