Environments (embodiment)

Added in v0.21.0 — an experimental research seam.

Everything else answers a prompt. An Environment turns "answer a question" into "act, observe, and get rewarded": the agent calls reset() once, then calls step(action) repeatedly, receiving an Observation (text, reward, done) each time. A rollout runs a policy (an LLM or a plain function) in the environment until the episode ends or max_steps is reached. This is the substrate for interactive, feedback-driven agents and, later, reinforcement learning.

reset() ──▶ observation ──▶ policy(obs) → action ──▶ step(action) ──▶ observation (+reward, +done)
                              (LLM or a function)        (the environment)         └─ loop until done / max_steps
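
The loop in the diagram can be sketched in a few lines. This is a self-contained illustration, not the library's implementation: `Observation`, `Rollout`, `EchoEnv`, and this `rollout` are stand-ins defined here. The field names follow the interface table on this page, while the types (for example, `steps` as an integer count) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Observation:
    text: str
    reward: float = 0.0
    done: bool = False


@dataclass
class Rollout:
    steps: int = 0  # assumption: a count of steps taken
    total_reward: float = 0.0
    transitions: List[Tuple[str, Observation]] = field(default_factory=list)


class EchoEnv:
    """Hypothetical toy world: the episode ends when the agent says 'stop'."""

    def reset(self) -> Observation:
        return Observation(text="Say 'stop' to finish.")

    def step(self, action: str) -> Observation:
        if action.strip().lower() == "stop":
            return Observation(text="Finished.", reward=1.0, done=True)
        return Observation(text="Not yet. Say 'stop' to finish.")


def rollout(env, policy: Callable[[str], str], max_steps: int = 10) -> Rollout:
    """Run policy in env until the episode ends or max_steps is reached."""
    result = Rollout()
    obs = env.reset()
    while not obs.done and result.steps < max_steps:
        action = policy(obs.text)          # policy: observation text -> action text
        obs = env.step(action)             # environment: action -> next observation
        result.steps += 1
        result.total_reward += obs.reward
        result.transitions.append((action, obs))
    return result


result = rollout(EchoEnv(), lambda obs: "stop")
print(result.steps, result.total_reward)  # prints: 1 1.0
```

Both exit conditions from the diagram appear in the `while` test: the episode stops when an observation reports `done`, or when `max_steps` is exhausted.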

CLI

riptide env guessing --max-steps 10 --offline
#  environment 'guessing': 10 step(s), total reward 0.0
#    1. obs: Guess a whole number between 1 and 100. | action: ... | reward 0.0
#    ...

(With --offline, the deterministic gateway acts as the policy, and it is a weak one. The point is to show the loop; a real model plays the game well.)

In code

from riptide_watergraph.environments import make_environment, rollout

env = make_environment("guessing")

# A policy maps an observation (text) to an action (text) — here, binary search:
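state = {"lo": 1, "hi": 100, "mid": None}

def policy(obs: str) -> str:
    # Narrow the range using feedback on the previous guess (none on the first call).
    if state["mid"] is not None:
        if "higher" in obs:
            state["lo"] = state["mid"] + 1
        elif "lower" in obs:
            state["hi"] = state["mid"] - 1
    state["mid"] = (state["lo"] + state["hi"]) // 2
    return str(state["mid"])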

result = rollout(env, policy, max_steps=20)
print(result.total_reward, result.steps)   # 1.0 (solved); binary search over 1..100 needs at most 7 guesses
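
Because the policy above keeps its bounds in a module-level dict, a second rollout would start from the narrowed range left by the first. One way around that, sketched here, is a factory that returns a fresh policy per episode. `make_binary_search_policy` is a hypothetical helper, and the sketch assumes the environment's feedback contains the word "higher" or "lower", as the example above implies; the simulated feedback loop below stands in for the real environment.

```python
from typing import Callable, Optional


def make_binary_search_policy(lo: int = 1, hi: int = 100) -> Callable[[str], str]:
    """Return a fresh policy whose search bounds live in the closure."""
    last_guess: Optional[int] = None

    def policy(obs: str) -> str:
        nonlocal lo, hi, last_guess
        text = obs.lower()
        # Assumption: feedback mentions "higher" or "lower" relative to the last guess.
        if last_guess is not None:
            if "higher" in text:
                lo = last_guess + 1
            elif "lower" in text:
                hi = last_guess - 1
        last_guess = (lo + hi) // 2
        return str(last_guess)

    return policy


# Simulate the feedback locally for a secret of 37 (no environment needed).
secret = 37
policy = make_binary_search_policy()
obs, guesses = "Guess a whole number between 1 and 100.", []
while True:
    guess = int(policy(obs))
    guesses.append(guess)
    if guess == secret:
        break
    obs = "higher" if guess < secret else "lower"
print(guesses)  # prints: [50, 25, 37]
```

Calling the factory once per rollout gives each episode its own bounds, so repeated runs do not interfere with each other.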

Run an LLM policy through the service:

from riptide_watergraph.service import run_in_environment

result = run_in_environment("guessing", max_steps=10)   # uses the configured model as the policy
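
How `run_in_environment` turns the configured model into a policy is not shown on this page. One plausible shape, sketched under assumptions, is to wrap any text-completion callable so that it sees the latest observation plus a short history and returns a single-line action. `make_llm_policy`, `complete`, `history_limit`, and the prompt wording are all hypothetical and not part of riptide_watergraph.

```python
from typing import Callable, List


def make_llm_policy(
    complete: Callable[[str], str], history_limit: int = 5
) -> Callable[[str], str]:
    """Wrap a prompt -> text callable as an observation -> action policy."""
    history: List[str] = []

    def policy(obs: str) -> str:
        recent = "\n".join(history[-history_limit:])
        prompt = (
            "You are acting in an environment. Reply with one action only.\n"
            f"Recent turns:\n{recent}\n"
            f"Observation: {obs}\nAction:"
        )
        reply = complete(prompt).strip()
        # Keep only the first line so extra explanation is not sent as the action.
        action = reply.splitlines()[0] if reply else ""
        history.append(f"obs: {obs} | action: {action}")
        return action

    return policy


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; always answers 50 with extra chatter."""
    return " 50\n(explanation the policy discards)"


policy = make_llm_policy(fake_model)
print(policy("Guess a whole number between 1 and 100."))  # prints: 50
```

Any function with the same observation-to-action signature can be passed to a rollout, which is why a fake model is enough to exercise the wrapper offline.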

The seam (swappable)

Interface	Default	Purpose
`Environment`	`GuessingGameEnv` (toy)	`reset()` / `step(action)` → `Observation(text, reward, done)`
`rollout(env, policy, ...)`	—	run a policy to episode end; returns `Rollout(steps, total_reward, transitions)`
`make_environment(name)`	registry	build a named environment (`ENVIRONMENTS`)

Implement Environment to plug in your own world (a code repo, a browser, a game). The bundled GuessingGameEnv is deterministic, so the whole act/observe/reward loop can run offline and be tested at 100% coverage.
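
As an illustration of plugging in a world, here is a minimal duck-typed environment with the `reset()` / `step(action)` shape from the table. It is a sketch under assumptions: `SumQuizEnv` is hypothetical, `Observation` is a local stand-in with the documented fields, and the library's actual base class, import path, and registration in `ENVIRONMENTS` are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:  # local stand-in for the library's Observation
    text: str
    reward: float = 0.0
    done: bool = False


class SumQuizEnv:
    """Hypothetical world: answer a fixed addition question within a few attempts."""

    def __init__(self, a: int = 17, b: int = 25, max_attempts: int = 3) -> None:
        self.a, self.b, self.max_attempts = a, b, max_attempts
        self.attempts = 0

    def reset(self) -> Observation:
        self.attempts = 0
        return Observation(text=f"What is {self.a} + {self.b}?")

    def step(self, action: str) -> Observation:
        self.attempts += 1
        answer: Optional[int]
        try:
            answer = int(action.strip())
        except ValueError:
            answer = None  # non-numeric actions count as wrong attempts
        if answer == self.a + self.b:
            return Observation(text="Correct.", reward=1.0, done=True)
        out_of_attempts = self.attempts >= self.max_attempts
        return Observation(text="Wrong, try again.", done=out_of_attempts)


env = SumQuizEnv()
print(env.reset().text)      # prints: What is 17 + 25?
print(env.step("41").text)   # prints: Wrong, try again.
print(env.step("42").text)   # prints: Correct.
```

Like the bundled toy, this environment is deterministic: the same actions always produce the same observations and rewards, so it can be exercised without a model.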

Roadmap context

Environments are an experimental seam, alongside multimodal perception. Because they supply a reward signal, they are the substrate that the remaining research direction (reward/RL, learned policies) would build on.