Deliberate reasoning

Added in v0.17.0 — track 3 of the AGI-direction roadmap.

Self-consistency voting picks the majority answer without checking its quality. Deliberation goes further: it generates several candidates from different reasoning angles, scores each with a verifier, and keeps the best. This is verified best-of-N, the core of test-time-compute scaling (System 2). The result carries a confidence that blends the winner's verifier score with how much the candidates agree, giving a calibrated signal for metacognition ("am I sure, or should I escalate?").

task ──▶ diverse candidates ──▶ verifier scores each ──▶ best answer + confidence
         (direct, stepwise,       (HeuristicVerifier /     (escalate when low)
          critical, …)             LLMVerifier)
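
The pipeline above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the `Candidate` class, the `best_of_n` function, the exact-match agreement measure, and the 50/50 `blend` weight are all assumptions made for the example, since the page does not state the actual blending formula.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """Hypothetical container for one scored candidate answer."""
    style: str
    answer: str
    score: float = 0.0


def best_of_n(
    task: str,
    answers: Sequence[tuple[str, str]],
    verify: Callable[[str, str], float],
    blend: float = 0.5,
):
    """Score each (style, answer) pair, rank them, and derive a confidence.

    Assumption: confidence = blend * best_score + (1 - blend) * agreement,
    where agreement is the fraction of candidates whose normalized answer
    equals the winner's. The real blend may differ.
    """
    if not answers:
        raise ValueError("need at least one candidate")

    candidates = [Candidate(style, ans, verify(task, ans)) for style, ans in answers]
    # sorted() is stable, so ties keep their generation order.
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    best = ranked[0]

    def norm(text: str) -> str:
        return text.strip().lower()

    agreement = sum(norm(c.answer) == norm(best.answer) for c in ranked) / len(ranked)
    confidence = blend * best.score + (1 - blend) * agreement
    return best, ranked, confidence
```

Passing the verifier in as a plain callable mirrors the swappable-verifier idea: the selection logic does not need to know whether scores come from a heuristic or from a model.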

CLI

riptide deliberate "What is the boiling point of water at sea level?" --samples 4 --offline
#  deliberation (4 candidates)
#    [0.78] stepwise     ...
#    [0.61] direct       ...
#    ...
#  BEST (score 0.78, confidence 0.71)
#  <answer>

If confidence is low, the CLI suggests escalating (more samples, debate, or a human).

In code

from riptide_watergraph.service import deliberate_task

result = deliberate_task("explain reciprocal rank fusion", samples=4)
print(result.answer, result.score, result.confidence)
if not result.confident(threshold=0.6):
    ...  # escalate: more samples / ask a human

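# or drive the primitive directly with your own gateway + verifier
import asyncio
from riptide_watergraph.reasoning import deliberate, HeuristicVerifier

async def main():  # task (str) and gw (your gateway) are defined by the caller
    return await deliberate(task, gateway=gw, model="gpt-4o-mini",
                            verifier=HeuristicVerifier(), samples=5)
res = asyncio.run(main())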
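
The escalation branch in the snippet above is left open. One possible shape for it is a loop that doubles the sample count until the result is confident or a budget is reached. This is a sketch under stated assumptions: `Result` is a hypothetical stand-in for `DeliberationResult`, `deliberate_with_escalation` and `max_samples` are names invented here, and `confident()` is assumed to mean "confidence is at least the threshold".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    """Hypothetical stand-in for DeliberationResult (only the fields used here)."""
    answer: str
    confidence: float

    def confident(self, threshold: float = 0.6) -> bool:
        # Assumption: confident means confidence >= threshold.
        return self.confidence >= threshold


def deliberate_with_escalation(
    run: Callable[[int], Result],
    samples: int = 4,
    max_samples: int = 16,
    threshold: float = 0.6,
):
    """Call run(samples), doubling samples while the result is not confident.

    Returns the last result and the sample count that produced it. Stops
    when the result is confident or the next doubling would exceed
    max_samples; at that point a human or a stronger method should take over.
    """
    while True:
        result = run(samples)
        if result.confident(threshold) or samples * 2 > max_samples:
            return result, samples
        samples *= 2
```

With the real package, `run` could be something like `lambda n: deliberate_task(task, samples=n)`, assuming the returned object exposes `confident()` as shown in the snippet above.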

The seam (swappable)

| Interface | Default | Purpose |
| --- | --- | --- |
| Verifier | HeuristicVerifier (offline) / LLMVerifier | score a candidate answer 0..1 |
| deliberate(...) | | generate diverse candidates → score → best + confidence |
| DeliberationResult | | the winner, ranked candidates, and a confidence (+ confident()) |

Candidates are generated under distinct reasoning styles (DEFAULT_STYLES: direct, stepwise, critical, alternative, rigorous), because best-of-N only helps if the candidates actually differ. Offline, the deterministic HeuristicVerifier scores by task/answer token overlap, so the whole path runs without an API key.
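
To make the idea of token-overlap scoring concrete, here is a small deterministic scorer. It is only an illustration of the general technique: the function names and the specific formula (the fraction of task tokens that reappear in the answer) are assumptions, and the real HeuristicVerifier may tokenize and weight differently.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(task: str, answer: str) -> float:
    """Return the share of task tokens that also occur in the answer (0..1).

    Assumed formula for illustration only; an empty task scores 0.0.
    """
    task_tokens = tokens(task)
    if not task_tokens:
        return 0.0
    return len(task_tokens & tokens(answer)) / len(task_tokens)
```

A scorer like this needs no network access and always returns the same value for the same input, which is what makes an offline path possible. It rewards on-topic answers rather than correct ones, so it is better suited to smoke tests than to judging quality.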

Roadmap context

Track 3 of the AGI-direction roadmap (after SkillForge and cognitive memory). The same Verifier seam powers what's next: multi-agent debate (agents critique and revise, the verifier judges), tree-search over reasoning steps, and metacognition wired into the graph (escalate compute only when confidence is low).