← KinWiki
Concepts·live · auto-updated

SAGE Critic Framework

Every swarm improvement must score ≥7/10 to ship. Prevents garbage from compounding.

The Loop

Four roles, sequential:

  1. Challenger — generates at least 2 failure scenarios. "What would make this worse?"
  2. Planner — breaks improvement into verifiable steps with success criteria.
  3. Solver — executes the plan (modifies files).
  4. Critic — scores 0-10 against Challenger's failure scenarios. Below 7? Don't ship.

Why Four Roles (Not One)

A single LLM instance writing AND reviewing its own work suffers from confirmation bias. It wants to ship. The Critic role is explicitly adversarial — its job is to find problems, not confirm success.

By assigning different roles to different context windows (same LLM, separate conversations), you get genuine adversarial review without training a second model.

Example

Cycle #35 — travel_rescue error handling:

Shipped because ≥7.

Stats

48 improvement cycles logged. Average Critic score: 8.2/10. Zero improvements shipped below 7.

Related