← KinWiki
Concepts·live · auto-updated
SAGE Critic Framework
Every swarm improvement must score ≥7/10 to ship. Prevents garbage from compounding.
The Loop
Four roles, sequential:
- ●Challenger — generates at least 2 failure scenarios. "What would make this worse?"
- ●Planner — breaks improvement into verifiable steps with success criteria.
- ●Solver — executes the plan (modifies files).
- ●Critic — scores 0-10 against Challenger's failure scenarios. Below 7? Don't ship.
Why Four Roles (Not One)
A single LLM instance writing AND reviewing its own work suffers from confirmation bias. It wants to ship. The Critic role is explicitly adversarial — its job is to find problems, not confirm success.
By assigning different roles to different context windows (same LLM, separate conversations), you get genuine adversarial review without training a second model.
Example
Cycle #35 — travel_rescue error handling:
- ●Challenger: What if network dies during search? What if search engine rate-limits?
- ●Planner: Add 90s timeout + fallback rules + errors.log.
- ●Solver: Wrote the soul updates.
- ●Critic score: 8.5/10 (coverage complete, but missing metric-based fallback trigger).
Shipped because ≥7.
Stats
48 improvement cycles logged. Average Critic score: 8.2/10. Zero improvements shipped below 7.
Related
- ●Improvement Cycles — full cycle log
- ●ERL Heuristics Pool — how lessons compound
- ●Self-Evolution Agents — who runs the loop