Structured Multi-Agent Debate with Domain-Expert Routing
The LocalKin Team
April 2026
Keywords: multi-agent debate, domain routing, conductor architecture, structured deliberation, traditional Chinese medicine, quantitative finance, swarm intelligence
Abstract
Multi-agent debate has emerged as a powerful paradigm for improving reasoning quality in large language model (LLM) systems. However, existing approaches broadcast every question to all participating agents, regardless of domain relevance. This introduces computational waste and dilutes expert signal with noise from agents whose expertise is orthogonal to the question.
We present Domain-Expert Routing, a conductor-based architecture that interposes a routing layer between incoming queries and the agent pool. A conductor agent selects a relevant subset of experts (typically 4-6 out of 11-75 agents), then orchestrates a structured multi-round debate among only those agents. We instantiate this architecture in two production systems: (1) a traditional Chinese medicine (TCM) consultation system that routes queries to 11 historical physician agents, and (2) a quantitative finance pipeline that executes a 5-phase sequential protocol, preceded by a Phase 0 verification gate, with a different agent subset at each phase.
Across both domains, domain-expert routing achieves higher consensus quality (an 80.2% confidence-weighted support ratio in a TCM debate), reduces per-query agent invocations by 55-65%, and enables Phase 0 verification gates that catch hallucinated financial data before publication.
1. Introduction
The multi-agent debate paradigm (Du et al., 2023; Liang et al., 2024) posits that LLM agents, when given the opportunity to argue, rebut, and revise their positions across multiple rounds, produce more accurate outputs than any single agent in isolation.
Yet a fundamental inefficiency persists: every agent participates in every debate. When a patient presents with gynecological symptoms, the acupuncture specialist, the pharmacologist, and the theoretical cosmologist all weigh in. The cost is threefold: (1) computational cost scales with fleet size, (2) irrelevant opinions introduce noise, and (3) agents working outside their domain are more likely to hallucinate.
We propose domain-expert routing, in which a conductor agent selects a task-appropriate subset of experts before the debate begins.
2. Architecture
2.1 TCM Routing
The TCM conductor manages 11 historical physician agents, ranging from the legendary Yellow Emperor (Huang Di) to physicians of the Qing dynasty. The routing table maps clinical categories to expert subsets:
| Category | Expert Subset | Rationale |
|---|---|---|
| General internal medicine | Zhang Zhongjing, Sun Simiao, Li Dongyuan, Zhu Danxi | Core diagnosticians |
| Warm disease / fever | Zhang Zhongjing, Ye Tianshi, Liu Wansu, Sun Simiao | Ye Tianshi's Wei-Qi-Ying-Xue system |
| Gynecology | Fu Qingzhu, Zhang Zhongjing, Zhu Danxi, Sun Simiao | Fu Qingzhu's specialty |
| Acupuncture | Huangfu Mi, Zhang Zhongjing, Sun Simiao, Hua Tuo | Huangfu Mi's Zhenjiu Jiayi Jing |
| Surgery / emergency | Hua Tuo, Zhang Zhongjing, Sun Simiao, Huangfu Mi | Hua Tuo's surgical expertise |
| Pharmacology | Li Shizhen, Sun Simiao, Zhang Zhongjing | Li Shizhen's Bencao Gangmu |
| Theory / pedagogy | Huang Di, Zhang Zhongjing, Zhu Danxi, Liu Wansu, Li Dongyuan | Foundational theorists |
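The routing table amounts to a lookup from clinical category to expert subset. The sketch below assumes the conductor has already classified the query into one of the seven categories (that classification step is not shown); the snake_case category keys, the `route` and `fleet` helpers, and the fallback to general internal medicine for unrecognized categories are illustrative assumptions, not the production code.

```python
# Hypothetical encoding of the TCM routing table: clinical category -> expert subset.
ROUTING_TABLE: dict[str, list[str]] = {
    "general_internal_medicine": ["Zhang Zhongjing", "Sun Simiao", "Li Dongyuan", "Zhu Danxi"],
    "warm_disease_fever": ["Zhang Zhongjing", "Ye Tianshi", "Liu Wansu", "Sun Simiao"],
    "gynecology": ["Fu Qingzhu", "Zhang Zhongjing", "Zhu Danxi", "Sun Simiao"],
    "acupuncture": ["Huangfu Mi", "Zhang Zhongjing", "Sun Simiao", "Hua Tuo"],
    "surgery_emergency": ["Hua Tuo", "Zhang Zhongjing", "Sun Simiao", "Huangfu Mi"],
    "pharmacology": ["Li Shizhen", "Sun Simiao", "Zhang Zhongjing"],
    "theory_pedagogy": ["Huang Di", "Zhang Zhongjing", "Zhu Danxi", "Liu Wansu", "Li Dongyuan"],
}

# Assumed fallback for unrecognized categories; the paper does not specify one.
DEFAULT_CATEGORY = "general_internal_medicine"


def route(category: str) -> list[str]:
    """Return a copy of the expert subset for a category, falling back to the default."""
    return list(ROUTING_TABLE.get(category, ROUTING_TABLE[DEFAULT_CATEGORY]))


def fleet() -> set[str]:
    """Every physician agent that appears in at least one subset."""
    return {name for experts in ROUTING_TABLE.values() for name in experts}


if __name__ == "__main__":
    print(route("gynecology"))  # ['Fu Qingzhu', 'Zhang Zhongjing', 'Zhu Danxi', 'Sun Simiao']
    print(len(fleet()))         # 11
```

The union of all subsets is exactly the 11-agent fleet, and Zhang Zhongjing appears in every subset, so per the table he takes part in every routed TCM debate.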
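2.2 Quant Pipeline Routing
The quant pipeline executes its phases in strict sequence, invoking a different agent subset at each phase: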
| Phase | Name | Agents | Output |
|---|---|---|---|
| Phase 0 | Price verification | stock_price skill (API) | Verified price + timestamp |
| Phase 1 | Data collection | 4 analysts | Independent reports |
| Phase 2 | Adversarial debate | Bull team vs. Bear team | Debate transcript |
| Phase 3 | Trade proposal | Trader agent | Entry/exit/sizing |
| Phase 4 | Risk check | Risk manager | Approval / rejection |
| Phase 5 | Publication | Conductor | Final report to KinBook |
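Because each phase consumes the outputs of earlier phases, the pipeline can be sketched as an ordered list of phase functions that share a context dictionary, where any phase may halt the run. The phase bodies below are hypothetical stubs standing in for agent calls, and the 5% position-size cap in the risk check is an invented example; treating a risk rejection as a halt is also an assumption. Only the phase order and the halt-with-[IDLE] behavior of Phase 0 come from the text.

```python
from typing import Callable, Optional

# A phase reads the shared context and returns updates to merge, or None to halt.
Phase = Callable[[dict], Optional[dict]]
IDLE = "[IDLE]"


def run_pipeline(phases: list[tuple[str, Phase]], context: dict) -> dict:
    """Execute phases strictly in order; the first phase returning None halts the run."""
    completed: list[str] = []
    for name, phase in phases:
        updates = phase(context)
        if updates is None:
            return {"status": IDLE, "halted_at": name, "completed": completed}
        context.update(updates)
        completed.append(name)
    return {"status": "published", "completed": completed, "context": context}


def build_phases(verified_price: Optional[float], size_pct: float) -> list[tuple[str, Phase]]:
    """Hypothetical stubs standing in for the agent (or skill) call at each phase."""
    return [
        # Phase 0: without a verified price, nothing downstream may run.
        ("phase0", lambda ctx: None if verified_price is None else {"price": verified_price}),
        # Phase 1: four independent analyst reports (placeholder strings).
        ("phase1", lambda ctx: {"reports": [f"analyst_{i} report" for i in range(1, 5)]}),
        # Phase 2: bull vs. bear debate transcript (placeholder).
        ("phase2", lambda ctx: {"transcript": f"bull vs bear on {len(ctx['reports'])} reports"}),
        # Phase 3: trader proposes entry and position size.
        ("phase3", lambda ctx: {"proposal": {"size_pct": size_pct, "entry": ctx["price"]}}),
        # Phase 4: risk manager rejects sizes above an assumed 5% cap.
        ("phase4", lambda ctx: None if ctx["proposal"]["size_pct"] > 5.0 else {"risk": "approved"}),
        # Phase 5: conductor assembles the final report.
        ("phase5", lambda ctx: {"report": f"entry {ctx['proposal']['entry']}"}),
    ]


if __name__ == "__main__":
    print(run_pipeline(build_phases(101.5, 2.0), {})["status"])     # published
    print(run_pipeline(build_phases(None, 2.0), {})["halted_at"])   # phase0
    print(run_pipeline(build_phases(101.5, 8.0), {})["halted_at"])  # phase4
```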
3. Debate Protocol
3.1 Round 1: Diverse Strategies
Each agent is assigned one of eight DMAD reasoning strategies: analytical, analogical, contrastive, first-principles, empirical, devil's advocate, systems thinking, and historical.
Agents respond in a structured format with DOMAIN_ANGLE, POSITION, CONFIDENCE, REASONING, EVIDENCE, and INDEPENDENCE fields.
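A minimal sketch of Round 1 setup. It assumes strategies are assigned round-robin (the paper does not state the assignment rule), that agents emit one `FIELD: value` line per field, and that CONFIDENCE is a number in [0, 1]. The `Round1Response` dataclass and the helper names are hypothetical.

```python
from dataclasses import dataclass

STRATEGIES = ["analytical", "analogical", "contrastive", "first-principles",
              "empirical", "devil's advocate", "systems thinking", "historical"]
FIELDS = ["DOMAIN_ANGLE", "POSITION", "CONFIDENCE", "REASONING", "EVIDENCE", "INDEPENDENCE"]


def assign_strategies(agents: list[str]) -> dict[str, str]:
    """Assumed round-robin assignment: strategies are distinct while there are <= 8 agents."""
    return {agent: STRATEGIES[i % len(STRATEGIES)] for i, agent in enumerate(agents)}


@dataclass
class Round1Response:
    domain_angle: str
    position: str
    confidence: float
    reasoning: str
    evidence: str
    independence: str


def parse_response(text: str) -> Round1Response:
    """Parse 'FIELD: value' lines; reject responses missing any required field."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() in FIELDS:
            values[key.strip()] = value.strip()
    missing = [f for f in FIELDS if f not in values]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    confidence = float(values["CONFIDENCE"])
    if not 0.0 <= confidence <= 1.0:  # assumed scale
        raise ValueError("CONFIDENCE must lie in [0, 1]")
    return Round1Response(values["DOMAIN_ANGLE"], values["POSITION"], confidence,
                          values["REASONING"], values["EVIDENCE"], values["INDEPENDENCE"])
```

Rejecting malformed responses at parse time keeps the later consensus tally from silently treating a missing CONFIDENCE as zero weight.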
3.2 Round 2+: Informed Revision
Agents receive all prior positions, a cumulative evidence pool, and an IBIS (Issue-Based Information System) rebuttal pool, then revise or defend their positions. Position changes are tracked explicitly.
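The Round 2+ bookkeeping can be sketched as follows, assuming each completed round is stored as a mapping from agent name to a (position, evidence items) pair. In this sketch the cumulative evidence pool is the de-duplicated union of all evidence so far in first-seen order, and a position change is any agent whose position differs from its previous round. The names are hypothetical, and the IBIS rebuttal pool is omitted.

```python
# Hypothetical round record: agent -> (position, evidence items)
Round = dict[str, tuple[str, list[str]]]


def revision_inputs(rounds: list[Round]) -> dict:
    """Build the inputs every agent sees in the next round."""
    history: dict[str, list[str]] = {}
    evidence_pool: list[str] = []
    for rnd in rounds:
        for agent, (position, evidence) in rnd.items():
            history.setdefault(agent, []).append(position)
            for item in evidence:
                if item not in evidence_pool:  # keep first-seen order, drop duplicates
                    evidence_pool.append(item)
    return {"position_history": history, "evidence_pool": evidence_pool}


def position_changes(prev: Round, curr: Round) -> dict[str, tuple[str, str]]:
    """Agents whose position differs between two consecutive rounds: agent -> (old, new)."""
    return {
        agent: (prev[agent][0], position)
        for agent, (position, _) in curr.items()
        if agent in prev and prev[agent][0] != position
    }
```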
3.3 Consensus Mechanism
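Positions are tallied using confidence-weighted voting: each agent's vote counts in proportion to its self-reported CONFIDENCE. A position is declared the consensus if its weighted support ratio (its share of the total confidence weight) exceeds 0.70. Consensus inertia detection flags potential social conformity when more than 60% of agents changed position and more than 50% self-report having been influenced by other agents.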
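A sketch of the consensus rule and the inertia check. For each position p, the weighted support ratio is the summed confidence of agents holding p divided by the summed confidence of all agents. The thresholds and strict inequalities come from the text; the function names, and taking the denominator over all agents, are assumptions.

```python
from collections import defaultdict

CONSENSUS_THRESHOLD = 0.70
CHANGED_THRESHOLD = 0.60
INFLUENCED_THRESHOLD = 0.50


def weighted_tally(votes: list[tuple[str, float]]) -> tuple[str, float, bool]:
    """votes: one (position, confidence) pair per agent.

    Returns (leading position, its weighted support ratio, whether it is consensus)."""
    weights: dict[str, float] = defaultdict(float)
    for position, confidence in votes:
        weights[position] += confidence
    total = sum(weights.values())
    if total == 0:
        return "", 0.0, False
    top = max(weights, key=weights.get)
    ratio = weights[top] / total
    return top, ratio, ratio > CONSENSUS_THRESHOLD


def inertia_flag(changed: list[bool], influenced: list[bool]) -> bool:
    """Flag possible conformity: >60% changed position AND >50% report being influenced."""
    if not changed or not influenced:
        return False
    return (sum(changed) / len(changed) > CHANGED_THRESHOLD
            and sum(influenced) / len(influenced) > INFLUENCED_THRESHOLD)


if __name__ == "__main__":
    votes = [("tonify Qi", 0.9), ("tonify Qi", 0.8), ("clear heat", 0.5)]
    print(weighted_tally(votes))  # leading position 'tonify Qi', ratio 1.7 / 2.2 (about 0.77), consensus True
```

The inequalities are strict: with five agents, exactly three changing position (60%) does not trigger the inertia flag.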
4. Safety Architecture
4.1 Phase 0 Verification Gate
Financial reports require real-time price verification before any analytical content is generated. If the stock_price skill returns an error, the conductor halts with [IDLE].
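A minimal sketch of the gate. It assumes the stock_price skill is exposed as a callable that takes a ticker and returns a dict with `price` and `timestamp` keys, an `error` key on failure, or raises an exception. That interface and the `phase0_gate` name are hypothetical.

```python
from typing import Callable, Union

IDLE = "[IDLE]"
# Hypothetical interface for the stock_price skill: ticker -> result dict.
PriceSkill = Callable[[str], dict]


def phase0_gate(ticker: str, stock_price: PriceSkill) -> Union[dict, str]:
    """Return a verified {ticker, price, timestamp} record, or IDLE if the skill fails."""
    try:
        result = stock_price(ticker)
    except Exception:
        return IDLE  # skill failure: halt before any analysis is generated
    if "error" in result or "price" not in result or "timestamp" not in result:
        return IDLE  # an error payload or incomplete record also halts
    return {"ticker": ticker, "price": float(result["price"]), "timestamp": result["timestamp"]}
```

In this sketch the gate has exactly two outcomes, a verified record or [IDLE]. No code path lets a model-generated price reach later phases.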
Phase 0 verification is protected from the self-evolution mechanism: it is classified as a runtime safety constraint that the swarm architect cannot modify.
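One way to enforce this separation is sketched below. It assumes the swarm architect proposes changes as key-value patches to an agent configuration, and the runtime strips any patch entry that touches a protected key before applying the rest. The key names and `apply_architect_patch` are hypothetical.

```python
# Runtime safety constraints that self-evolution may not touch (hypothetical key names).
PROTECTED_KEYS = frozenset({"phase0_verification", "mandatory_disclaimer"})


def apply_architect_patch(config: dict, patch: dict) -> tuple[dict, list[str]]:
    """Apply a self-evolution patch, refusing edits to protected keys.

    Returns a new config (the input is not mutated) and the sorted rejected keys."""
    rejected = sorted(k for k in patch if k in PROTECTED_KEYS)
    allowed = {k: v for k, v in patch.items() if k not in PROTECTED_KEYS}
    return {**config, **allowed}, rejected
```

Filtering at the runtime boundary, rather than instructing the architect not to edit these keys, means the protection holds even if the architect's own prompt is later evolved.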
4.2 Mandatory Disclaimers
Disclaimers are appended at the runtime level, not the agent level, ensuring they cannot be omitted by agent self-modification.
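A sketch of runtime-level appending. The disclaimer is added by the publishing step, outside any agent's output, so neither an agent's prompt nor its self-modification can remove it. The disclaimer wording, the domain keys, and `publish` are placeholders.

```python
# Placeholder disclaimer text per domain; the production wording is not given here.
DISCLAIMERS = {
    "finance": "Not investment advice.",
    "tcm": "For educational purposes only; consult a licensed practitioner.",
}


def publish(agent_output: str, domain: str) -> str:
    """Runtime publishing step: always appends the domain's disclaimer last.

    An unknown domain raises KeyError, so nothing is published without a disclaimer."""
    disclaimer = DISCLAIMERS[domain]
    return f"{agent_output.rstrip()}\n\n{disclaimer}"
```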
4.3 Ollama Fallback Policy
When the primary LLM provider is unavailable and agents fall back to local models served through Ollama, agents in sensitive domains (such as medicine and finance) enter [IDLE] mode instead of answering. Silence is preferable to confabulation.
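The fallback policy reduces to a small decision rule. It is sketched here with a hypothetical set of sensitive domains (the two domains studied in this paper), a hypothetical `Agent` record, and a boolean flag for primary-provider availability.

```python
from dataclasses import dataclass

IDLE = "[IDLE]"
SENSITIVE_DOMAINS = frozenset({"tcm", "finance"})  # assumed membership


@dataclass
class Agent:
    name: str
    domain: str


def select_mode(agent: Agent, primary_available: bool) -> str:
    """Return 'primary', 'local', or IDLE for this agent."""
    if primary_available:
        return "primary"
    if agent.domain in SENSITIVE_DOMAINS:
        return IDLE  # silence is preferable to confabulation
    return "local"  # non-sensitive agents may run on the local (Ollama) model
```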
5. Evaluation
5.1 TCM: Spring Pollen Debate
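Five physician agents debated the treatment of spring pollen allergy. The winning position's confidence-weighted support ratio was 80.2%, above the 0.70 consensus threshold. Verdict: consensus for tonifying Qi as the primary approach, with heat-clearing as a complementary treatment for patients with a damp-heat constitution.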
5.2 Quality Trajectory
Safety-compliance metrics for the quant pipeline, Day 1 versus Day 5:
| Metric | Day 1 | Day 5 |
|---|---|---|
| Phase 0 compliance | 67% | 100% |
| Disclaimer presence | 80% | 100% |
| Hallucinated prices (count) | 2 | 0 |
| Overall compliance | 75% | 92% |
5.3 Routing Efficiency
| System | Fleet Size | Agents per Query (Broadcast) | Agents per Query (Routed) | Reduction |
|---|---|---|---|---|
| TCM | 11 | 11 | 4.3 (mean) | 61% |
| Quant | 6 | 6 | 2.5 (mean per phase) | 58% |
| Prediction | 75 | 10 (max) | 4.6 (mean) | 54% |
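The Reduction column follows from reduction = 1 - routed / broadcast. For the prediction system the baseline is the 10-agent broadcast cap, not the full 75-agent fleet. The check below recomputes the column from the table's own values.

```python
# (system, broadcast agents per query, mean routed agents per query), taken from the table
ROWS = [("TCM", 11, 4.3), ("Quant", 6, 2.5), ("Prediction", 10, 4.6)]


def reduction_pct(broadcast: float, routed: float) -> int:
    """Percentage reduction in agent invocations, rounded to the nearest integer."""
    return round(100 * (1 - routed / broadcast))


if __name__ == "__main__":
    for system, broadcast, routed in ROWS:
        print(system, reduction_pct(broadcast, routed))
    # TCM 61
    # Quant 58
    # Prediction 54
```

All three recomputed values match the table.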
6. Conclusion
Domain-expert routing addresses a practical inefficiency in multi-agent debate: not every agent needs to weigh in on every question. The key insight is that expertise is not uniformly distributed, and debate protocols should respect this by routing questions to the agents best equipped to answer them.
References
Du, Y., et al. (2023). Improving Factuality and Reasoning through Multiagent Debate. arXiv:2305.14325.
Liang, T., et al. (2024). Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate. arXiv:2305.19118.
Sun, J. (2026). Self-Evolving Multi-Agent Swarms. Technical Report, The LocalKin Team.