Thin Soul, Fat Skill: A Token-Efficient Architecture for Production Multi-Agent Systems
The LocalKin Team
Abstract
Current multi-agent frameworks embed reasoning logic, domain knowledge, and execution procedures directly into LLM prompts, resulting in memory footprints exceeding 200MB per agent and practical ceilings of 8--10 agents on consumer hardware. We present Thin Soul + Fat Skill, an architecture that separates agent identity (a declarative "soul file" of approximately 30--120 lines of YAML and Markdown) from execution logic (deterministic "skill" scripts of arbitrary size). The soul file is consumed by the LLM as a system prompt; skills execute entirely outside the token window. This separation reduces per-agent memory to 12.5MB and allows 75 specialized agents to run concurrently on a single Mac Mini with 16GB RAM, at 960MB of total system memory usage.
Keywords: multi-agent systems, LLM architecture, token efficiency, agent orchestration, tool use
1. Introduction
The promise of multi-agent LLM systems runs headlong into a resource wall. Frameworks such as AutoGen, CrewAI, and MetaGPT define agents as heavyweight Python objects that bundle prompts, chain-of-thought logic, tool definitions, and conversation history into a single runtime entity. Each agent consumes 200--300MB of process memory before it answers a single query.
This paper introduces the Thin Soul + Fat Skill architecture, which inverts the conventional design:
- Soul (thin): A declarative file (~30--120 lines) that defines who the agent is.
- Skill (fat): A standalone script paired with a SKILL.md manifest that defines what the agent can do. Skills execute deterministically in a subprocess; their code never enters the token window.
2. The Prompt-Obese Agent
Memory Comparison
| Framework | Per-Agent Memory | Max Agents (16GB) | Architecture |
|---|---|---|---|
| AutoGen | ~250 MB | ~8 | Python class + prompt |
| CrewAI | ~200 MB | ~10 | Python class + prompt |
| MetaGPT | ~300 MB | ~6 | Python class + prompt |
| LocalKin | 12.5 MB | 75 | Soul file + Go runtime |
3. Soul File Design
A soul file (*.soul.md) uses a two-part format: YAML frontmatter for machine-readable configuration, and a Markdown body for the system prompt.
The YAML frontmatter is divided into five concern areas:
Brain. Model provider, model name, temperature, context length, and fallback chain.
Permissions. Capability-based security model with shell, network, and filesystem controls.
Skills. A whitelist of skill names the agent may invoke.
Heart. Autonomous behavior schedule including heartbeat and periodic tasks.
Safety. Constraints including shell blocklists and circuit-breaker thresholds.
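As a concrete illustration, the frontmatter of a minimal soul file might look like the sketch below. Every key name here is an assumption inferred from the five concern areas above, not LocalKin's actual schema; the frontmatter would be followed by a Markdown body that becomes the system prompt.

```yaml
# hypothetical advisor.soul.md frontmatter (illustrative schema only)
brain:
  provider: ollama
  model: llama3.1:8b
  temperature: 0.3
  context_length: 8192
  fallback: [llama3.1:8b-q4]
permissions:
  shell: false
  network: true
  filesystem: read-only
skills:
  - quote
heart:
  schedule:
    heartbeat_s: 300
safety:
  shell_blocklist: [rm, curl]
  circuit_breaker:
    max_failures: 3
```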
Because soul files are plain text, changes take effect without recompilation. More significantly, agents can programmatically patch soul files as part of an autonomous self-evolution loop.
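Because the file is plain text, self-modification reduces to string manipulation. The sketch below shows one way an agent could patch its own frontmatter; it handles only top-level keys for brevity, and a production patcher would use a real YAML parser rather than regular expressions.

```python
import re

def patch_soul(doc: str, key: str, value: str) -> str:
    """Set a top-level frontmatter key in a *.soul.md document.

    Text-level sketch: the frontmatter is the block between the first
    two '---' delimiter lines; nested keys are not supported.
    """
    prefix, front, body = doc.split("---\n", 2)
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    if pattern.search(front):
        front = pattern.sub(f"{key}: {value}", front)   # update existing key
    else:
        front += f"{key}: {value}\n"                    # or append a new one
    return "---\n".join([prefix, front, body])
```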
4. Fat Skill Design
Execution Protocol
1. Parse tool-call output from the LLM.
2. Validate parameters against the SKILL.md schema.
3. Interpolate template variables into command args.
4. Spawn the script as a subprocess with the configured timeout.
5. Capture stdout (JSON) and return it to the LLM as a tool result.

Step 4 executes deterministically: no tokens are consumed parsing API documentation, and hallucination cannot affect the result, because the script runs entirely outside the model.
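The five steps above can be sketched as follows. The real runtime is written in Go; this Python version is for exposition only, and the manifest fields (required, args, timeout_s) are illustrative stand-ins for whatever SKILL.md actually declares.

```python
import json
import subprocess
import sys  # used in the usage example below
from string import Template

def execute_skill(tool_call: dict, manifest: dict) -> dict:
    params = tool_call["params"]                        # 1. parse tool-call output
    missing = [p for p in manifest["required"] if p not in params]
    if missing:                                         # 2. validate against schema
        return {"error": f"missing parameters: {missing}"}
    args = [Template(a).safe_substitute(params)         # 3. interpolate template vars
            for a in manifest["args"]]
    try:                                                # 4. spawn subprocess w/ timeout
        proc = subprocess.run(args, capture_output=True, text=True,
                              timeout=manifest.get("timeout_s", 30))
    except subprocess.TimeoutExpired:
        return {"error": "skill timed out"}
    return json.loads(proc.stdout)                      # 5. stdout JSON -> tool result
```

A skill here is just an argv template plus a schema; the subprocess boundary is what keeps the script's code out of the token window.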
Token Economics
| Component | Size | Tokens Consumed |
|---|---|---|
| SKILL.md body (in prompt) | 35 lines | ~120 tokens |
| quote.py (subprocess) | 215 lines | 0 tokens |
Compare this to prompt-embedded approaches consuming 800--1,200 tokens per agent per call.
5. Compound Skills
A compound skill wraps a multi-step script. The LLM makes one tool call; the script executes the entire pipeline deterministically. This reduces LLM round trips by 50--87% for multi-step workflows.
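The reported range follows from the round-trip arithmetic: an n-step workflow needs n tool calls when each step is its own skill but only one when the pipeline is compounded, a reduction of (n-1)/n, i.e. 50% at n=2 and 87.5% at n=8. The script below is a hypothetical compound skill in this style; the pipeline itself is invented for illustration, since the paper does not specify a concrete example.

```python
import json
import sys

# Four deterministic steps behind a single tool call.
def parse(raw: str) -> list[float]:
    return [float(x) for x in raw.split(",")]

def keep_positive(xs: list[float]) -> list[float]:
    return [x for x in xs if x > 0]

def aggregate(xs: list[float]) -> dict:
    return {"count": len(xs), "mean": sum(xs) / len(xs)}

def run_pipeline(raw: str) -> dict:
    # One LLM tool call triggers the whole chain; no intermediate round trips.
    return {"status": "ok", **aggregate(keep_positive(parse(raw)))}

if __name__ == "__main__":
    # The runtime passes LLM-supplied params via argv and reads JSON on stdout.
    print(json.dumps(run_pipeline(sys.argv[1])))
```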
6. The Conductor Pattern
Agents are organized into domain-specific teams led by conductor agents. The TCM Conductor manages 11 historical physician agents. The Board Conductor implements a corporate advisory board with five C-suite agents. Conductors use the heart.schedule field for autonomous wakeup cycles.
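The wakeup logic a conductor derives from heart.schedule can be sketched as a simple interval check; the interval_s field below is an assumed schema, not LocalKin's actual one.

```python
from dataclasses import dataclass

@dataclass
class Heartbeat:
    """Tracks when an agent's next autonomous wakeup is due."""
    interval_s: float
    last_wake: float = 0.0

    def due(self, now: float) -> bool:
        # Wake when at least one full interval has elapsed.
        return now - self.last_wake >= self.interval_s

    def wake(self, now: float) -> None:
        self.last_wake = now
```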
7. The Forge: Runtime Skill Generation
The soul_forge skill is a meta-skill: an LLM-powered tool that creates new tools. When the swarm identifies a capability gap, the swarm_architect agent can invoke soul_forge to design and create new soul and skill files.
8. Evaluation
Memory Breakdown (75 agents)
| Component | Memory |
|---|---|
| Go runtime binary | 45 MB |
| 75 parsed soul structs | 1.5 MB |
| Skill registry (75 skills) | 2.1 MB |
| MQTT broker (heartbeat) | 12 MB |
| HTTP server + routing | 8 MB |
| Per-agent goroutine overhead | 150 MB |
| Conversation buffers (75 agents) | 741.4 MB |
| Total | 960 MB |
Token Efficiency
- Prompt-embedded (AutoGen-style): 1,847 tokens per tool interaction
- Thin Soul + Fat Skill: 312 tokens per tool interaction
- Reduction: 83%
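The headline figure is direct arithmetic over the two measurements above:

```python
embedded, thin = 1847, 312          # tokens per tool interaction
reduction = 1 - thin / embedded
print(f"{reduction:.0%}")           # → 83%
```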
9. Conclusion
The Thin Soul + Fat Skill architecture demonstrates that the dominant cost in multi-agent systems is not the agents themselves but the framework overhead surrounding them. The separation of identity from capability is a simple idea. Its compounding effects on memory, tokens, security, hot-swapping, machine modification, and team orchestration suggest it is also a consequential one.
References
- Hong, S., et al. (2023). MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. arXiv:2308.00352.
- Khattab, O., et al. (2023). DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. arXiv:2310.03714.
- Moura, J. (2024). CrewAI. GitHub repository.
- Wu, Q., et al. (2023). AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155.
- LangChain. (2024). LangGraph. GitHub repository.
- Sun, J. (2026). Self-Evolving Swarms. LocalKin Technical Report.