The LocalKin Team · April 2026

Thin Soul, Fat Skill: A Token-Efficient Architecture for Production Multi-Agent Systems


Abstract

Current multi-agent frameworks embed reasoning logic, domain knowledge, and execution procedures directly into LLM prompts, resulting in memory footprints exceeding 200MB per agent and practical ceilings of 8--10 agents on consumer hardware. We present Thin Soul + Fat Skill, an architecture that separates agent identity (a declarative "soul file" of approximately 30--120 lines of YAML and Markdown) from execution logic (deterministic "skill" scripts of arbitrary size). The soul file is consumed by the LLM as a system prompt; skills execute outside the token window entirely. This separation reduces per-agent memory to 12.5MB and enables 75 specialized agents to run concurrently on a single Mac Mini with 16GB RAM, at a total system footprint of 960MB.

Keywords: multi-agent systems, LLM architecture, token efficiency, agent orchestration, tool use

1. Introduction

The promise of multi-agent LLM systems runs headlong into a resource wall. Frameworks such as AutoGen, CrewAI, and MetaGPT define agents as heavyweight Python objects that bundle prompts, chain-of-thought logic, tool definitions, and conversation history into a single runtime entity. Each agent consumes 200--300MB of process memory before it answers a single query.

This paper introduces the Thin Soul + Fat Skill architecture, which inverts the conventional design: agent identity becomes a small declarative file the LLM reads as a system prompt, while execution logic moves into arbitrarily large scripts that run outside the token window and consume no tokens at all.

2. The Prompt-Obese Agent

Memory Comparison

| Framework | Per-Agent Memory | Max Agents (16GB) | Architecture |
| --- | --- | --- | --- |
| AutoGen | ~250 MB | ~8 | Python class + prompt |
| CrewAI | ~200 MB | ~10 | Python class + prompt |
| MetaGPT | ~300 MB | ~6 | Python class + prompt |
| LocalKin | 12.5 MB | 75 | Soul file + Go runtime |

3. Soul File Design

A soul file (*.soul.md) uses a two-part format: YAML frontmatter for machine-readable configuration, and a Markdown body for the system prompt.

The YAML frontmatter is divided into five concern areas:

Brain. Model provider, model name, temperature, context length, and fallback chain.

Permissions. Capability-based security model with shell, network, and filesystem controls.

Skills. A whitelist of skill names the agent may invoke.

Heart. Autonomous behavior schedule including heartbeat and periodic tasks.

Safety. Constraints including shell blocklists and circuit-breaker thresholds.
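Taken together, the five areas might look like the frontmatter below. This is an illustrative sketch only; the exact field names and value shapes are assumptions, not the canonical LocalKin schema.

```yaml
---
# brain: model selection and fallback chain
brain:
  provider: ollama
  model: llama3.1:8b
  temperature: 0.4
  context_length: 8192
  fallbacks: [qwen2.5:7b]

# permissions: capability-based security controls
permissions:
  shell: false
  network: [api.example.com]
  filesystem:
    read: [./data]
    write: []

# skills: whitelist of skill names this agent may invoke
skills: [quote, summarize]

# heart: autonomous behavior schedule
heart:
  heartbeat: 60s
  schedule:
    - cron: "0 9 * * *"
      task: morning_report

# safety: shell blocklists and circuit-breaker thresholds
safety:
  shell_blocklist: [rm, curl]
  max_failures: 3
---
```

The Markdown body that follows the closing `---` becomes the agent's system prompt verbatim.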

Because soul files are plain text, changes take effect without recompilation. More significantly, agents can programmatically patch soul files as part of an autonomous self-evolution loop.
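Because the file is plain text, a self-evolution step can be as simple as rewriting one frontmatter line and letting the runtime hot-reload the file. A minimal Python sketch of such a patch (the field names and file layout here are illustrative assumptions):

```python
import re

def patch_soul(soul_text: str, key: str, value: str) -> str:
    """Rewrite one top-level `key: value` line in the YAML frontmatter,
    leaving the Markdown body (the system prompt) untouched."""
    parts = soul_text.split("---\n")
    # parts[0] is empty, parts[1] is the frontmatter, parts[2:] is the body
    frontmatter = re.sub(
        rf"^{re.escape(key)}:.*$", f"{key}: {value}",
        parts[1], count=1, flags=re.MULTILINE,
    )
    parts[1] = frontmatter
    return "---\n".join(parts)

soul = """---
temperature: 0.7
---
You are a careful analyst.
"""
patched = patch_soul(soul, "temperature", "0.2")
```

A production loop would validate the patched YAML against the permissions and safety sections before writing it back.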

4. Fat Skill Design

Execution Protocol

  1. Parse tool-call output from the LLM.
  2. Validate parameters against the SKILL.md schema.
  3. Interpolate template variables into command args.
  4. Spawn script as a subprocess with configured timeout.
  5. Capture stdout (JSON) and return to the LLM as a tool result.

Step 4 executes deterministically. No tokens are consumed parsing API documentation. No hallucination is possible.
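The five steps above can be compressed into one function. This is a Python sketch for readability (the production runtime is Go), and the schema shape, skill-entry layout, and JSON envelope are all assumptions:

```python
import json
import subprocess

def run_skill(tool_call: dict, skill: dict) -> dict:
    """Execute one LLM tool call against a registered skill, outside the token window."""
    # 1. Parse tool-call output from the LLM.
    params = tool_call.get("arguments", {})

    # 2. Validate parameters against the SKILL.md schema.
    for p, spec in skill["parameters"].items():
        if spec.get("required") and p not in params:
            raise ValueError(f"missing required parameter: {p}")

    # 3. Interpolate template variables into command args.
    args = [a.format(**params) for a in skill["command"]]

    # 4. Spawn the script as a subprocess with the configured timeout.
    proc = subprocess.run(args, capture_output=True, text=True,
                          timeout=skill.get("timeout", 30))

    # 5. Capture stdout (JSON) and return it to the LLM as a tool result.
    return json.loads(proc.stdout)
```

A skill entry such as `{"parameters": {"symbol": {"required": True}}, "command": ["python3", "quote.py", "{symbol}"]}` would then be invocable with a single tool call.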

Token Economics

| Component | Size | Tokens Consumed |
| --- | --- | --- |
| SKILL.md body (in prompt) | 35 lines | ~120 tokens |
| quote.py (subprocess) | 215 lines | 0 tokens |

Compare this to prompt-embedded approaches consuming 800--1,200 tokens per agent per call.

5. Compound Skills

A compound skill wraps a multi-step script. The LLM makes one tool call; the script executes the entire pipeline deterministically. This reduces LLM round trips by 50--87% for multi-step workflows.
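As an illustration, a compound "research brief" skill might fetch, filter, and summarize inside one subprocess, so the LLM pays for a single round trip instead of three. Everything below (step names, the JSON envelope) is hypothetical, not a LocalKin skill:

```python
import json
import sys

# Each step is a plain function; the pipeline threads one state dict
# through them in order, with no LLM call in between.
def fetch(state):
    state["raw"] = [3, 1, 4, 1, 5, 9, 2, 6]  # stand-in for an API fetch
    return state

def filter_top(state):
    state["top"] = sorted(state["raw"], reverse=True)[:3]
    return state

def summarize(state):
    state["summary"] = f"top3={state['top']}"
    return state

PIPELINE = [fetch, filter_top, summarize]

def run_pipeline(params: dict) -> dict:
    state = dict(params)
    for step in PIPELINE:
        state = step(state)
    # One JSON result goes back to the LLM as the tool output.
    return {"summary": state["summary"], "steps": len(PIPELINE)}

if __name__ == "__main__":
    params = json.loads(sys.argv[1]) if len(sys.argv) > 1 else {}
    print(json.dumps(run_pipeline(params)))
```

From the LLM's perspective this is one tool call and one tool result, regardless of how many steps the script runs internally.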

6. The Conductor Pattern

Agents are organized into domain-specific teams led by conductor agents. The TCM Conductor manages 11 historical physician agents. The Board Conductor implements a corporate advisory board with five C-suite agents. Conductors use the heart.schedule field for autonomous wakeup cycles.
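A conductor's wakeup cycle is declared in its frontmatter rather than in code. The fragment below follows the paper's heart section, but the exact schema and task names are assumptions:

```yaml
heart:
  heartbeat: 30s           # liveness ping over the MQTT broker
  schedule:
    - cron: "0 9 * * 1-5"  # weekday 09:00 wakeup
      task: convene_board  # poll the member agents and merge their advice
```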

7. The Forge: Runtime Skill Generation

The soul_forge skill is a meta-skill: an LLM-powered tool that creates new tools. When the swarm identifies a capability gap, the swarm_architect agent can invoke soul_forge to design and create new soul and skill files.

8. Evaluation

Memory Breakdown (75 agents)

| Component | Memory |
| --- | --- |
| Go runtime binary | 45 MB |
| 75 parsed soul structs | 1.5 MB |
| Skill registry (75 skills) | 2.1 MB |
| MQTT broker (heartbeat) | 12 MB |
| HTTP server + routing | 8 MB |
| Per-agent goroutine overhead | 150 MB |
| Conversation buffers (75 agents) | 741.4 MB |
| Total | 960 MB |
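The breakdown is dominated by per-agent state rather than fixed infrastructure; a quick check of the arithmetic:

```python
# Fixed costs (MB): paid once regardless of agent count.
fixed = {"runtime": 45.0, "souls": 1.5, "registry": 2.1, "mqtt": 12.0, "http": 8.0}
# Pooled per-agent costs (MB): scale with the 75 agents.
per_agent_pools = {"goroutines": 150.0, "conversation_buffers": 741.4}

total = sum(fixed.values()) + sum(per_agent_pools.values())
marginal = sum(per_agent_pools.values()) / 75  # MB added by each extra agent

print(round(total, 1), round(marginal, 1))
```

The marginal cost per additional agent is roughly 11.9 MB, consistent with the ~12.5 MB per-agent figure once a share of the fixed overhead is amortized across the swarm.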

Token Efficiency

9. Conclusion

The Thin Soul + Fat Skill architecture demonstrates that the dominant cost in multi-agent systems is not the agents themselves but the framework overhead surrounding them. The separation of identity from capability is a simple idea. Its compounding effects --- on memory, tokens, security, hot-swapping, machine modification, and team orchestration --- suggest it is also a consequential one.

References

  1. Hong, S., et al. (2023). MetaGPT. arXiv:2308.00352.
  2. Khattab, O., et al. (2023). DSPy. arXiv:2310.03714.
  3. Moura, J. (2024). CrewAI. GitHub repository.
  4. Wu, Q., et al. (2023). AutoGen. arXiv:2308.08155.
  5. LangChain. (2024). LangGraph.
  6. Sun, J. (2026). Self-Evolving Swarms. LocalKin Technical Report.