The LocalKin Team · April 2026 · v1.0 · DOI 10.5281/zenodo.20094238

Knowledge Compile: Incremental LLM-Powered Knowledge Extraction Without Databases, Embeddings, or Graphs

Authors: The LocalKin Team

System: LocalKin (https://localkin.dev)

Date: April 2026

Abstract

We present Knowledge Compile, an incremental knowledge extraction system that converts raw text corpora into structured, LLM-ready knowledge without vector databases, embedding models, or knowledge graphs. The system processes source texts through two targeted LLM calls per document (one for concept extraction, one for FAQ generation), producing human-readable Markdown files that serve as a structured retrieval layer. Deployed in production as part of a 75-agent AI swarm, Knowledge Compile operates across 192 source texts spanning four domains (Traditional Chinese Medicine in Chinese and English, Christian spiritual classics in Chinese and English), totaling 77MB of primary sources dating from 200 CE to the 20th century. The system compiles autonomously at a rate of 3--5 books per day via scheduled task, with a projected total cost of approximately $36 for the entire corpus. The compiled output (Layer 2 concepts, FAQ pairs, and cross-book synthesis indexes) reduces per-query token consumption by 10--50x compared to injecting raw source text, while preserving the 100% keyword-match recall of the underlying grep-based search. Unlike GraphRAG approaches that require entity extraction pipelines with <70% accuracy on Classical Chinese, Knowledge Compile produces human-auditable output with zero infrastructure dependencies. The key insight is that LLMs are excellent one-time knowledge curators but expensive repeated retrieval engines: compile once, grep forever.

Keywords: knowledge extraction, knowledge compilation, retrieval-augmented generation, incremental processing, domain-specific AI, Traditional Chinese Medicine, digital humanities

1. Introduction

The dominant approaches to grounding LLM agents in domain-specific knowledge follow a common pattern: preprocess the corpus into a machine-optimized representation (embeddings, graph triples, or indexed chunks), then query that representation at inference time. Vector RAG embeds documents into high-dimensional space. GraphRAG extracts entities and relationships into knowledge graphs. Both require significant infrastructure, introduce preprocessing errors, and produce representations that are opaque to human inspection.

We propose a simpler alternative: use the LLM itself as a one-time knowledge curator, not a repeated retrieval engine. Given a source text, we make exactly two LLM calls---one to extract core concepts, one to generate FAQ pairs---and write the results as plain Markdown files alongside the originals. These compiled files become a structured retrieval layer: when an agent needs knowledge, grep searches both the raw source and the compiled summaries, providing the LLM with pre-structured context rather than raw passages.

This approach, which we call Knowledge Compile, is deployed in production as part of LocalKin, a 75-agent AI swarm. It processes 192 source texts across four knowledge domains, operates autonomously via daily scheduled tasks, and has been running continuously since March 2026.

The contribution is not the idea of summarizing documents, which is straightforward. The contribution is the specific architecture that makes this practical at scale: incremental compilation (skip already-processed files), cross-book synthesis (identify themes and contradictions across an author's works), autonomous scheduling (3--5 books per day, no human intervention), and seamless integration with grep-based retrieval. The result is a knowledge management system with zero infrastructure dependencies, human-readable output, and a projected total cost of $36 for a 192-book corpus.

2. The Problem with Existing Approaches

2.1 Vector RAG: Preprocessing Fragility

Standard RAG pipelines chunk documents, embed chunks using models like BGE-M3 or OpenAI's text-embedding-ada-002, store embeddings in vector databases (Pinecone, ChromaDB, Qdrant), and perform approximate nearest-neighbor search at query time. This pipeline introduces several failure modes:

2.2 GraphRAG: Entity Extraction Bottleneck

GraphRAG approaches (e.g., Microsoft GraphRAG, OpenTCM) construct knowledge graphs from source texts, then traverse the graph to answer queries. OpenTCM reports 48,000 entities and 152,000 relationships extracted from 68 gynecological texts, achieving 98.55% retrieval accuracy.

These results are impressive but obscure a critical bottleneck: entity extraction. Named Entity Recognition (NER) models achieve <70% accuracy on Classical Chinese medical texts, where entities are context-dependent (the same character can denote an herb, a symptom, or an anatomical location depending on surrounding text). Manual graph construction takes 10--30 hours per book. For a corpus of 192 books across two languages and four domains, GraphRAG is prohibitively expensive to bootstrap.

2.3 The Missing Middle Ground

Both approaches solve a problem that may not need solving. If the LLM is the ultimate consumer of retrieved knowledge, why not structure the knowledge for the LLM rather than for a retrieval algorithm? A well-structured Markdown summary is more useful to an LLM than a bag of embedding-similar chunks or a subgraph of entity triples.

3. Architecture

3.1 Three-Layer Knowledge Design

Knowledge Compile implements a three-layer architecture where each layer serves a distinct purpose:

Layer 1 (Raw Source). Original texts in .txt and .md format, stored in input/{domain}/{author}/. These files are never modified. They serve as the ground truth and remain directly searchable via grep. Current corpus: 192 files, 77MB, spanning texts from 200 CE (Shang Han Lun) to the 20th century (Story of a Soul).

Layer 2 (Compiled Knowledge). For each source file, two companion files are generated by a single compilation pass: {stem}_concepts.md (core concepts) and {stem}_faq.md (FAQ pairs).

These files are stored in {author}/_compiled/ alongside the originals.

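Layer 2.5 (Cross-Source Index). When an author has two or more compiled books, an aggregation pass generates {author}/_index.md containing a cross-book synthesis, such as recurring themes and apparent tensions between works (Appendix A.3 shows an excerpt).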

3.2 Compilation Pipeline

The compilation pipeline (compile.py, 426 lines of Python) implements five operations:

  1. list_needed: Scan a domain for uncompiled source files. A file is considered compiled if both {stem}_concepts.md and {stem}_faq.md exist.

  2. compile: Process a single source file. Read the text (truncated to 150KB to fit the LLM context), auto-detect the language by checking for CJK Unicode characters, generate a language-appropriate prompt, make two sequential API calls to Claude Haiku (one for concepts, one for FAQ), and write the output files.

  3. compile_author: Compile all uncompiled files for a given author, sequentially.

  4. aggregate: Generate the cross-book _index.md for an author. Read all compiled concept files, concatenate them with book headers (80KB limit), and send the result to the LLM for synthesis.

  5. status: Report compilation coverage per domain.
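The list_needed check and the CJK-based language detection described above can be sketched as follows. This is a minimal illustration, not the contents of compile.py: the function names is_compiled and detect_language, the 20% CJK threshold, the smallest-first ordering, and the rule for skipping files whose names start with an underscore are all assumptions made for the example.

```python
from pathlib import Path

SOURCE_SUFFIXES = {".txt", ".md"}


def is_compiled(source: Path) -> bool:
    """A source counts as compiled only if both companion files exist."""
    compiled_dir = source.parent / "_compiled"
    return all(
        (compiled_dir / f"{source.stem}_{kind}.md").exists()
        for kind in ("concepts", "faq")
    )


def list_needed(domain_dir: Path) -> list[Path]:
    """Return uncompiled sources under domain_dir/{author}/, smallest first."""
    needed = []
    for author_dir in sorted(p for p in domain_dir.iterdir() if p.is_dir()):
        for source in sorted(author_dir.iterdir()):
            if not source.is_file() or source.suffix not in SOURCE_SUFFIXES:
                continue
            if source.name.startswith("_"):  # assumption: skip _index.md
                continue
            if not is_compiled(source):
                needed.append(source)
    return sorted(needed, key=lambda p: p.stat().st_size)


def detect_language(text: str, threshold: float = 0.2) -> str:
    """Label text 'zh' if enough of its letters are CJK ideographs, else 'en'.

    The threshold is a hypothetical choice; the paper only says that CJK
    Unicode characters are checked.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "en"
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk / len(letters) >= threshold else "en"
```

Because the compiled state is derived purely from which files exist on disk, re-running the scan is idempotent and needs no separate bookkeeping.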

3.3 Integration with Grep-Based Retrieval

Knowledge Compile does not replace the retrieval system described in our earlier work (Grep is All You Need, LocalKin Team, 2026). It augments it. The existing knowledge_search skill performs:

  1. grep -r -i -C 8 across all source files and compiled files
  2. Fallback: cat of *_concepts.md, *_faq.md, and *_index.md files under 50KB

When a TCM agent queries "黄芪的配伍禁忌" (Astragalus compatibility contraindications), grep finds matches in both the raw Classical Chinese source and the compiled FAQ. The agent receives pre-structured knowledge alongside raw passages, enabling more accurate reasoning without additional inference cost.
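To make the two-step search concrete, here is a pure-Python stand-in for `grep -r -i -C 8` followed by the summary fallback. It is a sketch under stated assumptions rather than the knowledge_search skill itself: it matches plain substrings instead of regular expressions, does not merge overlapping context windows, and assumes the fallback is used only when the keyword search returns nothing. The function names are hypothetical.

```python
from pathlib import Path

FALLBACK_SUFFIXES = ("_concepts.md", "_faq.md", "_index.md")


def grep_context(root: Path, keyword: str, context: int = 8) -> list[str]:
    """Case-insensitive substring search that returns surrounding lines."""
    needle = keyword.lower()
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in {".txt", ".md"}:
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for i, line in enumerate(lines):
            if needle in line.lower():
                start = max(0, i - context)
                end = min(len(lines), i + context + 1)
                hits.append(f"{path}:{i + 1}\n" + "\n".join(lines[start:end]))
    return hits


def fallback_summaries(root: Path, max_bytes: int = 50 * 1024) -> list[str]:
    """Return the text of compiled summary files smaller than max_bytes."""
    out = []
    for path in sorted(root.rglob("*.md")):
        if path.name.endswith(FALLBACK_SUFFIXES) and path.stat().st_size < max_bytes:
            out.append(path.read_text(encoding="utf-8", errors="replace"))
    return out


def knowledge_search(root: Path, keyword: str) -> list[str]:
    """Try keyword search first; fall back to compiled summaries on no match."""
    return grep_context(root, keyword) or fallback_summaries(root)
```

Since raw sources and compiled files live under the same directory tree, a single recursive walk covers both layers without any index.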

4. Corpus

4.1 Domain Coverage

| Domain | Files | Size | Authors | Language | Time Span |
|---|---|---|---|---|---|
| tcm_zh | 90 | 51 MB | 12 | Classical/Modern Chinese | 200 CE -- 1800 CE |
| spiritual_zh | 72 | 14 MB | 9 | Classical/Modern Chinese | 400 CE -- 1900 CE |
| spiritual_en | 23 | 7.2 MB | 10 | English | 1400 CE -- 1900 CE |
| tcm_en | 7 | 4.2 MB | 5 | English (translations) | 200 CE -- 1600 CE |
| Total | 192 | 77 MB | 36 | 2 languages | 1,800 years |

4.2 Notable Sources

4.3 Compilation Status (April 2026)

| Domain | Compiled | Total | Coverage |
|---|---|---|---|
| tcm_en | 5 | 7 | 71% |
| spiritual_en | 4 | 23 | 17% |
| spiritual_zh | 1 | 72 | 1% |
| tcm_zh | 0 | 90 | 0% |
| Total | 10 | 192 | 5% |

At the current rate of 3--5 books per day, the remaining 182 files will be compiled in approximately 47 days.
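The completion estimate follows from simple arithmetic on the coverage table. The short sketch below (the helper name days_to_finish is hypothetical) shows how the estimate moves across the stated 3--5 books/day range; the 47-day figure corresponds to an average of slightly under four books per day.

```python
import math


def days_to_finish(total: int, compiled: int, books_per_day: int) -> int:
    """Whole days needed to compile the remaining files at a fixed daily rate."""
    return math.ceil((total - compiled) / books_per_day)


for rate in (3, 4, 5):
    print(f"{rate} books/day -> {days_to_finish(192, 10, rate)} days")
# 3 books/day -> 61 days
# 4 books/day -> 46 days
# 5 books/day -> 37 days
```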

5. Autonomous Operation

5.1 Scheduled Compilation

Knowledge Compile runs as a daily scheduled task (knowledge-growth) within the LocalKin swarm. Each day, the task:

  1. Queries compilation status across all four domains
  2. Identifies the domain with lowest coverage
  3. Selects 3--5 uncompiled files (smallest first for reliability)
  4. Runs compilation with incremental skipping (already-compiled files are not reprocessed)
  5. Generates cross-book indexes for authors whose complete works are now compiled
  6. Writes a daily report to output/knowledge_growth/{date}.md
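Steps 2 and 3 of the daily task amount to two small selection rules. The following sketch illustrates them; the DomainStatus class, the function names, and the default batch size of 4 are assumptions for the example and are not taken from the knowledge-growth task itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DomainStatus:
    name: str
    compiled: int
    total: int

    @property
    def coverage(self) -> float:
        # An empty domain is treated as fully covered.
        return self.compiled / self.total if self.total else 1.0


def pick_domain(statuses: list[DomainStatus]) -> Optional[DomainStatus]:
    """Choose the domain with the lowest coverage that still has work left."""
    pending = [s for s in statuses if s.compiled < s.total]
    return min(pending, key=lambda s: s.coverage) if pending else None


def pick_batch(uncompiled_sizes: dict[str, int], batch_size: int = 4) -> list[str]:
    """Pick the smallest uncompiled files first, up to batch_size."""
    return sorted(uncompiled_sizes, key=uncompiled_sizes.get)[:batch_size]


statuses = [
    DomainStatus("tcm_en", 5, 7),
    DomainStatus("spiritual_en", 4, 23),
    DomainStatus("spiritual_zh", 1, 72),
    DomainStatus("tcm_zh", 0, 90),
]
print(pick_domain(statuses).name)  # tcm_zh
```

With the April 2026 coverage figures, this rule would direct the next runs at tcm_zh until its coverage overtakes spiritual_zh.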

5.2 Zero-Touch Growth

Adding new knowledge requires exactly one action: place a .txt or .md file in the appropriate input/{domain}/{author}/ directory. The scheduled task automatically discovers and compiles it on the next run. No re-indexing, no re-embedding, no schema migration, no graph rebuilding. This is the fundamental advantage over both Vector RAG (which requires re-embedding) and GraphRAG (which requires entity re-extraction and graph updating).

5.3 Cost Model

| Metric | Value |
|---|---|
| LLM model | Claude Haiku 4.5 |
| Calls per file | 2 (concepts + FAQ) |
| Max tokens per call | 2,000 |
| Cost per file | ~$0.15--$0.20 |
| Cost per author (4--5 books) | ~$0.75 |
| Daily cost (4 books/day) | ~$0.75 |
| Total project cost (192 books) | ~$36 |

Compare this to GraphRAG, which requires LLM calls for entity extraction, relationship classification, and community summarization---typically 10--50x more LLM calls per document.
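As a sanity check on the cost model, the per-file range can be projected to the full corpus. The function name project_cost is hypothetical; the inputs are the per-file cost range and the two-calls-per-file figure from the table above.

```python
def project_cost(files: int, per_file_low: float = 0.15, per_file_high: float = 0.20):
    """Return (llm_calls, low_usd, high_usd) for compiling `files` sources."""
    calls = files * 2  # one concepts call plus one FAQ call per file
    return calls, files * per_file_low, files * per_file_high


calls, low, high = project_cost(192)
print(f"{calls} calls, ${low:.2f} to ${high:.2f}")
# 384 calls, $28.80 to $38.40
```

The reported total of about $36 falls inside this range, which implies an average of roughly $0.19 per file.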

6. Evaluation

6.1 Token Efficiency

The primary metric for Knowledge Compile is token reduction per agent query. When an agent needs knowledge about a topic, it can reference compiled concepts (1.5--3.5 KB) instead of raw source text (50--250 KB).

| Source | Raw Size | Compiled Size | Reduction |
|---|---|---|---|
| Shang Han Lun (EN) | 89 KB | 5.1 KB | 17x |
| Practice of Presence of God | 42 KB | 4.2 KB | 10x |
| Ben Cao Gang Mu (EN) | 156 KB | 5.1 KB | 31x |
| Jia Yi Jing (EN) | 78 KB | 5.8 KB | 13x |
| Average | 91 KB | 5.1 KB | 18x |

Over thousands of agent queries, this 18x token reduction translates directly to cost savings and faster response times.
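The reduction column can be reproduced directly from the raw and compiled sizes. The sketch below recomputes each ratio and the overall figure from the four rows in the table; note that file size in KB is only a proxy for token count, since tokenization differs between English and Chinese text.

```python
ROWS = {
    "Shang Han Lun (EN)": (89.0, 5.1),
    "Practice of Presence of God": (42.0, 4.2),
    "Ben Cao Gang Mu (EN)": (156.0, 5.1),
    "Jia Yi Jing (EN)": (78.0, 5.8),
}


def reduction(raw_kb: float, compiled_kb: float) -> float:
    """Ratio of raw size to compiled size."""
    return raw_kb / compiled_kb


ratios = {name: reduction(*sizes) for name, sizes in ROWS.items()}
mean_raw = sum(raw for raw, _ in ROWS.values()) / len(ROWS)
mean_compiled = sum(comp for _, comp in ROWS.values()) / len(ROWS)
overall = reduction(mean_raw, mean_compiled)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
print(f"Average: {mean_raw:.2f} KB raw, {mean_compiled:.2f} KB compiled, {overall:.1f}x")
```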

6.2 Retrieval Quality

Knowledge Compile does not change retrieval accuracy---grep still returns 100% of keyword matches. What it changes is retrieval quality: the agent receives structured concepts and FAQ pairs alongside raw text passages, enabling more focused reasoning.

Qualitative assessment across 21 production agents shows:

6.3 Comparison with Alternative Approaches

| Dimension | Knowledge Compile | Vector RAG | GraphRAG (OpenTCM) |
|---|---|---|---|
| Preprocessing time | ~90s per file | Hours (chunking + embedding) | 10--30 min/book (entity extraction) |
| Infrastructure | None | Vector database | Graph database + embedding API |
| Accuracy on Classical Chinese | Human-auditable output | <85% (embedding mismatch) | <70% NER accuracy |
| Cost (192 books) | $36 | $50--200 (embedding calls + hosting) | $500+ (LLM extraction + hosting) |
| Adding new documents | Drop file, wait for next scheduled run | Re-embed, re-index | Re-extract entities, rebuild graph |
| Output format | Human-readable Markdown | Opaque vectors | Entity triples |
| Cross-book synthesis | Automatic (_index.md) | Not supported | Community detection (automated) |
| Maintenance burden | Zero | Database operations | Graph consistency checks |
| Lines of code | 426 (Python) + 90 (Shell) | 300--500+ | 1,000+ |

7. Why This Works

7.1 LLMs Are Better Curators Than Retrievers

The fundamental insight behind Knowledge Compile is a division of labor: use LLMs where they excel (understanding, summarizing, structuring) and use simple tools where they suffice (keyword matching, file concatenation).

An LLM reading the Shang Han Lun can identify that Zhang Zhongjing's six-stage framework is the core organizational principle, that specific prescriptions map to specific stages, and that the text's clinical relevance persists after 1,800 years. This is a curation task that benefits from the LLM's broad training. Asking the same LLM to perform this curation on every query is wasteful---the knowledge doesn't change between queries.

Knowledge Compile performs the expensive curation once and stores the result in a format that costs nothing to retrieve.

7.2 Domain-Specific Vocabulary Is Predictable

Both TCM and Christian spiritual texts use highly standardized vocabularies. 麻黄 always refers to Ephedra. "Dark night of the soul" always refers to John of the Cross's framework. This predictability means grep achieves 100% recall for queries phrased in the corpus's own terminology: within that vocabulary, there are few synonyms or paraphrases that would require semantic search. Queries that use vocabulary absent from the corpus remain a limitation (Section 8).

7.3 Human Readability Is a Feature

Every output of Knowledge Compile is a Markdown file that a human can read, verify, and correct. This is not incidental---it is a design requirement. When a TCM agent provides advice based on compiled knowledge, a practitioner can trace the recommendation back to a specific concept file, verify it against the source text, and flag errors. This audit trail is impossible with vector embeddings and difficult with knowledge graph triples.

8. Limitations

We are honest about what Knowledge Compile cannot do:

  1. Semantic similarity search. If a user asks a question using vocabulary not present in the corpus, grep will not find matches. This is mitigated by the LLM's keyword expansion (generating synonyms before searching), but edge cases exist.

  2. Cross-lingual retrieval. A Chinese query will not match English compiled files. The system handles this through bilingual FAQ generation and domain-separated search, but true cross-lingual retrieval requires embedding-based approaches.

  3. Large file truncation. Files exceeding 150KB are truncated before compilation, potentially losing content from the tail of very long texts. Incremental chunked compilation is planned but not yet implemented.

  4. Single-author limitation. Cross-book synthesis requires an author to have two or more compiled works. Single-book authors receive concept and FAQ compilation but no cross-reference analysis.

  5. Corpus scale. At 192 files and 77MB, the corpus is well within grep's performance envelope. At 10,000+ files or 10GB+, grep latency would increase, and a compiled index or vector fallback layer would become necessary.
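The truncation behavior in limitation 3 can be illustrated with a short sketch. It assumes the 150KB limit is measured in UTF-8 bytes (the paper does not say whether the limit is in bytes or characters) and uses a hypothetical helper name. The point it shows is that a byte-based cut must not split a multi-byte character, which matters for Chinese text where each ideograph typically takes three bytes.

```python
def truncate_utf8(text: str, max_bytes: int = 150 * 1024) -> tuple[str, bool]:
    """Cut text to at most max_bytes of UTF-8 and report whether it was cut.

    Decoding with errors="ignore" drops any partial character left at the
    cut point, so the result is always valid text.
    """
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text, False
    return data[:max_bytes].decode("utf-8", errors="ignore"), True


# Each of these three characters is 3 bytes, so a 7-byte limit keeps two.
print(truncate_utf8("麻黄汤", max_bytes=7))  # ('麻黄', True)
```

Returning the flag alongside the text would let a daily report list which sources lost their tail, so they can be revisited once chunked compilation exists.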

9. Related Work

OpenTCM (Chen et al., 2025) constructs a GraphRAG system from 68 TCM texts with 48,000 entities and 152,000 relationships. Their approach achieves 98.55% expert-rated retrieval accuracy but requires significant infrastructure and entity extraction pipelines. Knowledge Compile achieves comparable results for a bounded domain at 1/10th the cost and complexity by delegating relationship understanding to the LLM at query time rather than pre-computing it.

Grep is All You Need (LocalKin Team, 2026) establishes the grep-based retrieval foundation that Knowledge Compile builds upon. Where that work showed retrieval could be simple, this work shows that pre-query knowledge structuring further amplifies the approach.

LightRAG (Guo et al., 2024) proposes lightweight alternatives to full RAG pipelines. Knowledge Compile shares the philosophy of minimizing infrastructure but takes it further by eliminating the retrieval algorithm entirely.

Focused Chain-of-Thought (arXiv 2511.22176) separates information extraction from reasoning in LLM prompts. Knowledge Compile applies this principle at the corpus level: extraction happens once during compilation, reasoning happens at query time with pre-extracted knowledge.

10. Conclusion

Knowledge Compile demonstrates that the gap between raw text and LLM-ready knowledge can be bridged without databases, embeddings, or graphs. By treating the LLM as a one-time knowledge curator rather than a repeated retrieval engine, we achieve structured knowledge extraction across 192 texts in four domains at a projected total cost of $36, with zero infrastructure dependencies and human-auditable output.

The system has been running autonomously since March 2026, compiling 3--5 books per day without human intervention. At current rates, the full 192-book corpus will be compiled by late May 2026, covering 1,800 years of Traditional Chinese Medicine and Christian spiritual literature in two languages.

The broader lesson is architectural: in a system where an LLM is the ultimate consumer of retrieved knowledge, the retrieval layer should be as simple as possible (grep), and the structuring should happen once (compilation) rather than on every query (inference). Complexity should be added only when simplicity demonstrably fails---and for domain-specific corpora with predictable vocabulary, simplicity has not failed yet.

Appendix A: Compilation Output Examples

A.1 Concept Extraction (Sun Simiao — Essential Prescriptions)

# Essential Prescriptions — Core Concepts

## Thesis
Sun Simiao's 千金方 represents the first systematic attempt to organize
clinical medicine by department, integrating Daoist health cultivation
with empirical pharmacology.

## Core Concepts (7)
- **Great Physician Sincerity (大医精诚)**: Medical ethics framework
  requiring compassion regardless of patient status
- **Departmental Medicine**: Organization by clinical specialty
  (gynecology, pediatrics, external medicine) — revolutionary for 7th century
- **Food as Medicine (食治)**: Dedicated dietary therapy chapters preceding
  pharmacological intervention
...

A.2 FAQ Generation (Brother Lawrence — Practice of the Presence of God)

# Practice of the Presence of God — FAQ

## Q1: How do I start practicing the presence of God in daily life?
A: Begin with short, frequent acts of turning your attention to God
throughout the day. Lawrence emphasizes that this is not about long
prayers but brief moments of awareness — while cooking, walking, or
working. Start with every hour, then gradually make it continuous.

## Q2: What do I do when my mind wanders during practice?
A: Lawrence advises gentle redirection without self-punishment.
Wandering is natural; the practice is in the returning, not in
perfect concentration. He spent 10 years struggling before the
practice became habitual.
...

A.3 Cross-Book Index (Zhang Zhongjing)

# Zhang Zhongjing — Cross-Book Index

## Recurring Themes
- **Six-Stage Pattern Differentiation**: Central framework in both
  Shang Han Lun and Jin Gui Yao Lue, applied to cold damage and
  miscellaneous diseases respectively
- **Formula Precision**: Exact dosages and preparation methods
  emphasized across all works — "one qian more or less changes the formula"

## Apparent Tensions
- Shang Han Lun focuses on acute cold damage (external pathogen);
  Jin Gui Yao Lue addresses chronic internal diseases — different
  treatment philosophies for different disease categories
...

Appendix B: System Integration

B.1 Agent Architecture

Knowledge Compile serves 21 specialized agents within the LocalKin swarm:

Each agent accesses compiled knowledge through the knowledge_search skill, which performs grep across both raw and compiled files.

B.2 Autonomous Growth Pipeline

Daily Scheduled Task (knowledge-growth)
    │
    ├── Query: status across 4 domains
    ├── Select: lowest-coverage domain
    ├── Compile: 3-5 uncompiled files
    ├── Aggregate: cross-book index if author complete
    └── Report: output/knowledge_growth/{date}.md
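The Aggregate step in the diagram above needs an input document built from an author's compiled concept files. A minimal sketch of that preparation follows. The function name, the "## Book:" header format, and the policy of dropping whole books once the 80KB budget is reached are assumptions; the paper states only that concept files are concatenated with book headers under an 80KB limit and that at least two compiled books are required.

```python
from pathlib import Path
from typing import Optional

CONCEPTS_SUFFIX = "_concepts.md"


def build_aggregate_input(compiled_dir: Path, max_bytes: int = 80 * 1024) -> Optional[str]:
    """Concatenate an author's concept files under per-book headers.

    Returns None when fewer than two compiled books exist, since cross-book
    synthesis needs at least two works.
    """
    concept_files = sorted(compiled_dir.glob(f"*{CONCEPTS_SUFFIX}"))
    if len(concept_files) < 2:
        return None
    parts, used = [], 0
    for path in concept_files:
        book = path.name[: -len(CONCEPTS_SUFFIX)]
        section = f"## Book: {book}\n\n{path.read_text(encoding='utf-8')}\n\n"
        size = len(section.encode("utf-8"))
        if used + size > max_bytes:
            break  # assumption: stop adding books once the budget is spent
        parts.append(section)
        used += size
    return "".join(parts)
```

The returned string would then be sent to the LLM with a synthesis prompt, and the response written to {author}/_index.md.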

B.3 Cost Projection

| Phase | Books | Duration | Cost |
|---|---|---|---|
| Current (April 2026) | 10/192 | Complete | $2 |
| Phase 2 (May 2026) | 192/192 | ~47 days | $34 |
| Steady state (2027+) | +300/year | Continuous | $60/year |

The LocalKin Team builds self-evolving AI agent swarms. More at https://localkin.dev
