The LocalKin Team · April 2026 · v1.0

Knowledge Compile: Incremental LLM-Powered Knowledge Extraction Without Databases, Embeddings, or Graphs

Authors: The LocalKin Team

System: LocalKin (https://localkin.dev)

Date: April 2026

Abstract

We present Knowledge Compile, an incremental knowledge extraction system that converts raw text corpora into structured, LLM-ready knowledge without vector databases, embedding models, or knowledge graphs. The system processes source texts through two targeted LLM calls per document---one for concept extraction, one for FAQ generation---producing human-readable Markdown files that serve as a structured retrieval layer. Deployed in production as part of a 75-agent AI swarm, Knowledge Compile operates across 192 source texts spanning four domains (Traditional Chinese Medicine in Chinese and English, Christian spiritual classics in Chinese and English), totaling 77MB of primary sources dating from 200 CE to the 20th century. The system compiles autonomously at a rate of 3--5 books per day via scheduled task, with a total project cost of approximately $36 for the entire corpus. The compiled output---Layer 2 concepts, FAQ pairs, and cross-book synthesis indexes---reduces per-query token consumption by 10--50x compared to injecting raw source text, while maintaining the 100% retrieval accuracy of the underlying grep-based search. Unlike GraphRAG approaches that require entity extraction pipelines with <70% accuracy on Classical Chinese, Knowledge Compile produces human-auditable output with zero infrastructure dependencies. The key insight: LLMs are excellent one-time knowledge curators but expensive repeated retrieval engines---compile once, grep forever.

Keywords: knowledge extraction, knowledge compilation, retrieval-augmented generation, incremental processing, domain-specific AI, Traditional Chinese Medicine, digital humanities

1. Introduction

The dominant approaches to grounding LLM agents in domain-specific knowledge follow a common pattern: preprocess the corpus into a machine-optimized representation (embeddings, graph triples, or indexed chunks), then query that representation at inference time. Vector RAG embeds documents into high-dimensional space. GraphRAG extracts entities and relationships into knowledge graphs. Both require significant infrastructure, introduce preprocessing errors, and produce representations that are opaque to human inspection.

We propose a simpler alternative: use the LLM itself as a one-time knowledge curator, not a repeated retrieval engine. Given a source text, we make exactly two LLM calls---one to extract core concepts, one to generate FAQ pairs---and write the results as plain Markdown files alongside the originals. These compiled files become a structured retrieval layer: when an agent needs knowledge, grep searches both the raw source and the compiled summaries, providing the LLM with pre-structured context rather than raw passages.

This approach, which we call Knowledge Compile, is deployed in production as part of LocalKin, a 75-agent AI swarm. It processes 192 source texts across four knowledge domains, operates autonomously via daily scheduled tasks, and has been running continuously since March 2026.

The contribution is not the idea of summarizing documents---that is straightforward. The contribution is the specific architecture that makes this practical at scale: incremental compilation (skip already-processed files), cross-book synthesis (identify themes and contradictions across an author's works), autonomous scheduling (3--5 books per day, no human intervention), and seamless integration with grep-based retrieval. The result is a knowledge management system with zero infrastructure dependencies, human-readable output, and a total cost of $36 for a 192-book corpus.

2. The Problem with Existing Approaches

2.1 Vector RAG: Preprocessing Fragility

Standard RAG pipelines chunk documents, embed chunks using models like BGE-M3 or OpenAI's text-embedding-ada-002, store embeddings in vector databases (Pinecone, ChromaDB, Qdrant), and perform approximate nearest-neighbor search at query time. This pipeline introduces several failure modes:

●Chunk boundary artifacts. Semantic meaning that spans chunk boundaries is lost. A TCM prescription split across two chunks may never be retrieved as a complete unit.
●Embedding model mismatch. Models trained on English web text perform poorly on Classical Chinese medical terminology. The herb 黄芪 (Astragalus) may embed near 人參 (Ginseng) due to shared semantic fields, despite being clinically distinct.
●Index staleness. Adding new documents requires re-embedding and re-indexing. For a growing corpus, this creates a maintenance burden.
●Infrastructure cost. Vector databases require hosting, monitoring, and backup. For a solo developer running agents on a Mac Mini, this is unacceptable overhead.

2.2 GraphRAG: Entity Extraction Bottleneck

GraphRAG approaches (e.g., Microsoft GraphRAG, OpenTCM) construct knowledge graphs from source texts, then traverse the graph to answer queries. OpenTCM reports 48,000 entities and 152,000 relationships extracted from 68 gynecological texts, achieving 98.55% retrieval accuracy.

These results are impressive but obscure a critical bottleneck: entity extraction. Named Entity Recognition (NER) models achieve <70% accuracy on Classical Chinese medical texts, where entities are context-dependent (the same character can denote an herb, a symptom, or an anatomical location depending on surrounding text). Manual graph construction takes 10--30 hours per book. For a corpus of 192 books across two languages and four domains, GraphRAG is prohibitively expensive to bootstrap.

2.3 The Missing Middle Ground

Both approaches solve a problem that may not need solving. If the LLM is the ultimate consumer of retrieved knowledge, why not structure the knowledge for the LLM rather than for a retrieval algorithm? A well-structured Markdown summary is more useful to an LLM than a bag of embedding-similar chunks or a subgraph of entity triples.

3. Architecture

3.1 Three-Layer Knowledge Design

Knowledge Compile implements a three-layer architecture where each layer serves a distinct purpose:

Layer 1 (Raw Source). Original texts in .txt and .md format, stored in input/{domain}/{author}/. These files are never modified. They serve as the ground truth and remain directly searchable via grep. Current corpus: 192 files, 77MB, spanning texts from 200 CE (Shang Han Lun) to the 20th century (Story of a Soul).

Layer 2 (Compiled Knowledge). For each source file, two companion files are generated in a single compilation pass (two LLM calls, one per file):

●{title}_concepts.md: 5--10 core concepts with definitions, key quotations with chapter references, and practical application points. Target: <3,000 characters.
●{title}_faq.md: 5--8 question-answer pairs addressing the problems a reader would actually have. Target: <3,000 characters.

These files are stored in {author}/_compiled/ alongside the originals.
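The layout above implies a simple file-system check for incremental compilation. The following is a minimal sketch, not the production compile.py. It assumes that {title} is the source filename without its extension, that a file counts as compiled only when both companion files exist, and that names beginning with an underscore (such as _index.md) are generated artifacts rather than sources. The helper names compiled_paths and is_compiled are illustrative.

```python
from pathlib import Path

SOURCE_SUFFIXES = {".txt", ".md"}  # Layer 1 formats named in the text


def compiled_paths(source: Path) -> tuple[Path, Path]:
    """Return the (concepts, faq) companion paths for one source file."""
    compiled_dir = source.parent / "_compiled"
    return (
        compiled_dir / f"{source.stem}_concepts.md",
        compiled_dir / f"{source.stem}_faq.md",
    )


def is_compiled(source: Path) -> bool:
    """A source counts as compiled only when both companion files exist."""
    return all(path.exists() for path in compiled_paths(source))


def list_needed(domain_dir: Path) -> list[Path]:
    """List sources under {domain}/{author}/ that still need compiling."""
    needed = []
    for author_dir in sorted(p for p in domain_dir.iterdir() if p.is_dir()):
        for source in sorted(author_dir.iterdir()):
            if not source.is_file() or source.suffix not in SOURCE_SUFFIXES:
                continue
            # Assumption: underscore-prefixed files (e.g. _index.md) are outputs.
            if source.name.startswith("_"):
                continue
            if not is_compiled(source):
                needed.append(source)
    return needed
```

Because the check only looks at which files exist, rerunning it after an interrupted run picks up exactly where the previous run stopped.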

Layer 2.5 (Cross-Source Index). When an author has two or more compiled books, an aggregation pass generates {author}/_index.md containing:

●Recurring themes across books with per-book perspective differences
●Concepts unique to each book
●Apparent contradictions or tensions between works
●Recommended reading path (introductory → advanced → essential)
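The input to the cross-book pass could be assembled roughly as follows. This is a sketch under stated assumptions: the header format, the function name build_aggregate_input, and the byte-based truncation are illustrative. Only the two-book minimum and the 80KB cap (Section 3.2) come from the system description.

```python
from pathlib import Path
from typing import Optional

MAX_AGGREGATE_BYTES = 80 * 1024  # the 80KB limit given for the aggregate step
CONCEPTS_SUFFIX = "_concepts"


def build_aggregate_input(author_dir: Path) -> Optional[str]:
    """Concatenate an author's concept files under per-book headers.

    Returns None when fewer than two books are compiled, since the
    cross-book index needs at least two.
    """
    concept_files = sorted((author_dir / "_compiled").glob("*_concepts.md"))
    if len(concept_files) < 2:
        return None
    parts = []
    for path in concept_files:
        title = path.stem[: -len(CONCEPTS_SUFFIX)]
        # Hypothetical header format; the real one is not given in the paper.
        parts.append(f"=== BOOK: {title} ===\n{path.read_text(encoding='utf-8')}\n")
    combined = "\n".join(parts).encode("utf-8")[:MAX_AGGREGATE_BYTES]
    # errors="ignore" drops a multi-byte character cut in half by the cap.
    return combined.decode("utf-8", errors="ignore")
```

The returned string would then be sent to the LLM with a synthesis prompt, and the response written to {author}/_index.md.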

3.2 Compilation Pipeline

The compilation pipeline (compile.py, 426 lines of Python) implements five operations:

●list_needed: Scan a domain for uncompiled source files. A file is considered compiled if both {stem}_concepts.md and {stem}_faq.md exist.
●compile: Process a single source file. Read the text (150KB limit for LLM context), auto-detect language by checking for CJK Unicode characters, generate a language-appropriate prompt, make two sequential API calls to Claude Haiku (one for concepts, one for FAQ), write output files.
●compile_author: Compile all uncompiled files for a given author, sequentially.
●aggregate: Generate the cross-book _index.md for an author. Reads all compiled concept files, concatenates with book headers (80KB limit), sends to LLM for synthesis.
●status: Report compilation coverage per domain.
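The compile operation described above can be sketched as follows. This is an illustration rather than the actual compile.py: the LLM is passed in as a plain callable (llm) so the sketch makes no claims about the API client, the prompt wording is invented, and the 10% CJK threshold in detect_language is an assumption (the paper only says the language is detected by checking for CJK Unicode characters).

```python
from pathlib import Path
from typing import Callable

MAX_SOURCE_BYTES = 150 * 1024  # the 150KB read limit

# Illustrative prompt wording only; the production prompts are not published.
PROMPTS = {
    ("en", "concepts"): "Extract 5-10 core concepts with definitions and key quotations.",
    ("en", "faq"): "Write 5-8 question-answer pairs a reader would actually ask.",
    ("zh", "concepts"): "请提取 5-10 个核心概念，附定义和关键引文。",
    ("zh", "faq"): "请撰写 5-8 个读者实际会提出的问答对。",
}


def detect_language(text: str, threshold: float = 0.1) -> str:
    """Return "zh" when CJK ideographs make up at least `threshold` of the text."""
    if not text:
        return "en"
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk / len(text) >= threshold else "en"


def compile_file(source: Path, llm: Callable[[str], str]) -> tuple[Path, Path]:
    """Make two sequential LLM calls and write the two companion files."""
    # Truncate on bytes; errors="ignore" drops a character split by the cut.
    text = source.read_bytes()[:MAX_SOURCE_BYTES].decode("utf-8", errors="ignore")
    lang = detect_language(text)
    out_dir = source.parent / "_compiled"
    out_dir.mkdir(exist_ok=True)
    outputs = []
    for kind in ("concepts", "faq"):  # one call per output file
        result = llm(f"{PROMPTS[(lang, kind)]}\n\n{text}")
        out_path = out_dir / f"{source.stem}_{kind}.md"
        out_path.write_text(result, encoding="utf-8")
        outputs.append(out_path)
    return outputs[0], outputs[1]
```

Passing the model in as a callable also makes the step easy to test with a stub, without network access.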

3.3 Integration with Grep-Based Retrieval

Knowledge Compile does not replace the retrieval system described in our earlier work (Grep is All You Need, LocalKin Team, 2026); it augments that system. The existing knowledge_search skill performs:

●grep -r -i -C 8 across all source files and compiled files
●Fallback: cat of *_concepts.md, *_faq.md, and *_index.md files under 50KB

When a TCM agent queries "黄芪的配伍禁忌" (Astragalus compatibility contraindications), grep finds matches in both the raw Classical Chinese source and the compiled FAQ. The agent receives pre-structured knowledge alongside raw passages, enabling more accurate reasoning without additional inference cost.
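The two-step search described above might look like the following in Python. This is a sketch, not the knowledge_search skill itself: it assumes grep is available on the PATH, treats the query as a grep pattern via -e, and interprets "under 50KB" as a per-file size limit. The function signature is hypothetical.

```python
import subprocess
from pathlib import Path

FALLBACK_MAX_BYTES = 50 * 1024  # assumed to be a per-file limit
FALLBACK_PATTERNS = ("*_concepts.md", "*_faq.md", "*_index.md")


def knowledge_search(query: str, root: Path) -> str:
    """Run grep -r -i -C 8 over root; if nothing matches, return compiled files."""
    proc = subprocess.run(
        ["grep", "-r", "-i", "-C", "8", "-e", query, str(root)],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    # grep exits with 0 when at least one line matched, 1 when none did.
    if proc.returncode == 0 and proc.stdout:
        return proc.stdout
    # Fallback: concatenate small compiled files, as `cat` would.
    chunks = []
    for pattern in FALLBACK_PATTERNS:
        for path in sorted(root.rglob(pattern)):
            if path.stat().st_size < FALLBACK_MAX_BYTES:
                chunks.append(path.read_text(encoding="utf-8"))
    return "\n".join(chunks)
```

Because the compiled files live under the same root as the sources, a single recursive grep covers both layers without any extra configuration.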

4. Corpus

4.1 Domain Coverage

Domain	Files	Size	Authors	Language	Time Span
tcm_zh	90	51 MB	12	Classical/Modern Chinese	200 CE -- 1800 CE
spiritual_zh	72	14 MB	9	Classical/Modern Chinese	400 CE -- 1900 CE
spiritual_en	23	7.2 MB	10	English	1400 CE -- 1900 CE
tcm_en	7	4.2 MB	5	English (translations)	200 CE -- 1600 CE
Total	192	77 MB	36	2 languages	1,800 years

4.2 Notable Sources

●Shang Han Lun (傷寒論, Zhang Zhongjing, ~200 CE): The foundational text of TCM pattern differentiation. Six-stage cold damage framework still used in clinical practice.
●Ben Cao Gang Mu (本草綱目, Li Shizhen, 1578): 1,892 substances catalogued. The most comprehensive pre-modern pharmacopoeia.
●The Cloud of Unknowing (Anonymous, ~1370): A guide to contemplative prayer that influenced centuries of Western mysticism.
●Qian Jin Fang (千金方, Sun Simiao, ~652 CE): "Prescriptions Worth a Thousand Gold," a comprehensive medical encyclopedia.

4.3 Compilation Status (April 2026)

Domain	Compiled	Total	Coverage
tcm_en	5	7	71%
spiritual_en	4	23	17%
spiritual_zh	1	72	1%
tcm_zh	0	90	0%
Total	10	192	5%

At the current rate of 3--5 books per day, the remaining 182 files will be compiled in approximately 47 days.
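The completion estimate can be checked with a few lines of arithmetic. The sketch below uses only the figures from the status table (10 of 192 files compiled) and the stated 3--5 books per day; the function name days_to_finish is illustrative.

```python
import math


def days_to_finish(remaining_files: int, files_per_day: float) -> int:
    """Whole days needed to compile the remaining files at a steady rate."""
    return math.ceil(remaining_files / files_per_day)


remaining = 192 - 10                       # files not yet compiled
fastest = days_to_finish(remaining, 5)     # upper end of the daily rate
slowest = days_to_finish(remaining, 3)     # lower end of the daily rate
implied_rate = round(remaining / 47, 1)    # rate implied by a 47-day estimate
print(fastest, slowest, implied_rate)
```

Under these assumptions the window is 37 to 61 days, and a 47-day estimate corresponds to an average of about 3.9 books per day.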

5. Autonomous Operation

5.1 Scheduled Compilation

Knowledge Compile runs as a daily scheduled task (knowledge-growth) within the LocalKin swarm. Each day, the task:

●Queries compilation status across all four domains
●Identifies the domain with lowest coverage
●Selects 3--5 uncompiled files (smallest first for reliability)
●Runs compilation with incremental skipping (already-compiled files are not reprocessed)
●Generates cross-book indexes for authors whose complete works are now compiled
●Writes a daily report to output/knowledge_growth/{date}.md
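The selection logic in the first three steps can be sketched as below. This is an assumed shape rather than the scheduled task's actual code: the status mapping, the function names pick_domain and pick_batch, and the default batch size of 4 are illustrative choices within the stated 3--5 range.

```python
from pathlib import Path
from typing import Optional


def pick_domain(status: dict[str, tuple[int, int]]) -> Optional[str]:
    """Return the unfinished domain with the lowest compiled/total ratio.

    `status` maps a domain name to (compiled, total). Returns None when
    every domain is fully compiled.
    """
    pending = {
        domain: compiled / total
        for domain, (compiled, total) in status.items()
        if compiled < total
    }
    return min(pending, key=pending.get) if pending else None


def pick_batch(needed: list[Path], batch_size: int = 4) -> list[Path]:
    """Choose the smallest uncompiled files first, up to batch_size."""
    return sorted(needed, key=lambda path: path.stat().st_size)[:batch_size]
```

With the coverage figures from Section 4.3, pick_domain selects tcm_zh (0 of 90 compiled). Taking the smallest files first keeps early runs well under the 150KB read limit.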

5.2 Zero-Touch Growth

Adding new knowledge requires exactly one action: place a .txt or .md file in the appropriate input/{domain}/{author}/ directory. The scheduled task automatically discovers and compiles it on the next run. No re-indexing, no re-embedding, no schema migration, no graph rebuilding. This is the fundamental advantage over both Vector RAG (which requires re-embedding) and GraphRAG (which requires entity re-extraction and graph updating).

5.3 Cost Model

Metric	Value
LLM model	Claude Haiku 4.5
Calls per file	2 (concepts + FAQ)
Max tokens per call	2,000
Cost per file	~$0.15--$0.20
Cost per author (4--5 books)	~$0.75
Daily cost (4 books/day)	~$0.75
Total project cost (192 books)	~$36

Compare this to GraphRAG, which requires LLM calls for entity extraction, relationship classification, and community summarization---typically 10--50x more LLM calls per document.
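The figures in the cost table can be cross-checked from the per-file cost alone. The sketch below takes the stated $0.15--$0.20 per file as given and derives the other rows; the function name cost_range is illustrative.

```python
COST_PER_FILE_USD = (0.15, 0.20)  # per-file range from the cost table


def cost_range(files: int) -> tuple[float, float]:
    """Return the (low, high) cost in USD for compiling `files` files."""
    low, high = COST_PER_FILE_USD
    return round(files * low, 2), round(files * high, 2)


daily = cost_range(4)        # 4 books per day
corpus = cost_range(192)     # the full corpus
yearly = cost_range(300)     # 300 new books per year
print(daily, corpus, yearly)
```

The quoted totals fall inside these ranges: about $0.75 per day lies between $0.60 and $0.80, and about $36 for the corpus lies between $28.80 and $38.40.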

6. Evaluation

6.1 Token Efficiency

The primary metric for Knowledge Compile is token reduction per agent query. When an agent needs knowledge about a topic, it can reference compiled concepts (1.5--3.5 KB) instead of raw source text (50--250 KB).

Source	Raw Size	Compiled Size	Reduction
Shang Han Lun (EN)	89 KB	5.1 KB	17x
Practice of Presence of God	42 KB	4.2 KB	10x
Ben Cao Gang Mu (EN)	156 KB	5.1 KB	31x
Jia Yi Jing (EN)	78 KB	5.8 KB	13x
Average	91 KB	5.1 KB	18x

Over thousands of agent queries, this 18x token reduction translates directly to cost savings and faster response times.
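The reduction column can be reproduced directly from the sizes in the table. Note that the sketch compares file sizes in KB, which is only a proxy for token counts; the actual token ratio depends on the tokenizer and the language of the text.

```python
# (raw KB, compiled KB) per source, copied from the table above
SIZES_KB = {
    "Shang Han Lun (EN)": (89, 5.1),
    "Practice of Presence of God": (42, 4.2),
    "Ben Cao Gang Mu (EN)": (156, 5.1),
    "Jia Yi Jing (EN)": (78, 5.8),
}


def reduction(raw_kb: float, compiled_kb: float) -> float:
    """Size reduction factor: raw size divided by compiled size."""
    return raw_kb / compiled_kb


ratios = {name: round(reduction(*sizes)) for name, sizes in SIZES_KB.items()}
avg_raw = sum(raw for raw, _ in SIZES_KB.values()) / len(SIZES_KB)
avg_compiled = sum(comp for _, comp in SIZES_KB.values()) / len(SIZES_KB)
overall = round(reduction(avg_raw, avg_compiled))
print(ratios, overall)
```

The per-book factors come out as 17, 10, 31, and 13, and the ratio of the averages rounds to 18, matching the table.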

6.2 Retrieval Quality

Knowledge Compile does not change retrieval accuracy---grep still returns 100% of keyword matches. What it changes is retrieval quality: the agent receives structured concepts and FAQ pairs alongside raw text passages, enabling more focused reasoning.

Qualitative assessment across 21 production agents shows:

●TCM agents cite specific concepts from compiled files rather than paraphrasing raw Classical Chinese
●Spiritual direction agents reference cross-book themes from _index.md files
●FAQ pairs catch common user questions that raw text search would miss (e.g., "How long does it take to develop the practice?" --- a question answered in the FAQ but not explicitly stated in the source text)

6.3 Comparison with Alternative Approaches

Dimension	Knowledge Compile	Vector RAG	GraphRAG (OpenTCM)
Preprocessing time	~90s per file	Hours (chunking + embedding)	10--30 min/book (entity extraction)
Infrastructure	None	Vector database	Graph database + embedding API
Accuracy on Classical Chinese	Human-auditable output	<85% (embedding mismatch)	<70% NER accuracy
Cost (192 books)	$36	$50--200 (embedding calls + hosting)	$500+ (LLM extraction + hosting)
Adding new documents	Drop file, wait for next scheduled run	Re-embed, re-index	Re-extract entities, rebuild graph
Output format	Human-readable Markdown	Opaque vectors	Entity triples
Cross-book synthesis	Automatic (_index.md)	Not supported	Community detection (automated)
Maintenance burden	Zero	Database operations	Graph consistency checks
Lines of code	426 (Python) + 90 (Shell)	300--500+	1,000+

7. Why This Works

7.1 LLMs Are Better Curators Than Retrievers

The fundamental insight behind Knowledge Compile is a division of labor: use LLMs where they excel (understanding, summarizing, structuring) and use simple tools where they suffice (keyword matching, file concatenation).

An LLM reading the Shang Han Lun can identify that Zhang Zhongjing's six-stage framework is the core organizational principle, that specific prescriptions map to specific stages, and that the text's clinical relevance persists after 1,800 years. This is a curation task that benefits from the LLM's broad training. Asking the same LLM to perform this curation on every query is wasteful---the knowledge doesn't change between queries.

Knowledge Compile performs the expensive curation once and stores the result in a format that costs nothing to retrieve.

7.2 Domain-Specific Vocabulary Is Predictable

Both TCM and Christian spiritual texts use highly standardized vocabularies. 麻黄 always refers to Ephedra. "Dark night of the soul" always refers to John of the Cross's framework. This predictability means grep recall is effectively complete for queries phrased in the domain's own vocabulary: few synonyms or paraphrases require semantic search, and Section 8 covers the cases that do.

7.3 Human Readability Is a Feature

Every output of Knowledge Compile is a Markdown file that a human can read, verify, and correct. This is not incidental---it is a design requirement. When a TCM agent provides advice based on compiled knowledge, a practitioner can trace the recommendation back to a specific concept file, verify it against the source text, and flag errors. This audit trail is impossible with vector embeddings and difficult with knowledge graph triples.

8. Limitations

Knowledge Compile has several known limitations:

●Semantic similarity search. If a user asks a question using vocabulary not present in the corpus, grep will not find matches. This is mitigated by the LLM's keyword expansion (generating synonyms before searching), but edge cases exist.
●Cross-lingual retrieval. A Chinese query will not match English compiled files. The system handles this through bilingual FAQ generation and domain-separated search, but true cross-lingual retrieval requires embedding-based approaches.
●Large file truncation. Files exceeding 150KB are truncated before compilation, potentially losing content from the tail of very long texts. Incremental chunked compilation is planned but not yet implemented.
●Single-author limitation. Cross-book synthesis requires an author to have two or more compiled works. Single-book authors receive concept and FAQ compilation but no cross-reference analysis.
●Corpus scale. At 192 files and 77MB, the corpus is well within grep's performance envelope. At 10,000+ files or 10GB+, grep latency would increase, and a compiled index or vector fallback layer would become necessary.
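For the truncation limitation, one possible shape for the planned chunked compilation is to split long texts on paragraph boundaries before compiling each chunk. The sketch below is purely hypothetical and is not the authors' design: the function name split_into_chunks, the blank-line paragraph delimiter, and the greedy packing strategy are all assumptions.

```python
def split_into_chunks(text: str, max_bytes: int = 150 * 1024) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    Each chunk stays within max_bytes of UTF-8, except that a single
    paragraph larger than max_bytes becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for paragraph in text.split("\n\n"):
        paragraph_size = len(paragraph.encode("utf-8")) + 2  # +2 for the separator
        if current and size + paragraph_size > max_bytes:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(paragraph)
        size += paragraph_size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Joining the chunks with blank lines reproduces the original text, so no content is lost. How per-chunk concepts and FAQs would be merged into one pair of companion files is left open here.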

9. Related Work

OpenTCM (Chen et al., 2025) constructs a GraphRAG system from 68 TCM texts with 48,000 entities and 152,000 relationships. Their approach achieves 98.55% expert-rated retrieval accuracy but requires significant infrastructure and entity extraction pipelines. Knowledge Compile achieves comparable results for a bounded domain at 1/10th the cost and complexity by delegating relationship understanding to the LLM at query time rather than pre-computing it.

Grep is All You Need (LocalKin Team, 2026) establishes the grep-based retrieval foundation that Knowledge Compile builds upon. Where that work showed retrieval could be simple, this work shows that pre-query knowledge structuring further amplifies the approach.

LightRAG (Guo et al., 2024) proposes lightweight alternatives to full RAG pipelines. Knowledge Compile shares the philosophy of minimizing infrastructure but takes it further by reducing the retrieval step to plain keyword search with grep.

Focused Chain-of-Thought (arXiv 2511.22176) separates information extraction from reasoning in LLM prompts. Knowledge Compile applies this principle at the corpus level: extraction happens once during compilation, reasoning happens at query time with pre-extracted knowledge.

10. Conclusion

Knowledge Compile demonstrates that the gap between raw text and LLM-ready knowledge can be bridged without databases, embeddings, or graphs. By treating the LLM as a one-time knowledge curator rather than a repeated retrieval engine, we achieve structured knowledge extraction across 192 texts in four domains at a total cost of $36, with zero infrastructure dependencies and human-auditable output.

The system has been running autonomously since March 2026, compiling 3--5 books per day without human intervention. At current rates, the full 192-book corpus will be compiled by late May 2026, covering 1,800 years of Traditional Chinese Medicine and Christian spiritual literature in two languages.

The broader lesson is architectural: in a system where an LLM is the ultimate consumer of retrieved knowledge, the retrieval layer should be as simple as possible (grep), and the structuring should happen once (compilation) rather than on every query (inference). Complexity should be added only when simplicity demonstrably fails---and for domain-specific corpora with predictable vocabulary, simplicity has not failed yet.

Appendix A: Compilation Output Examples

A.1 Concept Extraction (Sun Simiao — Essential Prescriptions)

# Essential Prescriptions — Core Concepts

## Thesis
Sun Simiao's 千金方 represents the first systematic attempt to organize
clinical medicine by department, integrating Daoist health cultivation
with empirical pharmacology.

## Core Concepts (7)
- **Great Physician Sincerity (大医精诚)**: Medical ethics framework
  requiring compassion regardless of patient status
- **Departmental Medicine**: Organization by clinical specialty
  (gynecology, pediatrics, external medicine) — revolutionary for 7th century
- **Food as Medicine (食治)**: Dedicated dietary therapy chapters preceding
  pharmacological intervention
...

A.2 FAQ Generation (Brother Lawrence — Practice of the Presence of God)

# Practice of the Presence of God — FAQ

## Q1: How do I start practicing the presence of God in daily life?
A: Begin with short, frequent acts of turning your attention to God
throughout the day. Lawrence emphasizes that this is not about long
prayers but brief moments of awareness — while cooking, walking, or
working. Start with every hour, then gradually make it continuous.

## Q2: What do I do when my mind wanders during practice?
A: Lawrence advises gentle redirection without self-punishment.
Wandering is natural; the practice is in the returning, not in
perfect concentration. He spent 10 years struggling before the
practice became habitual.
...

A.3 Cross-Book Index (Zhang Zhongjing)

# Zhang Zhongjing — Cross-Book Index

## Recurring Themes
- **Six-Stage Pattern Differentiation**: Central framework in both
  Shang Han Lun and Jin Gui Yao Lue, applied to cold damage and
  miscellaneous diseases respectively
- **Formula Precision**: Exact dosages and preparation methods
  emphasized across all works — "one qian more or less changes the formula"

## Apparent Tensions
- Shang Han Lun focuses on acute cold damage (external pathogen);
  Jin Gui Yao Lue addresses chronic internal diseases — different
  treatment philosophies for different disease categories
...

Appendix B: System Integration

B.1 Agent Architecture

Knowledge Compile serves 21 specialized agents within the LocalKin swarm:

●11 TCM agents (1 conductor + 10 historical physician personas)
●9 spiritual direction agents (1 conductor + 8 mystic/theologian personas)
●1 citizenship coaching agent

Each agent accesses compiled knowledge through the knowledge_search skill, which performs grep across both raw and compiled files.

B.2 Autonomous Growth Pipeline

Daily Scheduled Task (knowledge-growth)
    │
    ├── Query: status across 4 domains
    ├── Select: lowest-coverage domain
    ├── Compile: 3-5 uncompiled files
    ├── Aggregate: cross-book index if author complete
    └── Report: output/knowledge_growth/{date}.md

B.3 Cost Projection

Phase	Books	Duration	Cost
Current (April 2026)	10/192	Complete	$2
Phase 2 (May 2026)	192/192	~47 days	$34
Steady state (2027+)	+300/year	Continuous	$60/year

The LocalKin Team builds self-evolving AI agent swarms. More at https://localkin.dev

知识编译：无需数据库、嵌入或图谱的增量式 LLM 知识提取

作者： The LocalKin Team

系统： LocalKin (https://localkin.dev)

日期： 2026 年 4 月

摘要

我们提出 Knowledge Compile（知识编译），一种增量式知识提取系统，无需向量数据库、嵌入模型或知识图谱，即可将原始文本语料转化为结构化的、LLM 可直接使用的知识。系统对每份源文本仅进行两次定向 LLM 调用——一次提取核心概念，一次生成常见问答——产出人类可读的 Markdown 文件，作为结构化检索层。该系统作为 75 智能体 AI 蜂群的一部分在生产环境中运行，覆盖 192 份源文本、四个领域（中医中英文、基督教灵修中英文），总计 77MB 原始文献，时间跨度从公元 200 年至 20 世纪。系统通过定时任务以每天 3-5 本的速度自主编译，整个语料库的总成本约 36 美元。编译产出——第二层概念、FAQ 问答对及跨书综合索引——将每次查询的 token 消耗降低 10-50 倍，同时保持底层 grep 搜索 100% 的检索准确率。与需要实体抽取流水线（古典中文准确率不足 70%）的 GraphRAG 方案不同，知识编译产出人类可审计的输出，且零基础设施依赖。核心洞察：LLM 是优秀的一次性知识策展人，但昂贵的重复检索引擎——编译一次，grep 永远。

关键词： 知识提取、知识编译、检索增强生成、增量处理、领域特定 AI、中医药、数字人文

1. 引言

当前将 LLM 智能体植根于领域知识的主流方法遵循一个共同模式：将语料预处理为机器优化的表示（嵌入、图谱三元组或索引块），然后在推理时查询该表示。向量 RAG 将文档嵌入高维空间。GraphRAG 将实体和关系提取到知识图谱中。两者都需要大量基础设施，引入预处理误差，且产出对人类不透明的表示。

我们提出一种更简单的替代方案：将 LLM 本身用作一次性知识策展人，而非重复检索引擎。给定一份源文本，我们恰好进行两次 LLM 调用——一次提取核心概念，一次生成 FAQ 问答——并将结果写成纯 Markdown 文件存放在原文旁边。这些编译文件成为结构化检索层：当智能体需要知识时，grep 同时搜索原始文本和编译摘要，为 LLM 提供预结构化的上下文，而非原始段落。

这一方案——我们称之为 Knowledge Compile——作为 LocalKin（一个 75 智能体 AI 蜂群）的一部分部署在生产环境中。它处理四个知识领域的 192 份源文本，通过每日定时任务自主运行，自 2026 年 3 月以来持续运行。

本文的贡献不在于"总结文档"这个想法（那是显而易见的），而在于使其在规模上可行的具体架构：增量编译（跳过已处理的文件）、跨书综合（识别同一作者不同作品间的主题与矛盾）、自主调度（每天 3-5 本，无需人工干预）、以及与 grep 检索的无缝集成。最终得到的是一个零基础设施依赖、人类可读输出、192 本书总成本仅 36 美元的知识管理系统。

2. 现有方案的问题

2.1 向量 RAG：预处理的脆弱性

标准 RAG 流水线将文档分块，使用 BGE-M3 或 OpenAI text-embedding-ada-002 等模型嵌入分块，存储到向量数据库（Pinecone、ChromaDB、Qdrant），并在查询时进行近似最近邻搜索。这条流水线引入了多个故障模式：

●分块边界伪影。 跨越分块边界的语义信息会丢失。一个中医方剂被切分到两个块中，可能永远无法作为完整单元被检索到。
●嵌入模型失配。 在英文网页文本上训练的模型对古典中文医学术语表现差。药材黄芪（Astragalus）可能因共享语义场而嵌入到人参（Ginseng）附近，尽管它们在临床上截然不同。
●索引过期。 添加新文档需要重新嵌入和重建索引。对于不断增长的语料库，这造成持续的维护负担。
●基础设施成本。 向量数据库需要托管、监控和备份。对于在 Mac Mini 上运行智能体的独立开发者来说，这是不可接受的开销。

2.2 GraphRAG：实体抽取瓶颈

GraphRAG 方案（如 Microsoft GraphRAG、OpenTCM）从源文本构建知识图谱，然后遍历图谱回答查询。OpenTCM 报告从 68 本妇科文献中提取了 48,000 个实体和 152,000 条关系，检索准确率达 98.55%。

这些结果令人印象深刻，但掩盖了一个关键瓶颈：实体抽取。命名实体识别（NER）模型在古典中文医学文本上的准确率不足 70%，因为实体是上下文相关的（同一个字根据前后文可能表示草药、症状或解剖部位）。手动构建图谱每本书需要 10-30 小时。对于跨两种语言、四个领域的 192 本书语料库，GraphRAG 的启动成本过于高昂。

2.3 缺失的中间地带

两种方案都在解决一个可能不需要解决的问题。如果 LLM 是检索知识的最终消费者，为什么不为 LLM 而非检索算法来结构化知识？一份结构良好的 Markdown 摘要，对 LLM 来说比一堆嵌入相似的块或一个实体三元组子图更有用。

3. 架构

3.1 三层知识设计

知识编译实现三层架构，每层服务于不同目的：

第一层（原始源文本）。 .txt 和 .md 格式的原始文本，存储在 input/{domain}/{author}/。这些文件永不修改。它们作为事实真相，并可通过 grep 直接搜索。当前语料库：192 个文件，77MB，跨越从公元 200 年（伤寒论）到 20 世纪（灵心小史）的文本。

第二层（编译知识）。 对每个源文件，通过一次 LLM 编译生成两个伴随文件：

●{title}_concepts.md：5-10 个核心概念，附定义、带章节引用的关键引文、实践要点。目标：<3,000 字符。
●{title}_faq.md：5-8 个问答对，解答读者实际会遇到的问题。目标：<3,000 字符。

这些文件存储在原文旁边的 {author}/_compiled/ 中。

第 2.5 层（跨书索引）。 当一位作者有两本或以上编译完成的书时，聚合生成 {author}/_index.md，包含：

●跨书重复出现的主题及各书视角差异
●每本书独有的概念
●不同作品间的明显矛盾或张力
●推荐阅读路径（入门 → 进阶 → 核心）

3.2 编译流水线

编译流水线（compile.py，426 行 Python）实现五项操作：

●list_needed：扫描某领域中未编译的源文件。当 {stem}_concepts.md 和 {stem}_faq.md 都存在时，视为已编译。
●compile：处理单个源文件。读取文本（150KB LLM 上下文限制），通过检测 CJK Unicode 字符自动检测语言，生成语言对应的提示词，向 Claude Haiku 发起两次顺序 API 调用（概念和 FAQ 各一次），写入输出文件。
●compile_author：顺序编译某作者所有未编译的文件。
●aggregate：为某作者生成跨书 _index.md。读取所有已编译的概念文件，带书名标题连接（80KB 限制），发送给 LLM 进行综合。
●status：报告各领域的编译覆盖率。

3.3 与 Grep 检索的集成

知识编译不替代我们早期工作（Grep is All You Need，LocalKin Team，2026）中描述的检索系统，而是增强它。现有的 knowledge_search 技能执行：

●grep -r -i -C 8 搜索所有源文件和编译文件
●回退：cat 输出 50KB 以下的 *_concepts.md、*_faq.md 和 *_index.md 文件

当中医智能体查询"黄芪的配伍禁忌"时，grep 在原始古典中文源文本和编译后的 FAQ 中都能找到匹配。智能体同时获得预结构化知识和原始段落，无需额外推理成本即可实现更准确的推理。

4. 语料库

4.1 领域覆盖

领域	文件数	大小	作者数	语言	时间跨度
中医中文	90	51 MB	12	古典/现代中文	公元 200 年 - 1800 年
灵修中文	72	14 MB	9	古典/现代中文	公元 400 年 - 1900 年
灵修英文	23	7.2 MB	10	英文	1400 年 - 1900 年
中医英文	7	4.2 MB	5	英文（译本）	公元 200 年 - 1600 年
合计	192	77 MB	36	双语	1,800 年

4.2 代表性文献

●伤寒论（张仲景，约公元 200 年）：中医辨证论治的奠基之作。六经辨证框架至今仍在临床使用。
●本草纲目（李时珍，1578 年）：收录 1,892 种药物。最全面的前现代药典。
●未知之云（佚名，约 1370 年）：冥想祈祷指南，影响了数百年的西方神秘主义。
●千金方（孙思邈，约公元 652 年）："千金"之名寓意人命至重，综合性医学百科全书。

4.3 编译状态（2026 年 4 月）

领域	已编译	总数	覆盖率
中医英文	5	7	71%
灵修英文	4	23	17%
灵修中文	1	72	1%
中医中文	0	90	0%
合计	10	192	5%

按当前每天 3-5 本的速度，完整语料库将在约 47 天内编译完成。

5. 自主运行

5.1 定时编译

知识编译作为 LocalKin 蜂群中的每日定时任务（knowledge-growth）运行。每天，任务：

●查询全部四个领域的编译状态
●识别覆盖率最低的领域
●选择 3-5 个未编译文件（优先选择最小的文件以确保可靠性）
●以增量方式运行编译（已编译文件不会被重新处理）
●为所有作品已编译完成的作者生成跨书索引
●将每日报告写入 output/knowledge_growth/{date}.md

5.2 零接触增长

添加新知识仅需一个操作：将 .txt 或 .md 文件放入相应的 input/{domain}/{author}/ 目录。定时任务在下次运行时自动发现并编译它。无需重建索引、重新嵌入、模式迁移或图谱重建。这是相对于向量 RAG（需要重新嵌入）和 GraphRAG（需要重新抽取实体和更新图谱）的根本优势。

5.3 成本模型

指标	数值
LLM 模型	Claude Haiku 4.5
每文件调用次数	2（概念 + FAQ）
每次调用最大 token	2,000
每文件成本	~$0.15-$0.20
每位作者成本（4-5 本）	~$0.75
每日成本（4 本/天）	~$0.75
项目总成本（192 本）	~$36

对比 GraphRAG——需要 LLM 调用进行实体抽取、关系分类和社区摘要——通常每个文档需要 10-50 倍的 LLM 调用。

6. 评估

6.1 Token 效率

知识编译的主要指标是每次智能体查询的 token 减少量。当智能体需要某主题的知识时，可以引用编译后的概念（1.5-3.5 KB）而非原始源文本（50-250 KB）。

来源	原始大小	编译大小	压缩比
伤寒论（英文）	89 KB	5.1 KB	17x
与神同在	42 KB	4.2 KB	10x
本草纲目（英文）	156 KB	5.1 KB	31x
甲乙经（英文）	78 KB	5.8 KB	13x
平均	91 KB	5.1 KB	18x

在数千次智能体查询中，这 18 倍的 token 压缩直接转化为成本节省和更快的响应时间。

6.2 检索质量

知识编译不改变检索准确率——grep 仍然返回 100% 的关键词匹配。它改变的是检索质量：智能体获得结构化概念和 FAQ 问答对以及原始文本段落，实现更聚焦的推理。

对 21 个生产环境智能体的定性评估显示：

●中医智能体引用编译文件中的具体概念，而非复述原始古典中文
●灵修导师智能体引用 _index.md 文件中的跨书主题
●FAQ 问答对捕获了原始文本搜索会遗漏的常见用户问题（例如："培养这种修行需要多长时间？"——一个在 FAQ 中有回答但在源文本中没有明确陈述的问题）

6.3 与替代方案的对比

维度	知识编译	向量 RAG	GraphRAG (OpenTCM)
预处理时间	~90 秒/文件	数小时（分块 + 嵌入）	10-30 分钟/本（实体抽取）
基础设施	无	向量数据库	图数据库 + 嵌入 API
古典中文准确率	人类可审计输出	<85%（嵌入失配）	<70% NER 准确率
成本（192 本）	$36	$50-200	$500+
添加新文档	放入文件，等待下次定时运行	重新嵌入、重建索引	重新抽取实体、重建图谱
输出格式	人类可读 Markdown	不透明向量	实体三元组
跨书综合	自动（`_index.md`）	不支持	社区检测（自动）
维护负担	零	数据库运维	图谱一致性检查
代码行数	426 (Python) + 90 (Shell)	300-500+	1,000+

7. 为什么有效

7.1 LLM 是更好的策展人而非检索器

知识编译背后的根本洞察是劳动分工：在 LLM 擅长的地方使用 LLM（理解、总结、结构化），在简单工具足够的地方使用简单工具（关键词匹配、文件拼接）。

一个 LLM 阅读伤寒论可以识别出张仲景的六经框架是核心组织原则，特定方剂映射到特定阶段，以及文本在 1,800 年后仍具有临床价值。这是一项受益于 LLM 广泛训练的策展任务。在每次查询时要求同一个 LLM 执行此策展是浪费的——知识在查询之间不会改变。

知识编译执行一次昂贵的策展，并将结果存储为零成本检索的格式。

7.2 领域特定词汇是可预测的

中医和基督教灵修文本都使用高度标准化的词汇。麻黄永远指麻黄。"灵魂的暗夜"永远指十字若望的框架。这种可预测性意味着 grep 对领域相关查询达到 100% 召回率——不存在需要语义搜索的同义词或改述。

7.3 人类可读性是特性

知识编译的每个输出都是人类可以阅读、验证和修正的 Markdown 文件。这不是附带的——这是设计要求。当中医智能体基于编译知识提供建议时，执业者可以将建议追溯到特定概念文件，对照源文本验证，并标记错误。这种审计追踪在向量嵌入中不可能，在知识图谱三元组中也很困难。

8. 局限性

我们坦诚知识编译的不足：

●
语义相似性搜索。 如果用户使用语料库中不存在的词汇提问，grep 无法找到匹配。这通过 LLM 的关键词扩展（搜索前生成同义词）来缓解，但边缘情况仍然存在。
●
跨语言检索。 中文查询不会匹配英文编译文件。系统通过双语 FAQ 生成和按领域分离搜索来处理，但真正的跨语言检索需要基于嵌入的方案。
●
大文件截断。 超过 150KB 的文件在编译前会被截断，可能丢失长文本尾部的内容。增量分块编译已计划但尚未实现。
●
单一作者限制。 跨书综合要求一位作者有两本或以上已编译的作品。单本书作者获得概念和 FAQ 编译，但没有交叉分析。
●
语料库规模。 在 192 个文件和 77MB 的规模下，语料库完全在 grep 的性能范围内。在 10,000+ 文件或 10GB+ 时，grep 延迟会增加，需要编译索引或向量回退层。

9. 相关工作

OpenTCM（Chen 等，2025）从 68 本中医文献构建 GraphRAG 系统，含 48,000 个实体和 152,000 条关系。其方案达到 98.55% 的专家评定检索准确率，但需要大量基础设施和实体抽取流水线。知识编译通过将关系理解委托给查询时的 LLM（而非预计算），以 1/10 的成本和复杂度在有界领域中达到可比结果。

Grep is All You Need（LocalKin Team，2026）建立了知识编译所依托的 grep 检索基础。前者展示了检索可以简单，本文展示查询前的知识结构化进一步放大了这一方案。

LightRAG（Guo 等，2024）提出全 RAG 流水线的轻量替代。知识编译分享最小化基础设施的哲学，但更进一步——完全消除了检索算法。

Focused Chain-of-Thought（arXiv 2511.22176）在 LLM 提示中分离信息提取和推理。知识编译在语料库层面应用这一原则：提取在编译时发生一次，推理在查询时使用预提取的知识进行。

10. 结论

知识编译证明了原始文本与 LLM 可用知识之间的鸿沟可以在没有数据库、嵌入或图谱的情况下弥合。通过将 LLM 视为一次性知识策展人而非重复检索引擎，我们以 36 美元的总成本在四个领域 192 份文本中实现了结构化知识提取，零基础设施依赖，输出人类可审计。

系统自 2026 年 3 月以来自主运行，每天编译 3-5 本书，无需人工干预。按当前速度，完整的 192 本语料库将于 2026 年 5 月底编译完成，覆盖 1,800 年的中医药和基督教灵修文献，跨两种语言。

更广泛的教训是架构性的：在 LLM 是检索知识的最终消费者的系统中，检索层应尽可能简单（grep），结构化应发生一次（编译）而非每次查询（推理）。只有当简单方案可证明地失败时，才应添加复杂性——而对于词汇可预测的领域特定语料库，简单方案尚未失败。

The LocalKin Team 构建自进化 AI 智能体蜂群。更多信息请访问 https://localkin.dev

How to cite this paper

Three formats below; pick the one that matches your venue.

BibTeX

@misc{localkin2026knowledge,
  author    = {{The LocalKin Team}},
  title     = {Knowledge Compile: Incremental LLM-Powered Knowledge Extraction Without Databases, Embeddings, or Graphs},
  year      = {2026},
  month     = apr,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20094238},
  url       = {https://doi.org/10.5281/zenodo.20094238},
  note      = {Correspondence: contact@localkin.ai}
}

APA

The LocalKin Team. (2026). Knowledge Compile: Incremental LLM-Powered Knowledge Extraction Without Databases, Embeddings, or Graphs. Zenodo. https://doi.org/10.5281/zenodo.20094238

Chicago

LocalKin Team, The. 2026. "Knowledge Compile: Incremental LLM-Powered Knowledge Extraction Without Databases, Embeddings, or Graphs." Zenodo, April. https://doi.org/10.5281/zenodo.20094238.