Knowledge Layers
Grep-based retrieval. No vector database. No embeddings. 100% source fidelity.
Three Layers
Layer 1: Raw Source (grep)
Original texts stored as markdown. 192 books across 4 domains (spiritual_zh/en, tcm_zh/en). Agents run grep -C 8 to pull eight lines of context before and after each keyword match. Retrieval is deterministic, millisecond-fast, and 100% faithful to the source.
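As a minimal sketch of this layer, here is a pure-Python equivalent of grep -C 8: it returns each matching line plus the surrounding context window, merging overlapping windows the way grep does. The function name and file layout are illustrative, not from the project.

```python
from pathlib import Path

def grep_context(path: str, keyword: str, context: int = 8) -> list[str]:
    """Mimic `grep -C <context> keyword path`: return matching lines
    with `context` lines of leading and trailing context, deduplicated."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if keyword in line:
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            keep.update(range(lo, hi))  # merge overlapping windows
    return [lines[i] for i in sorted(keep)]
```

Because the search is a plain substring scan over the source file, the retrieved text is byte-for-byte what the book says; there is no embedding step that could paraphrase or drop a match.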
Layer 2: Compiled Concepts + FAQ
For each source book, an LLM compiles two companion files:
- {book}_concepts.md — 5-10 core concepts, key quotes, practice notes (~1KB)
- {book}_faq.md — 5-8 Q&A pairs anticipating reader questions
257KB source → 1KB concepts = 257x compression, core meaning preserved.
Compilation is incremental: one file at a time, skipping already-compiled books. knowledge-growth runs 3-5 books/day, so all 192 books finish in ~48 days.
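The skip logic can be sketched as follows, assuming the naming convention above ({book}.md alongside {book}_concepts.md and {book}_faq.md in one directory); the daily cap and the compile callback are placeholders for the real LLM-driven step.

```python
from pathlib import Path

DAILY_CAP = 5  # knowledge-growth compiles 3-5 books per run

def pending_books(library: Path) -> list[Path]:
    """Source books whose companion files are missing (already-compiled
    books are skipped, so each run only does new work)."""
    books = [p for p in library.glob("*.md")
             if not p.stem.endswith(("_concepts", "_faq", "_index"))]
    return [b for b in books
            if not (b.with_name(b.stem + "_concepts.md").exists()
                    and b.with_name(b.stem + "_faq.md").exists())]

def run_daily_batch(library: Path, compile_book) -> int:
    """Compile up to DAILY_CAP pending books; return how many ran."""
    batch = pending_books(library)[:DAILY_CAP]
    for book in batch:
        compile_book(book)  # LLM call in the real system
    return len(batch)
```

Because the check is purely filesystem-based, an interrupted run loses nothing: the next run re-derives the pending set and continues where it left off.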
Layer 2.5: Cross-Book Index
When an author has 2+ books compiled, an aggregate step produces _index.md:
- Recurring themes across books
- Unique contributions per book
- Internal contradictions or tensions
- Recommended reading path
Single-book authors skip this step (no cross-comparison possible).
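The gating rule above can be sketched as a simple group-by; the book-to-author mapping is assumed here as an input dict (in the real system it would come from library metadata).

```python
from collections import defaultdict

def authors_needing_index(book_authors: dict[str, str]) -> list[str]:
    """Given {compiled_book: author}, return authors with 2+ compiled
    books. Single-book authors are skipped: with only one book there
    is nothing to cross-compare, so no _index.md is produced."""
    by_author: dict[str, list[str]] = defaultdict(list)
    for book, author in book_authors.items():
        by_author[author].append(book)
    return sorted(a for a, books in by_author.items() if len(books) >= 2)
```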
Why Not Vector RAG
Vector databases introduce preprocessing overhead, embedding model dependencies, approximate nearest neighbor errors, and black-box retrieval. For domain-specific knowledge where exact source fidelity matters (medical formulas, scripture quotes), grep is both faster and more trustworthy.
Retrieval doesn't need intelligence. The LLM is the intelligence.
Validation
Karpathy independently arrived at the same architecture on April 3, 2026: markdown files, LLM-maintained, no RAG, active linting. Our "Grep is All You Need" paper was written six weeks earlier.
Related
- Improvement Cycles — knowledge-growth is a daily cycle
- Thin Soul, Fat Skill — same separation principle