Knowledge Layers
Grep-based retrieval. No vector database. No embeddings. 100% source fidelity.
Three Layers
Layer 1: Raw Source (grep)
Original texts stored as markdown. 192 books across 4 domains (spiritual_zh/en, tcm_zh/en). Agents run grep -C 8 to pull eight lines of context before and after each keyword match. Retrieval is deterministic, millisecond-fast, and 100% faithful to the source.
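As a minimal sketch of this layer, here is a pure-Python equivalent of grep -C 8: it returns each matching line plus the surrounding context window, merging overlapping windows the way grep does. The function name and file layout are illustrative, not from the project.

```python
from pathlib import Path

def grep_context(path: str, keyword: str, context: int = 8) -> list[str]:
    """Mimic `grep -C <context> keyword path`: return matching lines
    with `context` lines of leading and trailing context, deduplicated."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if keyword in line:
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            keep.update(range(lo, hi))  # merge overlapping windows
    return [lines[i] for i in sorted(keep)]
```

Because the search is a plain substring scan over the source file, the retrieved text is byte-for-byte what the book says; there is no embedding step that could paraphrase or drop a match.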
Layer 2: Compiled Concepts + FAQ
For each source book, an LLM compiles two companion files:
- {book}_concepts.md — 5-10 core concepts, key quotes, practice notes (~1KB)
- {book}_faq.md — 5-8 Q&A pairs anticipating reader questions
257KB source → 1KB concepts = 257x compression, core meaning preserved.
Compilation is incremental: one file at a time, skipping already-compiled books. knowledge-growth runs 3-5 books/day, so all 192 books finish in ~48 days.
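The skip logic can be sketched as follows, assuming the naming convention above ({book}.md alongside {book}_concepts.md and {book}_faq.md in one directory); the daily cap and the compile callback are placeholders for the real LLM-driven step.

```python
from pathlib import Path

DAILY_CAP = 5  # knowledge-growth compiles 3-5 books per run

def pending_books(library: Path) -> list[Path]:
    """Source books whose companion files are missing (already-compiled
    books are skipped, so each run only does new work)."""
    books = [p for p in library.glob("*.md")
             if not p.stem.endswith(("_concepts", "_faq", "_index"))]
    return [b for b in books
            if not (b.with_name(b.stem + "_concepts.md").exists()
                    and b.with_name(b.stem + "_faq.md").exists())]

def run_daily_batch(library: Path, compile_book) -> int:
    """Compile up to DAILY_CAP pending books; return how many ran."""
    batch = pending_books(library)[:DAILY_CAP]
    for book in batch:
        compile_book(book)  # LLM call in the real system
    return len(batch)
```

Because the check is purely filesystem-based, an interrupted run loses nothing: the next run re-derives the pending set and continues where it left off.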
Layer 2.5: Cross-Book Index
When an author has 2+ books compiled, an aggregate step produces _index.md:
- Recurring themes across books
- Unique contributions per book
- Internal contradictions or tensions
- Recommended reading path
Single-book authors skip this step (no cross-comparison possible).
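The gating rule above can be sketched as a simple group-by; the book-to-author mapping is assumed here as an input dict (in the real system it would come from library metadata).

```python
from collections import defaultdict

def authors_needing_index(book_authors: dict[str, str]) -> list[str]:
    """Given {compiled_book: author}, return authors with 2+ compiled
    books. Single-book authors are skipped: with only one book there
    is nothing to cross-compare, so no _index.md is produced."""
    by_author: dict[str, list[str]] = defaultdict(list)
    for book, author in book_authors.items():
        by_author[author].append(book)
    return sorted(a for a, books in by_author.items() if len(books) >= 2)
```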
Why Not Vector RAG
Vector databases introduce preprocessing overhead, embedding model dependencies, approximate nearest neighbor errors, and black-box retrieval. For domain-specific knowledge where exact source fidelity matters (medical formulas, scripture quotes), grep is both faster and more trustworthy.
Retrieval doesn't need intelligence. The LLM is the intelligence.
Validation
Karpathy independently arrived at the same architecture on April 3, 2026: markdown files, LLM-maintained, no RAG, active linting. Our "Grep is All You Need" paper was written six weeks earlier.
Related
- Improvement Cycles — knowledge-growth is a daily cycle
- Thin Soul, Fat Skill — same separation principle