Knowledge layer
The evidence base that scriptorium’s skills are grounded in. Without this layer, the skills are generic LLM prompts; with it, they rest on established practice.
Why this layer exists
Most AI writing tools rely on the LLM’s pretraining to “know” what good scientific writing is. That works inconsistently and unaccountably. Scriptorium takes the opposite approach: skills cite the specific evidence they ground in, so a contributor or reviewer can trace any behavior back to its source.
The cost is real work — building this evidence base took a substantial first-pass effort — but it produces three durable benefits:
- Defensible design. When someone asks “why does citation-audit classify claims this way?”, the answer is a paper.
- Accountable evolution. When the evidence updates, the skills should update. Knowledge documents declare their last-updated date.
- Honest scope. Each knowledge document closes with an Implementation priority section that states whether the finding becomes a skill or stays framing-only context. Findings that shouldn’t become skills are explicit.
Layout
knowledge/
├── prior-art/            # similar tools, projects, lineage (6 notes)
├── scientific-writing/   # methodology of good writing (13 notes)
├── peer-review/          # evidence on review processes (9 notes)
├── citations/            # citation practices and pitfalls (4 notes)
├── editing/              # editing methodology (3 notes)
├── grants/               # grant-writing evidence (3 notes)
├── critique-techniques/  # how to find problems systematically (7 notes)
├── reproducibility/      # the crisis context scriptorium responds to (1 note)
├── author-roles/         # career stage, role, and language behavioral evidence (2 notes)
└── conventions/          # the load-bearing conventions skills share (2 notes)

Current size: 50 notes (49 markdown + 1 Quarto), grown from the first-pass batch of ~40. The most recent additions track v0.2 and v0.3 work:
- knowledge/scientific-writing/corpus-based-stylometry.md: voice-profile design.
- knowledge/scientific-writing/literature-search-strategies.md: gap-finder.
- knowledge/critique-techniques/research-gap-detection.md: gap-finder.
- knowledge/peer-review/venue-selection.md: venue-fit.
- knowledge/peer-review/predatory-publishing.md: venue-fit refusal.
- knowledge/peer-review/preprint-landscape.md: venue-fit opt-in mode.
- knowledge/author-roles/author-role-evidence.md: persona / voice work.
- knowledge/conventions/declared-work-scope.md: the project-wide refusal-on-blankness convention.
- knowledge/conventions/guidance-level.md: the project-wide framing convention.
The two conventions/ notes are project-wide rather than topical: they
define the declared-work-scope refusal posture every skill inherits,
and the guidance-level framing every conversation-bearing skill
respects. Each is presented from the user’s side in its concept page
(docs/src/content/docs/concepts/declared-work-scope/ and
docs/src/content/docs/concepts/guidance-level/; TODO link once concept
pages land). The knowledge notes here are the underlying evidence
record for those conventions.
Documents in every subdirectory follow the same structure:
# Topic title
*Last updated: YYYY-MM-DD*

## Synthesis
(1–3 paragraphs: what the evidence shows)

## Evidence and frameworks
(Detailed treatment with citations)

## How this informs scriptorium
(Concrete connections to specific skills + MANUSCRIPT_STATE schema)

## Implementation priority for scriptorium
**Verdict:** Yes (v0.X) | Maybe later | No
**If Yes:** skill name, phase, scope, required data
**If Maybe later:** condition that would flip to Yes
**If No:** why this is useful context anyway

## Open questions / weak evidence

## References
(Numbered citations with DOIs/PMIDs/ISBNs)

Citation discipline
- Real DOIs, PMIDs, ISBNs, arXiv IDs only.
- Items the research could not verify in-session are marked [TODO verify] rather than fabricated.
- Where source language matters (Toulmin’s argument model, Gopen & Swan’s reader-expectation principles, Kerr’s HARKing definition, etc.), the text is quoted directly.
- Each document closes with a numbered reference list.
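The document structure and citation rules described on this page lend themselves to a mechanical check. The following is a minimal sketch, not part of scriptorium itself: the function name `lint_knowledge_doc` and the shape of its report are assumptions made for illustration, while the section names and the `[TODO verify]` marker come from the template and rules on this page.

```python
import re

# Section headings taken from the document template on this page.
REQUIRED_SECTIONS = [
    "Synthesis",
    "Evidence and frameworks",
    "How this informs scriptorium",
    "Implementation priority for scriptorium",
    "Open questions / weak evidence",
    "References",
]

# Matches a line such as "*Last updated: 2025-01-01*".
LAST_UPDATED = re.compile(r"^\*Last updated: \d{4}-\d{2}-\d{2}\*$", re.MULTILINE)
TODO_MARKER = "[TODO verify]"


def lint_knowledge_doc(text: str) -> dict:
    """Hypothetical lint: report missing sections and unverified citations."""
    # Collect the text of every second-level heading in the document.
    headings = {
        m.group(1).strip()
        for m in re.finditer(r"^## (.+)$", text, re.MULTILINE)
    }
    return {
        "missing_sections": [s for s in REQUIRED_SECTIONS if s not in headings],
        "has_last_updated": bool(LAST_UPDATED.search(text)),
        "todo_verify_count": text.count(TODO_MARKER),
    }
```

A non-zero `todo_verify_count` is not an error under the rules above; it only surfaces how many citations still await verification.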
How skills reference knowledge
Every skill’s README.md includes a ## Grounding section listing the
specific knowledge documents the skill draws on. Example for
citation-audit:
## Grounding

This skill is grounded in:
- [citation-claim-alignment](/concepts/knowledge/critique-techniques/citation-claim-alignment/): the operational technique (Greenberg 2009).
- [citation-accuracy-evidence](/concepts/knowledge/citations/citation-accuracy-evidence/): error prevalence (de Lacey, Pavlovic).
- [citation-overreach-research](/concepts/knowledge/citations/citation-overreach-research/): spin and primary-vs-review (Boutron, Yavchitz).
- [hallucination-in-llm-citations](/concepts/knowledge/citations/hallucination-in-llm-citations/): the AI failure mode this skill must NOT introduce.

This keeps the design accountable: a skill that drifts from its grounding either gets updated or gets its grounding extended.
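One way to keep a Grounding section from drifting is to check that every link in it still resolves to a knowledge document. The sketch below is illustrative only. It assumes that a link of the form `/concepts/knowledge/<dir>/<name>/` corresponds to the file `knowledge/<dir>/<name>.md`, which this page does not state, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Markdown links whose target starts with /concepts/knowledge/.
GROUNDING_LINK = re.compile(r"\[([^\]]+)\]\(/concepts/knowledge/([^)]+?)/?\)")

# Text between "## Grounding" and the next "## " heading (or end of file).
GROUNDING_SECTION = re.compile(
    r"^## Grounding\s*$(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL
)


def grounding_targets(readme_text: str) -> list[str]:
    """Return knowledge paths such as 'citations/foo' linked under ## Grounding."""
    section = GROUNDING_SECTION.search(readme_text)
    if section is None:
        return []
    return [m.group(2) for m in GROUNDING_LINK.finditer(section.group(1))]


def missing_grounding(readme_text: str, knowledge_root: Path) -> list[str]:
    """List grounding targets with no matching file under knowledge_root.

    Assumed mapping: /concepts/knowledge/<dir>/<name>/ -> <root>/<dir>/<name>.md
    """
    return [
        target
        for target in grounding_targets(readme_text)
        if not (knowledge_root / f"{target}.md").exists()
    ]
```

An empty result from `missing_grounding` means every cited knowledge document exists; it says nothing about whether the skill's behavior still matches what those documents claim.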
Cross-linking
Documents link to one another with [[doc-name]] syntax
(Obsidian-compatible). Documents in different subdirectories link
freely; the knowledge layer is a graph, not a tree.
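Because the links form a graph, dangling references can be found by collecting every `[[doc-name]]` target and comparing it against the set of notes. The following is a rough sketch under two assumptions not stated on this page: a note is identified by its filename without extension, and names are unique across subdirectories. The function names are hypothetical.

```python
import re
from pathlib import Path

# Captures doc-name from [[doc-name]], [[doc-name|label]] and [[doc-name#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

# Markdown notes plus Quarto notes.
NOTE_SUFFIXES = {".md", ".qmd"}


def build_link_graph(knowledge_root: Path) -> dict[str, set[str]]:
    """Map each note's filename stem to the set of note names it links to."""
    graph: dict[str, set[str]] = {}
    for path in sorted(knowledge_root.rglob("*")):
        if path.is_file() and path.suffix in NOTE_SUFFIXES:
            text = path.read_text(encoding="utf-8")
            graph[path.stem] = {m.group(1).strip() for m in WIKILINK.finditer(text)}
    return graph


def dangling_links(graph: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return sorted (source, target) pairs whose target is not a known note."""
    return sorted(
        (source, target)
        for source, targets in graph.items()
        for target in targets
        if target not in graph
    )
```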
The roadmap connection
The Implementation priority section of every document feeds
docs/roadmap.md. Findings that warrant a
skill go on the timeline; findings that warrant framing-only
treatment land in DESIGN.md or non-goals; findings that warrant
“maybe later” get an explicit trigger condition.
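The routing described above can be sketched as a small parser over the Verdict line. This is an illustration rather than the project's tooling: the function name `route_finding` and the destination labels are hypothetical, and mapping a "No" verdict to framing-only treatment is an assumption drawn from the template's wording.

```python
import re

# Matches e.g. "**Verdict:** Yes (v0.2)", "**Verdict:** Maybe later", "**Verdict:** No".
VERDICT = re.compile(
    r"^\*\*Verdict:\*\*\s*(Yes|Maybe later|No)\b(?:\s*\((v[\w.]+)\))?",
    re.MULTILINE,
)

# Hypothetical labels; "No" -> framing-only is an assumption.
DESTINATION = {
    "Yes": "roadmap timeline",
    "Maybe later": "deferred, with an explicit trigger condition",
    "No": "framing-only (DESIGN.md or non-goals)",
}


def route_finding(doc_text: str):
    """Return the verdict, optional version, and destination, or None if absent."""
    match = VERDICT.search(doc_text)
    if match is None:
        return None
    verdict, version = match.group(1), match.group(2)
    return {
        "verdict": verdict,
        "version": version,  # None when no "(vX.Y)" follows the verdict
        "destination": DESTINATION[verdict],
    }
```

Returning `None` for a missing or malformed verdict keeps such documents visible as a separate case instead of silently assigning them a bucket.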
How this was built
The first-pass evidence base (~40 documents) was produced by parallel
research agents over a focused session. Each agent was scoped to a
topical subdirectory, given strict citation-discipline instructions
(real DOIs only; mark unverifiable as [TODO verify]), and required
to close each document with the Implementation priority annotation.
The agents independently identified several non-obvious findings worth
the project’s attention:
- LLM arithmetic is unreliable for statistical-consistency checks (statistical-inconsistency); scriptorium skills call out to deterministic scripts (Statcheck, GRIM) rather than recompute in-band.
- BERTScore’s antonymy problem (semantic-preservation) means embedding similarity is not a safe guard against meaning flips during transformation.
- NIH’s 2025 Simplified Review Framework bundles Significance and Innovation into a single factor (significance-positioning), changing what a specific-aims skill must accomplish.
- The 30.85% human–AI comment overlap from Liang et al. 2024 (ai-peer-review-research) is the gold-standard benchmark reviewer-simulation will be evaluated against.
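To illustrate why a deterministic script suits the first finding, here is a minimal sketch of the idea behind a GRIM check: for integer-valued data, a reported mean is only possible if some integer total divided by the sample size rounds to it. This is not scriptorium's script. The function name is hypothetical, and the sketch assumes a single integer-valued item and accepts either rounding direction for exact halves.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if some integer total divided by n rounds to reported_mean.

    reported_mean is passed as a string so its decimal places are preserved.
    """
    target = Decimal(reported_mean)
    # Rounding unit implied by the reported precision, e.g. 0.01 for "5.19".
    quantum = Decimal(1).scaleb(target.as_tuple().exponent)
    approx_total = int(target * n)
    # Only the integer totals adjacent to target * n can round to the target.
    for total in (approx_total - 1, approx_total, approx_total + 1):
        mean = Decimal(total) / Decimal(n)
        for mode in (ROUND_HALF_UP, ROUND_HALF_DOWN):
            if mean.quantize(quantum, rounding=mode) == target:
                return True
    return False


# With n = 28, a total of 145 gives 5.1786 and 146 gives 5.2143,
# so 5.18 is reachable and 5.19 is not.
print(grim_consistent("5.18", 28))  # True
print(grim_consistent("5.19", 28))  # False
```

The check uses exact decimal arithmetic throughout, which is the property an in-band LLM recomputation cannot guarantee.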
Contributing knowledge
New knowledge documents are welcome. The bar:
- Real citations.
- Implementation priority annotation that’s defensible (not aspirational).
- Cross-links to related documents.
- Honest acknowledgment of weak-evidence areas and debates.
See CONTRIBUTING.md for the workflow.