
Knowledge layer

The evidence base that scriptorium's skills are grounded in. Without this layer, the skills are generic LLM prompts; with it, they are grounded in established practice.

Most AI writing tools rely on the LLM's pretraining to “know” what good scientific writing is. That works inconsistently and unaccountably. Scriptorium takes the opposite approach: skills cite the specific evidence they are grounded in, so a contributor or reviewer can trace any behavior back to its source.

The cost is real work — building this evidence base took a substantial first-pass effort — but it produces three durable benefits:

  1. Defensible design. When someone asks “why does citation-audit classify claims this way?”, the answer is a paper.
  2. Accountable evolution. When the evidence updates, the skills should update. Knowledge documents declare their last-updated date.
  3. Honest scope. Each knowledge document closes with an Implementation priority section that states whether the finding becomes a skill or stays framing-only context. Findings that shouldn’t become skills are explicit.
The layer is organized into topical subdirectories:

```
knowledge/
├── prior-art/             # similar tools, projects, lineage (6 notes)
├── scientific-writing/    # methodology of good writing (13 notes)
├── peer-review/           # evidence on review processes (9 notes)
├── citations/             # citation practices and pitfalls (4 notes)
├── editing/               # editing methodology (3 notes)
├── grants/                # grant-writing evidence (3 notes)
├── critique-techniques/   # how to find problems systematically (7 notes)
├── reproducibility/       # the crisis context scriptorium responds to (1 note)
├── author-roles/          # career stage, role, and language behavioral evidence (2 notes)
└── conventions/           # the load-bearing conventions skills share (2 notes)
```
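The per-directory counts are easy to let drift as notes are added. A minimal sketch of recounting them from disk, assuming every `.md` or `.qmd` file inside a subdirectory of `knowledge/` is a note and that files sitting directly in `knowledge/` (such as an index page) are not; `count_notes` is a hypothetical helper, not part of scriptorium.

```python
from collections import Counter
from pathlib import Path

# Assumption: notes are Markdown (.md) or Quarto (.qmd) files.
NOTE_SUFFIXES = {".md", ".qmd"}


def count_notes(root: Path) -> Counter:
    """Tally note files per top-level subdirectory of the knowledge layer."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in NOTE_SUFFIXES:
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 2:
            continue  # skip files directly in root (e.g. an index page)
        counts[parts[0]] += 1  # first component is the topic, e.g. "citations"
    return counts
```

Run from the repository root, `count_notes(Path("knowledge"))` should reproduce the counts in the tree, and summing its values should give the total, provided the assumptions above hold for the real layout.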

Current size: 50 notes (49 markdown + 1 Quarto), grown from the first-pass batch of ~40. The most recent additions track v0.2 and v0.3 work:

  • knowledge/scientific-writing/corpus-based-stylometry.md — voice-profile design.
  • knowledge/scientific-writing/literature-search-strategies.md — gap-finder.
  • knowledge/critique-techniques/research-gap-detection.md — gap-finder.
  • knowledge/peer-review/venue-selection.md — venue-fit.
  • knowledge/peer-review/predatory-publishing.md — venue-fit refusal.
  • knowledge/peer-review/preprint-landscape.md — venue-fit opt-in mode.
  • knowledge/author-roles/author-role-evidence.md — persona / voice work.
  • knowledge/conventions/declared-work-scope.md — the project-wide refusal-on-blankness convention.
  • knowledge/conventions/guidance-level.md — the project-wide framing convention.

The two conventions/ notes are project-wide rather than topical: they define the declared-work-scope refusal posture that every skill inherits and the guidance-level framing that every conversation-bearing skill respects. Each convention is documented for users on its own concept page (docs/src/content/docs/concepts/declared-work-scope/ and docs/src/content/docs/concepts/guidance-level/; TODO link once concept pages land); the knowledge notes here are the underlying evidence record for those conventions.

Every document, whatever its subdirectory, follows the same structure:

```markdown
# Topic title
*Last updated: YYYY-MM-DD*
## Synthesis
(1–3 paragraphs: what the evidence shows)
## Evidence and frameworks
(Detailed treatment with citations)
## How this informs scriptorium
(Concrete connections to specific skills + MANUSCRIPT_STATE schema)
## Implementation priority for scriptorium
**Verdict:** Yes (v0.X) | Maybe later | No
**If Yes:** skill name, phase, scope, required data
**If Maybe later:** condition that would flip to Yes
**If No:** why this is useful context anyway
## Open questions / weak evidence
## References
(Numbered citations with DOIs/PMIDs/ISBNs)
```
Citation discipline is the same in every document:

  • Real DOIs, PMIDs, ISBNs, and arXiv IDs only.
  • Items the research could not verify in-session are marked [TODO verify] rather than fabricated.
  • Where source language matters (Toulmin’s argument model, Gopen & Swan’s reader-expectation principles, Kerr’s HARKing definition, etc.), the text is quoted directly.
  • Each document closes with a numbered reference list.
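Part of this discipline can be linted. A minimal sketch that flags numbered reference entries carrying neither a recognizable identifier nor a `[TODO verify]` marker. It is a format check only: a well-formed but fabricated DOI passes, so it complements verification rather than replacing it. The identifier patterns are simplified assumptions, and `unsupported_references` is a hypothetical helper.

```python
import re

# Simplified identifier shapes; real DOIs/ISBNs have more variants than these.
IDENTIFIER_PATTERNS = {
    "doi": re.compile(r"\b10\.\d{4,9}/\S+"),
    "pmid": re.compile(r"\bPMID:?\s*\d{1,8}\b"),
    "isbn": re.compile(r"\bISBN(?:-1[03])?:?\s*[\d-]{9,16}[\dX]\b"),
    "arxiv": re.compile(r"\barXiv:?\s*\d{4}\.\d{4,5}(?:v\d+)?\b", re.IGNORECASE),
}
TODO_MARKER = "[TODO verify]"
NUMBERED_ENTRY = re.compile(r"^\s*(\d+)\.\s+(.*)$")


def unsupported_references(references: str) -> list[int]:
    """Return numbers of entries with no identifier and no TODO marker."""
    flagged = []
    for line in references.splitlines():
        m = NUMBERED_ENTRY.match(line)
        if not m:
            continue  # not a numbered reference entry
        number, entry = int(m.group(1)), m.group(2)
        has_id = any(p.search(entry) for p in IDENTIFIER_PATTERNS.values())
        if not has_id and TODO_MARKER not in entry:
            flagged.append(number)
    return flagged
```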

Every skill’s README.md includes a ## Grounding section listing the specific knowledge documents the skill draws on. Example for citation-audit:

```markdown
## Grounding
This skill is grounded in:
- [citation-claim-alignment](/concepts/knowledge/critique-techniques/citation-claim-alignment/) — the operational technique (Greenberg 2009).
- [citation-accuracy-evidence](/concepts/knowledge/citations/citation-accuracy-evidence/) — error prevalence (de Lacey, Pavlovic).
- [citation-overreach-research](/concepts/knowledge/citations/citation-overreach-research/) — spin and primary-vs-review (Boutron, Yavchitz).
- [hallucination-in-llm-citations](/concepts/knowledge/citations/hallucination-in-llm-citations/) — the AI failure mode this skill must NOT introduce.
```

This keeps the design accountable: a skill that drifts from its grounding either gets updated or gets its grounding extended.

Documents link to one another with [[doc-name]] syntax (Obsidian-compatible). Documents in different subdirectories link freely; the knowledge layer is a graph, not a tree.
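Treating the layer as a graph makes dangling links easy to find. A minimal sketch that builds an adjacency map from `[[doc-name]]` links and reports links to notes that do not exist. It assumes note names are unique across subdirectories (so a bare name identifies one note) and accepts the Obsidian forms `[[name|alias]]` and `[[name#heading]]`; `link_graph` and `dangling_links` are hypothetical names.

```python
import re

# [[target]], optionally [[target#heading]] and/or [[target|alias]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def link_graph(notes: dict[str, str]) -> dict[str, set[str]]:
    """Map each note name to the set of note names it links to."""
    graph: dict[str, set[str]] = {name: set() for name in notes}
    for name, text in notes.items():
        for target in WIKILINK.findall(text):
            graph[name].add(target.strip())
    return graph


def dangling_links(graph: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return (source, target) pairs whose target is not a known note."""
    return {(src, dst) for src, dsts in graph.items() for dst in dsts if dst not in graph}
```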

The Implementation priority section of every document feeds docs/roadmap.md. Findings that warrant a skill go on the timeline; findings that warrant framing-only treatment land in DESIGN.md or non-goals; findings that warrant “maybe later” get an explicit trigger condition.
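The routing rule can be sketched as a function from a note's verdict to a destination. This is an illustration under stated assumptions, not the project's actual tooling: the destination labels `timeline`, `maybe-later`, and `framing` are hypothetical stand-ins for the roadmap timeline, the trigger-conditioned items, and DESIGN.md or non-goals, and the verdict and trigger lines are assumed to follow the template literally.

```python
import re
from dataclasses import dataclass
from typing import Optional

# [ \t]* (not \s*) keeps each match on a single line.
VERDICT = re.compile(
    r"^\*\*Verdict:\*\*[ \t]*(?:Yes \((v\d+(?:\.\d+)*)\)|(Maybe later)|(No))(?!\w)",
    re.MULTILINE,
)
TRIGGER = re.compile(r"^\*\*If Maybe later:\*\*[ \t]*(\S.*)$", re.MULTILINE)


@dataclass
class Routing:
    destination: str  # hypothetical labels: "timeline", "maybe-later", "framing"
    version: Optional[str] = None
    trigger: Optional[str] = None


def route(note_text: str) -> Routing:
    """Decide where a note's Implementation priority verdict lands."""
    m = VERDICT.search(note_text)
    if m is None:
        raise ValueError("note has no parseable Verdict line")
    version, maybe, _no = m.groups()
    if version:
        return Routing("timeline", version=version)
    if maybe:
        t = TRIGGER.search(note_text)
        if t is None:
            raise ValueError("'Maybe later' verdict needs an explicit trigger condition")
        return Routing("maybe-later", trigger=t.group(1).strip())
    return Routing("framing")  # DESIGN.md or non-goals
```

Raising on a "Maybe later" verdict without a trigger mirrors the rule that such findings must state what would flip them to Yes.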

The first-pass evidence base (~40 documents) was produced by parallel research agents over a focused session. Each agent was scoped to a topical subdirectory, given strict citation-discipline instructions (real DOIs only; mark unverifiable as [TODO verify]), and required to close each document with the Implementation priority annotation. The agents independently identified several non-obvious findings worth the project’s attention:

  • LLM arithmetic is unreliable for statistical-consistency checks (statistical-inconsistency); scriptorium skills call out to deterministic scripts (Statcheck, GRIM) rather than recomputing in-band.
  • BERTScore’s antonymy problem (semantic-preservation) means embedding similarity is not a safe guard against meaning flips during transformation.
  • NIH’s 2025 Simplified Review Framework bundles Significance and Innovation into a single factor (significance-positioning), changing what a specific-aims skill must accomplish.
  • The 30.85% human–AI comment overlap from Liang et al. 2024 (ai-peer-review-research) is the gold-standard benchmark reviewer-simulation will be evaluated against.
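To make the first finding concrete, here is a minimal sketch of the GRIM (granularity-related inconsistency of means) test as a deterministic function. It illustrates the technique only; it is not the code of Statcheck or of any published GRIM tool, and `grim_consistent` is a hypothetical name. It assumes integer-valued responses, which GRIM requires, and accepts either round-half-up or round-half-even for the reported mean.

```python
import math
from decimal import ROUND_HALF_EVEN, ROUND_HALF_UP, Decimal


def grim_consistent(mean: str, n: int, items: int = 1) -> bool:
    """Can n participants' integer responses (over `items` integer items each)
    produce a mean that rounds to `mean`? Pass `mean` as a string so the
    number of reported decimals is preserved."""
    reported = Decimal(mean)
    places = -reported.as_tuple().exponent  # "5.19" -> 2 decimals
    quantum = Decimal(1).scaleb(-places)  # smallest reported step, e.g. 0.01
    denom = n * items
    centre = reported * denom  # total implied by the reported mean
    # If any integer total rounds to the reported mean, one adjacent to the
    # implied total does too, so a handful of candidates is enough.
    for total in range(math.floor(centre) - 1, math.ceil(centre) + 2):
        exact = Decimal(total) / Decimal(denom)
        for mode in (ROUND_HALF_UP, ROUND_HALF_EVEN):
            if exact.quantize(quantum, rounding=mode) == reported:
                return True
    return False
```

For example, a mean of 5.19 reported for 28 participants on a single integer item fails: the nearest achievable means are 145/28 ≈ 5.18 and 146/28 ≈ 5.21.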

New knowledge documents are welcome. The bar:

  • Real citations.
  • Implementation priority annotation that’s defensible (not aspirational).
  • Cross-links to related documents.
  • Honest acknowledgment of weak-evidence areas and debates.

See CONTRIBUTING.md for the workflow.