Scriptorium

Single-responsibility AI skills that work on prose the author has written. Each skill reads a shared editorial state file and is grounded in a curated, peer-reviewed evidence base.

For authors revising their own manuscripts and grants who want structured, anchored critique on prose they wrote themselves — not blank-slate generation.

What you give scriptorium

## Discussion (excerpt)
Our findings extend the existing literature
on bariatric-surgery outcomes [@thompson2021].
The single most important predictor was
diabetes duration, consistent with the
hypothesis that β-cell reserve declines
monotonically with disease duration.

What citation-audit emits

| Claim | Citation | Strength | Note |
|----------------|-----------------|----------|---------------------------------------------|
| extend lit | [@thompson2021] | Weak fit | review, not primary cohort |
| β-cell reserve | (none) | Unsupp. | add primary citation; relax "monotonically" |

Structured findings, one per claim, anchored to the source sentence. Author owns every edit. No citations invented.

The skill modifies nothing. It surfaces what it sees. A full worked end-to-end run on this paragraph — citation-audit, reviewer-simulation, argumentative-flow before-and-after — lives at Case study.

/plugin marketplace add seandavi/scriptorium
/plugin install scriptorium@scriptorium
/scriptorium:tour

For the Python CLI (scriptorium init, scriptorium validate, scriptorium trace), see Install.


Where scriptorium fits in the writing workflow

The seven shipped lifecycle stages in MANUSCRIPT_STATE.yaml (outline, draft, review, revision, submission, post-submission, accepted) map onto four roles in Hayes’ 2012 cognitive-process model: proposer, translator, evaluator, transcriber. Scriptorium occupies the translator and evaluator roles when the author has already proposed. It does not propose for the author.

Outline · drafting

The author owns hypothesis selection, framing, and the first prose pass.

Not what scriptorium is for. See scope.

Revise · pressure-test

The author has draft prose and wants it tightened. Most scriptorium skills land here.

citation-audit, argumentative-flow, gap-finder, reviewer-simulation, terminology-normalization, figure-text-alignment

Pre-submission

Final compliance, venue fit, reviewer-anticipation checks.

desk-rejection-risk, venue-fit, reporting-guideline-fit, reporting-guideline-compliance, author-contribution-audit, compression

Skills do not auto-invoke. The author chooses which one to run and when. The skill reference lists every shipped skill with its category and the phases it expects.
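As a minimal sketch, the seven lifecycle stages named at the start of this section could be checked like this before an author picks a skill. The function name and the plain dict standing in for a parsed MANUSCRIPT_STATE.yaml are illustrative assumptions, not scriptorium's actual API; the real check is whatever scriptorium validate does.

```python
# Stage names come from this page; everything else here is hypothetical.
LIFECYCLE_STAGES = (
    "outline", "draft", "review", "revision",
    "submission", "post-submission", "accepted",
)


def check_phase(state: dict) -> str:
    """Return document_phase.current if it is one of the seven stages."""
    current = state.get("document_phase", {}).get("current")
    if current not in LIFECYCLE_STAGES:
        raise ValueError(f"unknown lifecycle stage: {current!r}")
    return current


# A plain dict stands in for the parsed state file (assumed layout).
state = {"document_phase": {"current": "revision"}}
print(check_phase(state))  # revision
```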


Panel 1 · what each skill does

Structured, anchored findings

If you want concrete output you can act on — not a quality score.

Every critique skill emits a fixed-shape structured report. reviewer-simulation produces four lenses (methodological skeptic, domain expert, translational/clinical, statistical), each with Major Critiques / Minor Critiques / Fatal Concerns / Enthusiasm Drivers / Suggested Revisions / Acceptance Risk. Each finding is anchored to a specific sentence, paragraph, or section. No letter grade. No overall pass/fail.
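One way to picture the fixed shape described above is as a small set of typed records. This is a sketch under assumptions: the class and field names are hypothetical and simply mirror the headings listed in the paragraph, and the anchor string in the example is made up. Scriptorium's real report format may differ.

```python
from dataclasses import dataclass, field

# The four lenses named above; the tuple name is an assumption.
LENSES = (
    "methodological skeptic",
    "domain expert",
    "translational/clinical",
    "statistical",
)


@dataclass
class Finding:
    anchor: str  # the sentence, paragraph, or section the finding points at
    text: str    # the critique itself


@dataclass
class LensReport:
    lens: str
    major_critiques: list[Finding] = field(default_factory=list)
    minor_critiques: list[Finding] = field(default_factory=list)
    fatal_concerns: list[Finding] = field(default_factory=list)
    enthusiasm_drivers: list[Finding] = field(default_factory=list)
    suggested_revisions: list[Finding] = field(default_factory=list)
    acceptance_risk: str = "unrated"  # a label such as "Moderate", not a grade

    def __post_init__(self) -> None:
        # Reject any lens outside the four listed above.
        if self.lens not in LENSES:
            raise ValueError(f"unknown lens: {self.lens!r}")


statistical = LensReport(
    lens="statistical",
    major_critiques=[
        # Hypothetical anchor; the critique text is taken from this page.
        Finding("Results, AUC sentence", "AUC=0.79 reported without 95% CI. Add the CI."),
    ],
    acceptance_risk="Moderate",
)
print(len(statistical.major_critiques), statistical.acceptance_risk)  # 1 Moderate
```

Note that the record has no score or pass/fail field, which matches the "no letter grade" position stated above.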

If you have worried that “AI critique” means crushing attacks on your framing: scriptorium’s reviewer-simulation is grounded in common-critiques-taxonomy and produces fixable bench/stats tasks such as “report the 95% CI for the AUC” or “Methods §2.3 omits the missing-data handling.” You set meta.guidance_level to terse, standard, or full; the structure of the output does not change, but the framing around each elicited field does.

See a worked reviewer-simulation run →

### Statistical lens
Major critiques.
- AUC=0.79 reported without 95% CI. Add the CI.
- Internal validation missing (bootstrap optimism / k-fold). EPV is in range; the calibration step is not.
Minor critiques.
- Diabetes duration p<0.001 reported without effect size. Add OR with 95% CI.
Fatal concerns. None.
Acceptance risk. Moderate.

Panel 2 · what gets preserved

A preservation contract, not a voice guarantee

If “AI flattens my prose” is your worry — read this section closely.

Critique skills (citation-audit, reviewer-simulation, gap-finder, desk-rejection-risk) do not rewrite prose at all. They emit structured reports. Your manuscript file is read, not modified.

Transformation skills (argumentative-flow, compression) operate under an explicit preservation contract:

  • Citations. Every key in the source is preserved; none are added.
  • Statistics. Every numerical value, p-value, CI, and unit is preserved verbatim.
  • Declared terminology. Whatever you put in terminology.preferred / forbidden / synonyms is respected.
  • Hedging style and stance markers. The skill is required to report any shift in hedging strength as a per-edit note rather than absorbing it silently.
  • Declared core claims. No claim is added; no declared claim is dropped.
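The first two items of the contract (citations and statistics) can be checked mechanically. The sketch below shows one way such a check could work, not how scriptorium implements it. It assumes Pandoc-style [@key] citations and plain decimal numbers; the function name and both regular expressions are illustrative, and the two sample sentences are toy text assembled from values that appear in this page's examples.

```python
import re
from collections import Counter

# Assumed patterns: "@key" citation keys, and numbers that do not sit inside
# a word (so the "2021" in "thompson2021" is not counted as a statistic).
CITATION_KEY = re.compile(r"@([A-Za-z0-9_][A-Za-z0-9_:-]*)")
NUMBER = re.compile(r"(?<!\w)\d+(?:\.\d+)?")


def preservation_report(before: str, after: str) -> dict:
    """List citation keys and numeric values that an edit dropped or added."""
    keys_before = set(CITATION_KEY.findall(before))
    keys_after = set(CITATION_KEY.findall(after))
    nums_before = Counter(NUMBER.findall(before))
    nums_after = Counter(NUMBER.findall(after))
    return {
        "citations_dropped": sorted(keys_before - keys_after),
        "citations_added": sorted(keys_after - keys_before),
        "numbers_dropped": sorted((nums_before - nums_after).elements()),
        "numbers_added": sorted((nums_after - nums_before).elements()),
    }


before = "The model reached AUC=0.79 [@thompson2021]; duration was significant (p<0.001)."
after = "The model reached AUC=0.8 [@thompson2021]; duration was significant (p<0.001)."

report = preservation_report(before, after)
print(report["numbers_dropped"], report["numbers_added"])  # ['0.79'] ['0.8']
```

Numbers are compared as strings rather than parsed values, so a rounding from 0.79 to 0.8 counts as a violation, in line with the "verbatim" wording above.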

The honest caveat. Sentence-level “voice preservation” is not a guarantee scriptorium makes. The lexical-fingerprint evidence (Kobak et al. 2024) shows that LLM-edited prose is detectable at corpus scale; correcting it at sentence scale is not reliable. The preservation contract above is what scriptorium does guarantee. A general “this will sound like you” promise is what scriptorium does not. See ai-writing-failure-modes for the underlying evidence.

Why scriptorium positions this way →

MANUSCRIPT_STATE.yaml
terminology:
  preferred:
    - "tumor-infiltrating lymphocytes (TILs)"
    - "anti-PD-1"
  forbidden:
    - "groundbreaking"
    - "novel"
constraints:
  preserve_citations: true
  preserve_statistics: true
  avoid_hype: true
style:
  voice: active
  tone:
    - quantitative
    - restrained

Panel 3 · what the state file does

One file, every skill reads it

If you want to control what skills know about your work.

MANUSCRIPT_STATE.yaml lives at the root of your manuscript repository. Every scriptorium skill reads it. It records what the manuscript is arguing (core_claims), the limitations you have already chosen to acknowledge (known_weaknesses; see naming note), preferred / forbidden terminology, declared style and audience, and constraint flags.

File scope. MANUSCRIPT_STATE.yaml lives in your repository. It is not uploaded to scriptorium and is not shared with anyone unless you commit it to a shared branch. Whether to commit it (private fork; private git remote; .gitignore) is your call. The reviewer-simulation skill reads known_weaknesses so it can refrain from flagging items you have already acknowledged — the field is a calibration input, not a disclosure target.
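To illustrate the calibration idea, the sketch below drops findings that restate a weakness the author has already declared. The word-overlap heuristic, the threshold, and the function names are all assumptions made for this example; the document does not say how reviewer-simulation does the matching. The first sample finding is hypothetical, and the other two come from the statistical-lens example on this page.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word set; a crude stand-in for real matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def drop_acknowledged(findings, known_weaknesses, threshold=0.75):
    """Keep findings that do not restate an already-acknowledged weakness.

    A finding counts as acknowledged when at least `threshold` of some
    weakness's words also appear in the finding (hypothetical rule).
    """
    kept = []
    for finding in findings:
        finding_tokens = _tokens(finding)
        acknowledged = any(
            len(_tokens(w) & finding_tokens) / len(_tokens(w)) >= threshold
            for w in known_weaknesses
            if _tokens(w)
        )
        if not acknowledged:
            kept.append(finding)
    return kept


known_weaknesses = [
    "Single-center retrospective design.",
    "No external validation cohort.",
]
findings = [
    "No external validation cohort is reported.",  # hypothetical finding
    "AUC=0.79 reported without 95% CI.",
    "Internal validation missing (bootstrap optimism / k-fold).",
]
for kept_finding in drop_acknowledged(findings, known_weaknesses):
    print(kept_finding)
```

In this run only the external-validation finding is dropped. The internal-validation finding shares a single word with the declared weakness, so it stays in the report.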

Schema reference →  ·  Worked example →

project:
  title: "..."
  target_venue: "Nature Cancer"
document_phase:
  current: revision
core_claims:
  - "Shorter diabetes duration and lower pre-op HbA1c predict remission..."
known_weaknesses:
  - "Single-center retrospective design."
  - "No external validation cohort."
meta:
  guidance_level: standard

Panel 4 · where it grounds

Curated, peer-reviewed evidence

If you want to know what every design choice is based on.

Every skill cites the knowledge notes its design decisions come from. Fifty-plus notes cover citation accuracy, hallucination evidence, reporting-guideline density (CONSORT, STROBE, PRISMA, ARRIVE, TRIPOD+AI, CONSORT-AI / SPIRIT-AI), reviewer-archetype research, plain-language summaries, desk-rejection rates, and the AI-writing failure-mode literature. The default field scope is biomedical/clinical — that is where the reporting-guideline density is highest and where the project’s contributors work. Extensions to other fields are welcome but not yet evidenced (PRs invited).

Browse the knowledge layer →  ·  All shipped skills →

skill: reviewer-simulation
grounding:
  - reviewer-archetypes-evidence
  - common-critiques-taxonomy
  - ai-peer-review-research
  - critique-quality-evidence

skill: citation-audit
grounding:
  - citation-claim-alignment
  - citation-accuracy-evidence
  - citation-overreach-research
  - hallucination-in-llm-citations

A pre-submission tool earns trust by being explicit about its limits. These are not aspirations — they are the explicit non-goals in the roadmap, each pointing at the evidence note that justifies it.

Out of scope by design

  • No blank-slate prose generation. Scriptorium does not draft sections from nothing. The proposer role (Hayes 2012) is yours. See declared-work-scope.
  • No autonomous reviewing. reviewer-simulation is author-side only. Editorial-side use violates ICMJE, NIH, and major-publisher policy.
  • No overall quality score. No letter grade, no Flesch-Kincaid, no “writing quality” number. Those systematically misrate scientific prose.
  • No reference-manager replacement. Citation audit works with the bibliography that Zotero, Mendeley, or BibTeX has already produced. See reference-managers.

Out of scope at this stage

  • No image forensics. Bik-style figure-integrity review and tortured-phrase detection require domain experts. Scriptorium is a pre-submission first pass, not a sleuth.
  • No discipline defaults outside biomedical / clinical (v0.1–v0.3). The evidence base is biomedical-coded. Physics, CS/ML, mathematics, humanities defaults need per-discipline knowledge layers that don’t exist yet.
  • No native .docx round-trip. Scriptorium operates on markdown-converted prose. Tracked changes and field codes are not preserved through the conversion.
  • No skill-degradation defence. That extended use atrophies the author’s underlying skill is plausible but under-evidenced. The author retains responsibility for their writing.

Skills also surface their own limits in the report itself — each output includes a “What this skill did NOT check” section so the limit is in front of the author at decision time, not buried in documentation.


I'm new — show me what this is

First-time tour — install the plugin, run /scriptorium:tour, get a three-or-four-turn walk-through that ends with one concrete next move.

I want to see real output

Case study walks a realistic discussion paragraph through citation-audit, reviewer-simulation, and argumentative-flow — before, the structured output, and after.

Which skill do I want?

Skills reference — every shipped skill, by category, with one-line descriptions and grounding pointers. Includes the lifecycle stage each skill expects.

Why is it built this way?

Design collects the architecture decisions and the failure-mode literature each defensive choice is grounded in — hallucinated citations, lexical homogenisation, automation complacency, suggestion-acceptance bias.


Source: github.com/seandavi/scriptorium. Code: MIT. Documentation and knowledge layer: CC BY 4.0. Maintained by Sean Davis.