Scriptorium

Single-responsibility AI skills that work on prose the author has written. Each skill reads a shared editorial state file and is grounded in a curated, peer-reviewed evidence base.


For authors revising their own manuscripts and grants who want structured, anchored critique on prose they wrote themselves — not blank-slate generation.

What you give scriptorium

## Discussion (excerpt)

Our findings extend the existing literature
on bariatric-surgery outcomes [@thompson2021].
The single most important predictor was
diabetes duration, consistent with the
hypothesis that β-cell reserve declines
monotonically with disease duration.

What citation-audit emits

| Claim         | Citation       | Strength | Note            |
|---------------|----------------|----------|-----------------|
| extend lit    | [@thompson2021]| Weak fit | review, not     |
|               |                |          | primary cohort  |
| β-cell reserve| (none)         | Unsupp.  | add primary     |
|               |                |          | citation; relax |
|               |                |          | "monotonically" |

Structured findings, one per claim, anchored to the source sentence. Author owns every edit. No citations invented.

The skill modifies nothing. It surfaces what it sees. A full end-to-end run on this paragraph (citation-audit, reviewer-simulation, and argumentative-flow, before and after) is in the Case study.

/plugin marketplace add seandavi/scriptorium
/plugin install scriptorium@scriptorium
/scriptorium:tour

For the Python CLI (scriptorium init, scriptorium validate, scriptorium trace), see Install.

Where scriptorium fits in the writing workflow

The seven shipped lifecycle stages in MANUSCRIPT_STATE.yaml (outline → draft → review → revision → submission → post-submission → accepted) map onto four roles in Hayes’ 2012 cognitive-process model: proposer, translator, evaluator, transcriber. Scriptorium occupies the translator and evaluator roles once the author has done the proposing. It does not propose for the author.

Outline · drafting

The author owns hypothesis selection, framing, and the first prose pass.

Not what scriptorium is for. See scope.

Revise · pressure-test

The author has draft prose and wants it tightened. Most scriptorium skills land here.

citation-audit, argumentative-flow, gap-finder, reviewer-simulation, terminology-normalization, figure-text-alignment

Pre-submission

Final compliance, venue fit, reviewer-anticipation checks.

desk-rejection-risk, venue-fit, reporting-guideline-fit, reporting-guideline-compliance, author-contribution-audit, compression

Skills do not auto-invoke. The author chooses which one to run and when. The skill reference lists every shipped skill with its category and the phases it expects.

Four panels, one editorial layer

Panel 1 · what each skill does

Structured, anchored findings

If you want concrete output you can act on — not a quality score.

Every critique skill emits a fixed-shape structured report. reviewer-simulation produces four lenses (methodological skeptic, domain expert, translational/clinical, statistical), each with Major Critiques / Minor Critiques / Fatal Concerns / Enthusiasm Drivers / Suggested Revisions / Acceptance Risk. Each finding is anchored to a specific sentence, paragraph, or section. No letter grade. No overall pass/fail.

If you’ve worried that “AI critique” means crushing framing attacks: scriptorium’s reviewer-simulation is grounded in common-critiques-taxonomy and produces fixable bench/stats tasks like “report the 95% CI for the AUC” or “Methods §2.3 omits the missing-data handling.” You set meta.guidance_level to terse, standard, or full; the structure of the output does not change, but the framing around each elicited field does.

See a worked reviewer-simulation run →

### Statistical lens

Major critiques.
- AUC=0.79 reported without
  95% CI. Add the CI.
- Internal validation missing
  (bootstrap optimism / k-fold).
  EPV is in range; the
  calibration step is not.

Minor critiques.
- Diabetes duration p<0.001
  reported without effect size.
  Add OR with 95% CI.

Fatal concerns. None.
Acceptance risk. Moderate.
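The fixed report shape described in this panel can be pictured as a small data structure. The sketch below is only an illustration: the class and field names (Finding, LensReport, anchor, and so on) are hypothetical and are not taken from scriptorium's code; only the four lens names and the six report sections come from the text above. Note that the structure has no score or grade field.

```python
from dataclasses import dataclass, field

# Lens names come from the text above; every class and field name is hypothetical.
LENSES = (
    "methodological skeptic",
    "domain expert",
    "translational/clinical",
    "statistical",
)


@dataclass
class Finding:
    anchor: str  # the sentence, paragraph, or section the finding points at
    text: str    # the critique or suggestion itself


@dataclass
class LensReport:
    lens: str
    major_critiques: list[Finding] = field(default_factory=list)
    minor_critiques: list[Finding] = field(default_factory=list)
    fatal_concerns: list[Finding] = field(default_factory=list)
    enthusiasm_drivers: list[Finding] = field(default_factory=list)
    suggested_revisions: list[Finding] = field(default_factory=list)
    acceptance_risk: str = ""  # a label such as "Moderate", never a numeric score

    def __post_init__(self) -> None:
        # Reject lenses outside the four named in the text.
        if self.lens not in LENSES:
            raise ValueError(f"unknown lens: {self.lens!r}")


# One finding, reusing the example wording quoted earlier in this panel.
report = LensReport(
    lens="statistical",
    major_critiques=[
        Finding(anchor="Methods §2.3", text="omits the missing-data handling"),
    ],
    acceptance_risk="Moderate",
)
```

Because every finding carries its own anchor, an empty section (here, fatal_concerns) is an explicit empty list rather than a missing key, which is one way to keep the shape fixed across runs.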

Panel 2 · what gets preserved

A preservation contract, not a voice guarantee

If “AI flattens my prose” is your worry, read this section closely.

Critique skills (citation-audit, reviewer-simulation, gap-finder, desk-rejection-risk) do not rewrite prose at all. They emit structured reports. Your manuscript file is read, not modified.

Transformation skills (argumentative-flow, compression) operate under an explicit preservation contract:

Citations. Every key in the source is preserved; none are added.
Statistics. Every numerical value, p-value, CI, and unit is preserved verbatim.
Declared terminology. Whatever you put in terminology.preferred / forbidden / synonyms is respected.
Hedging style and stance markers. The skill is required to report any shift in hedging strength as a per-edit note rather than absorbing it silently.
Declared core claims. No claim is added; no declared claim is dropped.
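The citation and statistics clauses of the contract are mechanical enough to check independently. The following is a minimal sketch of such a check, assuming Pandoc-style citation keys like [@thompson2021]; the function name, the regular expressions, and the example sentences are all illustrative assumptions, not scriptorium's implementation.

```python
import re
from collections import Counter

# Assumption: citations use Pandoc-style keys such as [@thompson2021].
CITATION = re.compile(r"(?<!\w)@([A-Za-z][\w:-]*)")
# Numeric tokens such as 0.79, 95%, or 0.001; digits inside words or keys are skipped.
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?%?")


def preservation_report(before: str, after: str) -> dict[str, list[str]]:
    """Compare citation keys and numeric tokens between two versions of a passage."""
    cites_before = set(CITATION.findall(before))
    cites_after = set(CITATION.findall(after))
    nums_before = Counter(NUMBER.findall(before))
    nums_after = Counter(NUMBER.findall(after))
    return {
        "citations_dropped": sorted(cites_before - cites_after),
        "citations_added": sorted(cites_after - cites_before),
        "numbers_dropped": sorted((nums_before - nums_after).elements()),
        "numbers_added": sorted((nums_after - nums_before).elements()),
    }


# Illustrative sentences written for this sketch.
before = "The model discriminated well (AUC=0.79), extending prior work [@thompson2021]."
after_ok = "Extending prior work [@thompson2021], the model discriminated well (AUC=0.79)."
after_bad = "The model discriminated well (AUC=0.8), extending prior work."

clean = preservation_report(before, after_ok)    # every list is empty
broken = preservation_report(before, after_bad)  # flags the lost key and the rounded AUC
```

A check like this only covers the verbatim clauses. Shifts in hedging strength or dropped claims are not detectable by string comparison, which is why the contract asks for per-edit notes on those.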

The honest caveat. Sentence-level “voice preservation” is not a guarantee scriptorium makes. The lexical-fingerprint evidence (Kobak et al. 2024) shows that LLM-edited prose is detectable at corpus scale; correcting it at sentence scale is not reliable. The preservation contract above is what scriptorium does guarantee. A general “this will sound like you” promise is one scriptorium does not make. See ai-writing-failure-modes for the underlying evidence.

Why scriptorium positions this way →

terminology:
  preferred:
    - "tumor-infiltrating
       lymphocytes (TILs)"
    - "anti-PD-1"
  forbidden:
    - "groundbreaking"
    - "novel"

constraints:
  preserve_citations: true
  preserve_statistics: true
  avoid_hype: true

style:
  voice: active
  tone:
    - quantitative
    - restrained
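A forbidden-terms list like the one above lends itself to a simple lint pass. This is a hedged sketch of what such a pass could look like; find_forbidden is a hypothetical helper, the example prose is invented for illustration, and nothing here describes how scriptorium's skills actually consume the terminology block.

```python
import re


def find_forbidden(prose: str, forbidden: list[str]) -> list[tuple[int, str]]:
    """Return (line number, term) pairs for each forbidden term found as a whole word."""
    hits = []
    for lineno, line in enumerate(prose.splitlines(), start=1):
        for term in forbidden:
            # Word boundaries keep "novel" from matching inside "novelty".
            if re.search(rf"\b{re.escape(term)}\b", line, flags=re.IGNORECASE):
                hits.append((lineno, term))
    return hits


# Terms copied from the terminology.forbidden example; the prose is made up.
forbidden = ["groundbreaking", "novel"]
prose = "We report a Novel predictor of remission.\nThe novelty of the cohort is limited."

hits = find_forbidden(prose, forbidden)
```

Reporting line numbers rather than rewriting the text keeps the edit decision with the author, in line with the critique-only posture described in this panel.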

Panel 3 · what the state file does

One file, every skill reads it

If you want to control what skills know about your work.

MANUSCRIPT_STATE.yaml lives at the root of your manuscript repository. Every scriptorium skill reads it. It records what the manuscript is arguing (core_claims), the limitations you have already chosen to acknowledge (known_weaknesses; see naming note), preferred / forbidden terminology, declared style and audience, and constraint flags.

File scope. MANUSCRIPT_STATE.yaml lives in your repository. It is not uploaded to scriptorium and is not shared with anyone unless you commit it to a shared branch. Whether to commit it, and where (a private fork, a private git remote, or excluded via .gitignore), is your call. The reviewer-simulation skill reads known_weaknesses so it can refrain from flagging items you have already acknowledged; the field is a calibration input, not a disclosure target.

Schema reference → · Worked example →

project:
  title: "..."
  target_venue: "Nature Cancer"

document_phase:
  current: revision

core_claims:
  - "Shorter diabetes duration
     and lower pre-op HbA1c
     predict remission..."

known_weaknesses:
  - "Single-center retrospective
     design."
  - "No external validation
     cohort."

meta:
  guidance_level: standard
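A state file like the one above can be sanity-checked before any skill runs. The sketch below is not the scriptorium validate command and makes no claim about the real schema: check_state is a hypothetical helper, and treating core_claims as required is an assumption. The stage names and guidance levels are the ones listed elsewhere on this page. The state is written as a Python dict to keep the example dependency-free; in practice it would be loaded with a YAML parser such as PyYAML's yaml.safe_load.

```python
# Stage names and guidance levels as listed on this page.
STAGES = (
    "outline", "draft", "review", "revision",
    "submission", "post-submission", "accepted",
)
GUIDANCE_LEVELS = ("terse", "standard", "full")


def check_state(state: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    phase = state.get("document_phase", {}).get("current")
    if phase not in STAGES:
        problems.append(f"document_phase.current is {phase!r}, expected one of {STAGES}")
    level = state.get("meta", {}).get("guidance_level")
    if level not in GUIDANCE_LEVELS:
        problems.append(f"meta.guidance_level is {level!r}, expected one of {GUIDANCE_LEVELS}")
    # Assumption: at least one core claim should be declared.
    if not state.get("core_claims"):
        problems.append("core_claims is empty")
    return problems


# Mirrors the example state file above.
state = {
    "project": {"title": "...", "target_venue": "Nature Cancer"},
    "document_phase": {"current": "revision"},
    "core_claims": ["Shorter diabetes duration and lower pre-op HbA1c predict remission..."],
    "known_weaknesses": [
        "Single-center retrospective design.",
        "No external validation cohort.",
    ],
    "meta": {"guidance_level": "standard"},
}

problems = check_state(state)
```

Returning a list of problems instead of raising on the first one lets the author fix the whole file in a single pass.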

Panel 4 · where it grounds

Curated, peer-reviewed evidence

If you want to know what every design choice is based on.

Every skill cites the knowledge notes its design decisions come from. Fifty-plus notes cover citation accuracy, hallucination evidence, reporting-guideline density (CONSORT, STROBE, PRISMA, ARRIVE, TRIPOD+AI, CONSORT-AI / SPIRIT-AI), reviewer-archetype research, plain-language summaries, desk-rejection rates, and the AI-writing failure-mode literature. The default field scope is biomedical/clinical — that is where the reporting-guideline density is highest and where the project’s contributors work. Extensions to other fields are welcome but not yet evidenced (PRs invited).

Browse the knowledge layer → · All shipped skills →

skill: reviewer-simulation
grounding:
  - reviewer-archetypes-evidence
  - common-critiques-taxonomy
  - ai-peer-review-research
  - critique-quality-evidence

skill: citation-audit
grounding:
  - citation-claim-alignment
  - citation-accuracy-evidence
  - citation-overreach-research
  - hallucination-in-llm-citations

What scriptorium will not do

A pre-submission tool earns trust by being explicit about its limits. These are not aspirations — they are the explicit non-goals in the roadmap, each pointing at the evidence note that justifies it.

Out of scope by design

No blank-slate prose generation. Scriptorium does not draft sections from nothing. The proposer role (Hayes 2012) is yours. See declared-work-scope.
No autonomous reviewing. reviewer-simulation is author-side only. Editorial-side use violates ICMJE, NIH, and major-publisher policy.
No overall quality score. No letter grade, no Flesch-Kincaid, no “writing quality” number. Those systematically misrate scientific prose.
No reference-manager replacement. Citation audit works with the bibliography Zotero / Mendeley / BibTeX already produced. See reference-managers.

Out of scope at this stage

No image forensics. Bik-style figure-integrity review and tortured-phrase detection require domain experts. Scriptorium is a pre-submission first pass, not a sleuth.
No discipline defaults outside biomedical / clinical (v0.1–v0.3). The evidence base is drawn from biomedical and clinical sources. Physics, CS/ML, mathematics, and humanities defaults need per-discipline knowledge layers that don’t exist yet.
No native .docx round-trip. Scriptorium operates on markdown-converted prose. Tracked changes and field codes are not preserved through the conversion.
No skill-degradation defence. That extended use atrophies the author’s underlying skill is plausible but under-evidenced. The author retains responsibility for their writing.

Skills also surface their own limits in the report itself — each output includes a “What this skill did NOT check” section so the limit is in front of the author at decision time, not buried in documentation.

Where to go next

I'm new — show me what this is

First-time tour — install the plugin, run /scriptorium:tour, get a three-or-four-turn walk-through that ends with one concrete next move.

I want to see real output

Case study walks a realistic discussion paragraph through citation-audit, reviewer-simulation, and argumentative-flow — before, the structured output, and after.

Which skill do I want?

Skills reference — every shipped skill, by category, with one-line descriptions and grounding pointers. Includes the lifecycle stage each skill expects.

Why is it built this way?

Design collects the architecture decisions and the failure-mode literature each defensive choice is grounded in — hallucinated citations, lexical homogenisation, automation complacency, suggestion-acceptance bias.

Source: github.com/seandavi/scriptorium. Code: MIT. Documentation and knowledge layer: CC BY 4.0. Maintained by Sean Davis.