Convert your manuscript to markdown
Scriptorium works best on markdown-flavored manuscript text. This guide covers the common source formats. Pick the section that matches yours.
Already markdown (Quarto, plain .md, Pandoc Markdown)
No conversion needed. Skip to Install and start running skills.
Microsoft Word (.docx)
Recommended: pandoc.

    pandoc manuscript.docx -o manuscript.md --wrap=preserve --markdown-headings=atx

Alternative: mammoth (better for Word-styled documents).

    mammoth manuscript.docx --output-format=markdown > manuscript.md

What you lose: tracked changes, comments, complex tables, and images embedded as objects. Re-resolve these manually if relevant.
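If the manuscript is split across several .docx files, the pandoc command above can be wrapped in a loop. The following is a minimal sketch, not part of Scriptorium: `md_name_for` and `convert_all_docx` are hypothetical helper names, and the `--extract-media` flag is an addition to the guide's command, included on the assumption that you want embedded images written out as files.

```shell
#!/usr/bin/env bash
# Sketch only: batch-convert .docx files with the pandoc flags from this guide.
# md_name_for and convert_all_docx are hypothetical helper names.

# Map "chapters/intro.docx" to "chapters/intro.md".
md_name_for() {
  printf '%s\n' "${1%.docx}.md"
}

# Convert every .docx file in the given directory (default: current directory).
convert_all_docx() {
  local dir="${1:-.}" src
  for src in "$dir"/*.docx; do
    [ -e "$src" ] || continue   # the glob matched nothing
    # Assumption: one media directory per source file, so that image
    # names from different documents cannot overwrite each other.
    pandoc "$src" -o "$(md_name_for "$src")" \
      --wrap=preserve --markdown-headings=atx \
      --extract-media="${src%.docx}_media"
  done
}
```

Calling `convert_all_docx chapters` would then write one .md file next to each .docx file in `chapters/`.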
LaTeX (.tex)
    pandoc manuscript.tex -o manuscript.md \
      --bibliography references.bib \
      --citeproc \
      --wrap=preserve

What you lose: custom macros; complex math environments survive but rarely render in markdown viewers; TikZ figures need a separate export.
Google Docs
File → Download → Markdown (.md), built in since 2024. Alternatively, download as Word and follow the .docx instructions above.
Overleaf / shared LaTeX
Use the LaTeX instructions above on the project’s main .tex file.
PDF (last resort)
Quality varies. Try:

    pdftotext -layout manuscript.pdf manuscript.txt

Or use marker or nougat for academic PDF OCR. Expect manual cleanup: citations, figure references, and table structure usually need to be re-resolved.
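Part of the manual cleanup can be scripted. The sketch below is one possible first pass over pdftotext output, not a Scriptorium feature: `clean_pdf_text` is a hypothetical helper that strips the form-feed characters pdftotext emits at page breaks and rejoins words that were hyphenated across a line break. It cannot tell a line-break hyphen from a genuine one, so compounds such as "markdown-flavored" that happen to wrap will be joined too and need a manual check.

```shell
#!/usr/bin/env bash
# Sketch only: first-pass cleanup of raw pdftotext output read from stdin.
# clean_pdf_text is a hypothetical helper name.
clean_pdf_text() {
  # Step 1: delete form feeds (page-break markers).
  # Step 2: glue a line ending in "letter + hyphen" to the line that follows.
  tr -d '\f' | awk '
    {
      # When a fragment is pending, drop the indentation of the continuation line.
      if (carry != "") sub(/^[ \t]+/, "")
      line = carry $0
      carry = ""
      if (line ~ /[A-Za-z]-$/) {
        # Hold the line without its trailing hyphen until the next line arrives.
        carry = substr(line, 1, length(line) - 1)
        next
      }
      print line
    }
    # A hyphen on the very last line has nothing to join, so put it back.
    END { if (carry != "") print carry "-" }
  '
}
```

A typical invocation would be `clean_pdf_text < manuscript.txt > manuscript.cleaned.txt`, followed by a read-through of the result.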
After conversion
- Validate that citations and figure references survived (a quick diff against the source helps).
- Populate MANUSCRIPT_STATE.yaml with project.source_format: set to the original format (docx-via-pandoc, latex, gdocs-export, etc.). This is a hint for skills that may apply format-specific parsing in v0.2+.
- Run scriptorium validate <state-file> before running any skills.
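For LaTeX sources, the check on figure references can be partly automated. The sketch below assumes that each `\label{...}` key from the .tex file should still appear somewhere in the converted markdown as a literal string; `missing_labels` is a hypothetical helper, not a Scriptorium command. It covers labels only: citation keys may no longer appear verbatim once `--citeproc` has rendered them, so citations still need a manual look.

```shell
#!/usr/bin/env bash
# Sketch only: list \label keys from a .tex source that are absent from the
# converted markdown. missing_labels is a hypothetical helper name.
missing_labels() {
  local tex="$1" md="$2" label
  # Pull every \label{...} key out of the LaTeX source, one per line, deduplicated.
  grep -o '\\label{[^}]*}' "$tex" | sed 's/^\\label{//; s/}$//' | sort -u |
  while IFS= read -r label; do
    # -F treats the key as a literal string; -q reports only through the exit status.
    grep -Fq -- "$label" "$md" || printf '%s\n' "$label"
  done
}
```

Running `missing_labels manuscript.tex manuscript.md` prints one key per line for every label that did not survive; empty output means every label was found.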
Related
- Design: manuscript format scope (why Scriptorium leans on markdown).
- GitHub issue #25: the canonical tracking issue.