The CLI: quartobot
A Python CLI for pre-render and out-of-render work. quartobot resolve runs as a Quarto pre-render hook and calls manubot.cite directly to populate the bibliography before pandoc starts. scan, validate, init, and use round out the surface for CI-lint and scaffolding.
uv tool install git+https://github.com/seandavi/quartobot
quartobot depends on manubot as a Python library. See Install for uvx, editable, and post-v0.1-tag pip install paths.
Pre-render commands
quartobot scan
Walks .qmd, .md, .Rmd, and .ipynb files under a path, extracts every cite key, classifies each one (manubot prefix, bare DOI, or hand-curated), groups the results, and reports repetition counts and cross-file duplicates with file:line locations. Pure read. No network. Pure reporter, too — scan always exits 0 once it finishes; gating lives in validate.
$ quartobot scan .
arxiv:
2104.10729 (2x)
doi:
10.1038/s41586-024-12345
10.1371/journal.pcbi.1007128 (3x)
pmid:
31479462
(hand-curated):
quarto2024
5 unique key(s), 8 total occurrence(s) across 3 file(s).
Duplicates:
@doi:10.1371/journal.pcbi.1007128:
intro.qmd:14
methods.qmd:42
notebook.ipynb:cell3:9
Prefixes are listed alphabetically; hand-curated keys appear last.
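The grouping and ordering shown above can be sketched in a few lines of Python. This is an illustration only, not quartobot's actual code: the names `classify`, `group_keys`, `KNOWN_PREFIXES`, and `BARE_DOI` are hypothetical, the prefix set covers just the prefixes visible in the sample output, and filing bare DOIs under the `doi` group is an assumption.

```python
import re
from collections import Counter, defaultdict

# Assumption: only the prefixes visible in the sample output are listed here.
KNOWN_PREFIXES = {"arxiv", "doi", "pmid"}
# Loose shape of a bare DOI such as 10.1038/s41586-024-12345.
BARE_DOI = re.compile(r"^10\.\d{4,9}/\S+$")
HAND_CURATED = "(hand-curated)"


def classify(key):
    """Return (group, identifier) for one cite key."""
    prefix, sep, rest = key.partition(":")
    if sep and prefix.lower() in KNOWN_PREFIXES:
        return prefix.lower(), rest
    if BARE_DOI.match(key):
        # Assumption: bare DOIs are reported under the doi group.
        return "doi", key
    return HAND_CURATED, key


def group_keys(keys):
    """Group keys with repetition counts; prefixes sorted, hand-curated last."""
    groups = defaultdict(Counter)
    for key in keys:
        group, identifier = classify(key)
        groups[group][identifier] += 1
    # Sort on (is_hand_curated, name) so the hand-curated group sinks to the end.
    order = sorted(groups, key=lambda g: (g == HAND_CURATED, g))
    return [(g, sorted(groups[g].items())) for g in order]
```

Sorting on a `(is_hand_curated, name)` tuple keeps the alphabetical order for prefixes while pushing the hand-curated group last, without special-casing the output loop.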
The scan is heuristic — it strips YAML/TOML frontmatter, fenced code blocks (``` / ~~~), and inline code spans before searching, so decoys like @fake:notacite inside backticks won’t surface. For .ipynb files, only markdown cells are scanned; cell index appears alongside line number (paper.ipynb:cell3:9). The authoritative parse happens at render time inside pandoc citeproc.
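To make the strip-then-search order concrete, here is a minimal sketch of that kind of heuristic. The regular expressions and the `extract_cite_keys` name are assumptions for illustration; the real scanner's patterns may differ. Stripped regions are replaced with the same number of newlines so reported line numbers still point into the original file.

```python
import re

# Hypothetical patterns; quartobot's own rules may be stricter or looser.
FRONTMATTER = re.compile(r"\A(---|\+\+\+)[ \t]*\n.*?\n\1[ \t]*(\n|\Z)", re.DOTALL)
FENCE = re.compile(
    r"^[ \t]*(`{3,}|~{3,}).*?^[ \t]*\1[ \t]*$", re.DOTALL | re.MULTILINE
)
INLINE_CODE = re.compile(r"`[^`\n]*`")
# "@" not preceded by a word character, so e-mail addresses are ignored.
CITE_KEY = re.compile(r"(?<![\w@])@([A-Za-z0-9_][A-Za-z0-9_:.#$%&\-+?<>~/]*)")
TRAILING_PUNCT = re.compile(r"[^A-Za-z0-9_]+$")


def _blank(match):
    """Replace a stripped region with bare newlines to preserve line numbers."""
    return "\n" * match.group(0).count("\n")


def extract_cite_keys(text):
    """Return (line_number, key) pairs found outside frontmatter and code."""
    text = FRONTMATTER.sub(_blank, text, count=1)
    text = FENCE.sub(_blank, text)
    text = INLINE_CODE.sub("", text)
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in CITE_KEY.finditer(line):
            # Drop sentence punctuation that trails the key, e.g. "@smith2020."
            key = TRAILING_PUNCT.sub("", match.group(1))
            if key:
                hits.append((lineno, key))
    return hits
```

A regex pass like this can miss or over-match edge cases (nested fences, escaped `@`), which is why the render-time parse stays authoritative.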
Pass --no-recursive to scan only files directly under the given path. Render outputs and tool caches (_site/, _book/, _freeze/, .quarto/, .git/, .ipynb_checkpoints/, etc.) are skipped at any depth.
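The skip-at-any-depth behaviour maps naturally onto pruning `os.walk` in place. The sketch below is an assumption about the approach, not the shipped implementation: `find_sources` and `SKIP_DIRS` are hypothetical names, and the skip set contains only the directories named above (the real list is longer).

```python
import os

SCAN_SUFFIXES = {".qmd", ".md", ".Rmd", ".ipynb"}
# Only the directories named in the text; the real skip list has more entries.
SKIP_DIRS = {"_site", "_book", "_freeze", ".quarto", ".git", ".ipynb_checkpoints"}


def find_sources(root, recursive=True):
    """Return scannable files under root, skipping render outputs and caches."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if recursive:
            # Prune in place so os.walk never descends into skipped directories.
            dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        else:
            # Emptying the list stops the walk after the top-level directory.
            dirnames[:] = []
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in SCAN_SUFFIXES:
                found.append(os.path.join(dirpath, name))
    return found
```

Because the pruning happens on every directory visited, a `.quarto/` folder nested inside a chapter directory is skipped the same way as one at the project root.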
Exit codes:
- 0: scan completed. Always. Repeated keys show up in the listing ((Nx) next to the identifier, plus a “Duplicates:” section when a key crosses file boundaries) but they don’t gate the exit.
- 2: bad arguments.
Wire validate into pre-commit / CI when you want a gate.
quartobot resolve
Pre-fetch persistent-identifier citations via manubot.cite and write the resulting CSL JSON to disk. Designed to run as a Quarto pre-render hook declared in _quarto.yml:
project:
  pre-render: quartobot resolve --from-scan . --id-mode citation-key

$ quartobot resolve --from-scan . --id-mode citation-key
✓ doi:10.1371/journal.pcbi.1007128 → YuJbg3zO
✓ pmid:31479462 → r3UbYxrJ
✓ arxiv:2104.10729 → OCxCvqZo (cached)
3 resolved (1 from cache). Wrote 3 entries to references.resolved.bib.
Pass keys as arguments (quartobot resolve doi:10.x/y pmid:12345) or use --from-scan PATH to resolve every persistent-identifier key in a project. Hand-curated keys (no recognized prefix) are skipped — those live in references.bib and pandoc citeproc handles them.
--id-mode citation-key writes the CSL id field as the user’s prose key (doi:10.1371/...) so pandoc-citeproc matches [@doi:...] in the source directly. Without it, manubot’s canonical short hash (YuJbg3zO) goes in id and pandoc-citeproc silently fails to match prose keys. The pre-render hook architecture depends on this flag.
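Why the flag matters is easiest to see with a toy lookup. The sketch below models the matching step as a plain dictionary keyed by CSL `id`; that model and the helper names `with_prose_key_id` and `index_by_id` are illustrative assumptions, not quartobot or pandoc internals.

```python
def with_prose_key_id(csl_item, prose_key):
    """Return a copy of a CSL item whose id is the key used in the prose."""
    item = dict(csl_item)
    item["id"] = prose_key
    return item


def index_by_id(items):
    """Model of the renderer's lookup: references are found by their id."""
    return {item["id"]: item for item in items}


# A resolved item as it would look with the short-hash id.
resolved = {"id": "YuJbg3zO", "title": "Open collaborative writing with Manubot"}
prose_key = "doi:10.1371/journal.pcbi.1007128"

# With the short hash as id, a lookup by the prose key finds nothing.
missing = prose_key not in index_by_id([resolved])
# After rewriting the id, the same lookup succeeds.
matched = prose_key in index_by_id([with_prose_key_id(resolved, prose_key)])
```

The rewrite works on a copy, so the short-hash form of the item remains available to anything else that needs it.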
resolve writes two artifacts by default: references.resolved.bib (BibLaTeX, the file pandoc reads at render — list this under bibliography: in _quarto.yml) and references.json (CSL JSON cache, gitignored, read on the next resolve for cache hits). Override either with --bib-output PATH or --output PATH. The --cache option defaults to --output, so re-runs are idempotent. --dry-run reports what would be resolved without making any network calls.
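The cache behaviour can be pictured as a read-merge-write cycle over a JSON array of CSL items. The sketch below is a simplified model under stated assumptions: the cache is keyed by each item's `id` (as in citation-key mode), `resolve_with_cache` is a hypothetical name, and `fetch` stands in for the real network lookup so the example runs offline.

```python
import json
from pathlib import Path


def resolve_with_cache(keys, cache_path, fetch):
    """Resolve keys, reusing cached CSL items; returns (items, cache_hits).

    `fetch` is a stand-in for the network lookup and takes one key.
    """
    cache_file = Path(cache_path)
    cached = {}
    if cache_file.exists():
        loaded = json.loads(cache_file.read_text(encoding="utf-8"))
        cached = {item["id"]: item for item in loaded}

    items, hits = [], 0
    for key in keys:
        if key in cached:
            hits += 1
        else:
            item = dict(fetch(key))
            item["id"] = key  # store under the prose key, as in citation-key mode
            cached[key] = item
        items.append(cached[key])

    # Write the merged cache back so the next run can skip these keys.
    cache_file.write_text(json.dumps(list(cached.values()), indent=2), encoding="utf-8")
    return items, hits
```

Running it twice with the same keys triggers no second round of `fetch` calls, which is the idempotence property described above.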
Pass --output - to stream the CSL JSON to stdout instead of a file — the one-shot lookup shape for shell-tool agents and scripts that pipe through jq. Stdout mode skips the BibLaTeX write:
$ quartobot resolve --output - doi:10.1371/journal.pcbi.1007128 | jq '.[0].title'
"Open collaborative writing with Manubot"
In stdout mode the summary line goes to stderr and no file writes happen. Cache reads still work when --cache <path> is set explicitly.
Exit codes:
- 0: every key resolved (cache hits count as success).
- 1: one or more keys failed (network error, Crossref 404, etc.).
- 2: bad arguments.
quartobot validate
Pre-flight / CI-lint surface. Static config checks against a Quarto project — no network. Run this in CI to catch the most common foot-guns before they reach a render.
$ quartobot validate .
✓ _quarto.yml exists
✓ bibliography declared — 2 file(s): references.bib, references.resolved.bib
✗ pre-render hook — `quartobot resolve` is invoked but `--id-mode citation-key` is missing. Without it, CSL `id`s are manubot's short hashes (`YuJbg3zO`), not the prose keys (`doi:10.1371/...`), and pandoc-citeproc silently fails to match any cites.
✓ resolved bibliography in `bibliography:` — `references.resolved.bib` listed in `bibliography:`
✓ no duplicate cite keys — 5 unique key(s) in 3 file(s)
1 of 5 check(s) failed. Exit 1.
Checks run:
- _quarto.yml exists and parses as YAML.
- bibliography: is declared (as a string or list).
- project.pre-render calls quartobot resolve with --id-mode citation-key. The flag is load-bearing: without it, manubot’s canonical short hashes replace the user’s prose keys and pandoc-citeproc silently fails to match anything.
- references.resolved.bib appears in bibliography:. This is the most common silent failure under the pre-render hook architecture, since without it pandoc citeproc never reads what quartobot resolve wrote. The legacy v0.3 references.json is accepted with a migration hint.
- No cite key appears in more than one file. Same-key-twice in the same file is the normal academic-writing case (one source, several claims) and is not flagged. The check is intentionally narrow: cross-file duplication is the case the chunked-content pattern can produce by accident; same-file repetition is intent.
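Two of these checks are simple enough to sketch. The code below is a hedged illustration, not the validator's source: `hook_has_id_mode` and `cross_file_duplicates` are hypothetical names, the config is assumed to be `_quarto.yml` already parsed into a dict, and only the space-separated `--id-mode citation-key` form shown on this page is recognised.

```python
import shlex
from collections import defaultdict


def hook_has_id_mode(config):
    """True if a pre-render hook runs `quartobot resolve --id-mode citation-key`."""
    hooks = (config.get("project") or {}).get("pre-render") or []
    if isinstance(hooks, str):  # pre-render may be one command or a list
        hooks = [hooks]
    for hook in hooks:
        argv = shlex.split(hook)
        if argv[:2] != ["quartobot", "resolve"]:
            continue
        # Look for the flag immediately followed by its value.
        return ("--id-mode", "citation-key") in list(zip(argv, argv[1:]))
    return False


def cross_file_duplicates(occurrences):
    """Map each key seen in more than one file to its sorted file list.

    `occurrences` is an iterable of (key, file) pairs; repeats within a
    single file collapse into one entry and are therefore not flagged.
    """
    files_by_key = defaultdict(set)
    for key, path in occurrences:
        files_by_key[key].add(path)
    return {k: sorted(v) for k, v in files_by_key.items() if len(v) > 1}
```

Collecting files into a set per key is what makes the duplicate check narrow: citing one source five times in `intro.qmd` still yields a single file for that key.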
Citation-resolution checks (“does this DOI actually resolve at Crossref?”) are out of scope here — they need network. Run quartobot resolve --dry-run --from-scan . separately for that.
Exit codes: 0 if every check passes, 1 on any failure.
Scaffolding commands
quartobot init
Scaffold the citation pipeline into an existing (or empty) Quarto project:
$ quartobot init
Project type: manuscript
+ _quarto.yml [written]
+ references.bib [written]
~ .gitignore [appended] — added 7 line(s)
Next steps:
1. Confirm `quartobot` is on PATH: `quartobot --version`
(install with `uv tool install git+https://github.com/seandavi/quartobot`)
2. Add citations to your prose: @doi:..., @pmid:..., etc.
3. quarto render
To add the version banner + GitHub Actions CI, run `quartobot use github-ci` after this.
init writes only what the citation pipeline needs: _quarto.yml wired with the quartobot resolve pre-render hook and a bibliography: list, a seed references.bib, and a .gitignore augment so references.resolved.bib (regenerated each render) stays out of the repo. Three files, nothing else.
Conservative — never overwrites existing files. If _quarto.yml already exists, prints a YAML snippet to merge in manually instead of touching it. .gitignore is the one file modified in place (idempotent, appends only).
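An append-only, idempotent update of this kind can be sketched as follows. This is an assumption about the general technique rather than init's actual code: `append_missing_lines` is a hypothetical helper, and the two entries in the usage note are the files this page says are kept out of the repo, not the full set init writes.

```python
from pathlib import Path


def append_missing_lines(path, wanted):
    """Append lines not already present; return how many were added."""
    target = Path(path)
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    present = {line.strip() for line in existing.splitlines()}
    missing = [line for line in wanted if line not in present]
    if not missing:
        return 0  # nothing to do, file left untouched
    # Make sure the appended block starts on its own line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with target.open("a", encoding="utf-8") as handle:
        handle.write(prefix + "\n".join(missing) + "\n")
    return len(missing)
```

Calling `append_missing_lines(".gitignore", ["references.resolved.bib", "references.json"])` twice adds the entries once and then reports zero additions, and existing content is never rewritten.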
--project-type {auto,manuscript,book} controls what gets written; auto detects from _quarto.yml, falling back to manuscript.
quartobot use github-ci
Scaffold the GitHub Actions render workflow + PR-preview cleanup — the manuscript-as-software CI machinery that used to ride along with init. Opt-in, idempotent, scoped to one job.
By default it scaffolds the lean pipeline: latest deploy at /, PR preview at /pr/<n>/, generated /versions/ page, sticky PR comment. No per-commit permalinks, no banner, no snapshot retention.
$ quartobot use github-ci
Project type: manuscript
Pipeline: lean
+ .github/workflows/render.yml [written]
+ .github/workflows/pr-closed.yml [written]
Next steps:
1. Commit the new files and push to GitHub.
2. The render workflow fires on push to main and on PRs.
3. After the first push, the manuscript lands at `/`,
the `/versions/` page lists tagged releases and open
PR previews, and PRs get a sticky comment with links.
For the v0.1 manubot-pattern pipeline (per-commit /v/<sha>/ permalinks + snapshot retention + HTML version banner), pass --with-versioned-snapshots. That mode also writes _version-banner.html.template + _version-banner.html and prints a snippet for the _quarto.yml banner include.
The scaffolded render.yml is a thin caller of one of the upstream reusable workflows. For the full input list, the composite-action references, and the standalone-composition pattern, see Workflows and actions.
Re-running is safe: files already on disk are left alone and report as skipped-exists. When _quarto.yml already declares the banner include (versioned-snapshots mode), the manual-merge snippet is suppressed.
use is a click group, designed to grow. github-ci is the first inhabitant; future siblings (use jupyter-notebooks, use pre-commit, use mcp, use joss-paper) are scoped but not yet shipped. The naming convention follows R’s usethis package: one verb (use), one role per subcommand.
Philosophy
The CLI calls manubot.cite (the resolver library) directly from a Quarto pre-render hook and lets pandoc citeproc (the renderer) consume the resolved bibliography that hook writes. Every command is either pre-render (resolve: work done ahead of quarto render so the render itself is faster and more reliable) or out-of-render (init, use, scan, validate: work that doesn’t touch render at all).
Opaque by default for the CI surface: a consumer’s .github/workflows/render.yml is a thin caller pointing at the upstream reusable workflow. quartobot detach (when it ships) is the escape hatch for consumers who want to fork the pipeline. This is the opposite of r-lib/actions, which copies 150 lines into every consumer repo. quartobot’s default is friendlier; the escape hatch matches their model for users who want it.