Citation accuracy: evidence base

2026-05-17

Last updated: 2026-05-17

The empirical literature on citation accuracy converges on an uncomfortable finding: roughly one in four to one in five citations in biomedical and adjacent literatures contains some form of inaccuracy, and roughly one in ten contains an inaccuracy serious enough to mislead a careful reader about what the cited paper says. This is not a “new” problem traceable to LLMs; it predates them by four decades. The earliest systematic accounting was published in BMJ in 1985 (de Lacey et al. 1985), and every replication since — across specialties, decades, and review methodologies — has landed in the same neighborhood.

Two patterns matter most for any system that audits citations. First, errors are not evenly distributed: review articles and frequently cited “canonical” papers attract disproportionate inaccuracy, and a substantial minority of errors propagate as citation chains, in which authors copy a wrong claim from a previous citing article that had itself copied it from an earlier one. Second, errors split between quotation errors (the cited paper does not support the statement, or supports it only partially) and citation errors (bibliographic data is incomplete or wrong). Quotation errors are the load-bearing problem; bibliographic errors are mostly cosmetic.

Implication for citation-audit: the skill should be designed around the quotation-support question — does the cited work actually substantiate the in-text claim? — rather than around bibliographic metadata, where the marginal value of an automated check is small and existing tools (CrossRef, reference managers) are already strong.

de Lacey et al. (1985) examined references in six medical journals published in January 1984. Quotation accuracy: the original author was misquoted in 15% of all references, and “most of the errors would have misled readers.” Citation accuracy: errors occurred in 24% of references, of which 8% were major (preventing identification of the source). This is the foundational study; subsequent work largely confirms its numbers.

The Wager and Middleton (2008) Cochrane methodology systematic review (66 studies analyzing 3,836 references across 74 biomedical journals) reported a median citation-error rate of 38% (range 4–67%) across more than 27,000 references, and a median “major and minor quotation errors” rate of 20% (range 0–50%). Technical editing was associated with lower error rates.

Jergas and Baethge (2015) — systematic review and meta-analysis of 28 studies (1985–2013) covering 7,321 references. Pooled prevalence: major quotation errors 11.9% (95% CI 8.4–16.6); minor errors 11.5% (95% CI 8.3–15.7); total quotation errors 25.4% (95% CI 19.5–32.4). Their conclusion: “quotation errors are common in medical journal articles,” and “even the lowest estimate of total quotation errors was considerable (6.7%).”

Pavlovic et al. (2021) re-examined frequently cited biomedical papers by going back to the original first authors to verify what their own papers actually said. Feasibility study (1,540 articles, 2,526 citations of 14 papers): 7.2% of individual citations inaccurate; 11.1% of articles contained at least one inaccuracy. Verification study (2,995 articles, 4,912 citations of 13 papers): 10.3% citations inaccurate; 15.0% of articles affected.
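The gap between the per-citation and per-article rates is easier to see as rough counts. The sketch below is back-of-envelope arithmetic on the percentages quoted above; the rounded counts are implied approximations, not figures reported by Pavlovic et al. (2021), and the names `STUDIES` and `implied_counts` are illustrative.

```python
# Figures as quoted in the paragraph above (Pavlovic et al. 2021).
STUDIES = {
    "feasibility": {
        "articles": 1540, "citations": 2526,
        "citation_error_rate": 0.072, "article_error_rate": 0.111,
    },
    "verification": {
        "articles": 2995, "citations": 4912,
        "citation_error_rate": 0.103, "article_error_rate": 0.150,
    },
}


def implied_counts(study: dict) -> dict:
    """Convert reported percentages into approximate counts (rounded, not reported data)."""
    return {
        "citations_per_article": study["citations"] / study["articles"],
        "inaccurate_citations": round(study["citations"] * study["citation_error_rate"]),
        "affected_articles": round(study["articles"] * study["article_error_rate"]),
    }


feasibility = implied_counts(STUDIES["feasibility"])
verification = implied_counts(STUDIES["verification"])
```

Both samples average about 1.64 citations of the index papers per citing article. An article that cites a paper more than once has more than one chance to get it wrong, which is consistent with the article-level rate sitting above the citation-level rate in both studies.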

Critical sub-findings from Pavlovic et al. (2021):

  • Citation of nonexistent findings was the largest single error category (38.4% of inaccuracies) — the cited paper did not contain the claim attributed to it at all.
  • Inaccurately cited numerical data (16.6%) and inaccurate interpretation (15.4%) were the next two categories.
  • Citation chains accounted for ~20–24% of inaccuracies — errors copied from previous citing articles rather than introduced fresh.
  • Review articles were more likely than primary research articles to contain inaccuracies, and inaccuracy rose with time since the cited paper’s publication.

Specialty-specific evidence — Sauder et al. (2022) on surgical literature found inaccuracy rates broadly consistent with the general biomedical figures, with higher rates in lower-evidence study designs. This pattern (higher inaccuracy where the claim is fuzzier) recurs across specialties.

The evidence base shapes citation-audit in three concrete ways.

  1. Audit the quotation, not just the citation. Bibliographic errors are a long-solved problem in modern reference-manager workflows; quotation errors are not. The skill’s primary responsibility is to ask, for each in-text citation, whether the referenced work supports the specific claim being made. Output sections should make this question explicit (“Claim supported?” “Claim partially supported?” “Claim not located in source?”).

  2. Treat review citations differently from primary citations. The Pavlovic et al. (2021) finding that review articles carry higher inaccuracy rates, combined with the “review-citing-review” cascade documented in [[citation-overreach-research]], means the auditor should flag when a mechanistic or quantitative claim is supported only by a review citation. The flag is not “wrong” — it is “verify with primary source.” This recommendation should never invent the primary source; see [[hallucination-in-llm-citations]].

  3. Surface citation chains. When two or more papers cite the same source for the same claim using nearly identical phrasing, this is a hallmark of an unverified citation chain. The skill can detect such patterns when multiple papers in a corpus are audited together, and warn — without claiming to have verified the underlying source.
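The three design points above can be sketched as a small data model. This is a minimal illustration under stated assumptions, not the skill's implementation: the names `Verdict`, `CitedSource`, `ClaimAudit`, and `possible_chains`, the 0.85 similarity threshold, and the character-level similarity measure (Python's standard `difflib`) are all hypothetical choices, and the example claims are invented placeholders.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from enum import Enum
from itertools import combinations


class Verdict(Enum):
    """The three answers to the quotation-support question (hypothetical labels)."""
    SUPPORTED = "claim supported"
    PARTIALLY_SUPPORTED = "claim partially supported"
    NOT_LOCATED = "claim not located in source"


@dataclass(frozen=True)
class CitedSource:
    key: str          # cite key of the referenced work
    is_review: bool   # True for a review article, False for primary research
    verdict: Verdict  # the auditor's answer for this source


@dataclass(frozen=True)
class ClaimAudit:
    paper_id: str                      # manuscript containing the in-text claim
    claim: str                         # the in-text statement being checked
    quantitative_or_mechanistic: bool  # claim type that warrants a primary source
    sources: tuple                     # CitedSource records cited for this claim

    @property
    def verify_with_primary_source(self) -> bool:
        # A flag only: the auditor never proposes a primary source itself.
        return (
            self.quantitative_or_mechanistic
            and len(self.sources) > 0
            and all(s.is_review for s in self.sources)
        )


def possible_chains(audits, threshold=0.85):
    """Return (audit_a, audit_b, similarity) for claims in different papers that
    share a cited source and are phrased almost identically.

    A hit is a prompt to check the source, not evidence that the claim is wrong.
    """
    hits = []
    for a, b in combinations(audits, 2):
        if a.paper_id == b.paper_id:
            continue
        shared = {s.key for s in a.sources} & {s.key for s in b.sources}
        if not shared:
            continue
        similarity = SequenceMatcher(None, a.claim.lower(), b.claim.lower()).ratio()
        if similarity >= threshold:
            hits.append((a, b, similarity))
    return hits


# Invented example data: the claims and the cite key are placeholders.
review = CitedSource("placeholder-review", is_review=True,
                     verdict=Verdict.PARTIALLY_SUPPORTED)
audits = [
    ClaimAudit("paper-A", "Treatment X lowered event rates by 30 percent in adults.",
               True, (review,)),
    ClaimAudit("paper-B", "Treatment X lowered event rates by 30 percent in adult patients.",
               True, (review,)),
    ClaimAudit("paper-C", "Adverse events were uncommon.", False, (review,)),
]

needs_primary = [a.paper_id for a in audits if a.verify_with_primary_source]
chain_pairs = [(a.paper_id, b.paper_id) for a, b, _ in possible_chains(audits)]
```

Here `needs_primary` lists papers A and B, whose quantitative claim rests only on a review, and `chain_pairs` pairs A with B because they share a source and near-identical wording. Keeping the verdict on each cited source, rather than on the claim as a whole, lets one claim be supported by one reference and not located in another.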

In MANUSCRIPT_STATE.yaml, the preserve_citations: true constraint prevents transformation skills (compression, argumentative-flow) from silently dropping or replacing references — which would otherwise introduce the citation errors documented above as a side effect of helpful-looking edits.
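One way such a constraint could be checked is to compare the set of cite keys before and after a transformation. The sketch below assumes Pandoc-style `@key` citations with a deliberately simplified key pattern; `CITE_KEY`, `cite_keys`, and `citation_diff` are hypothetical names, not part of the skill or of MANUSCRIPT_STATE.yaml. The two DOIs in the example come from this document's reference list.

```python
import re

# Simplified Pandoc-style cite key: "@" followed by an identifier that may
# contain internal punctuation such as ":", "." and "/" (enough for @doi: and
# @pmid: keys). The lookbehind skips "@" inside e-mail addresses.
CITE_KEY = re.compile(r"(?<![\w.])@([A-Za-z0-9_][\w:.#$%&+?<>~/-]*)")


def cite_keys(text: str) -> set:
    """Collect cite keys, trimming sentence punctuation that trails a key."""
    return {m.group(1).rstrip(".:;,?") for m in CITE_KEY.finditer(text)}


def citation_diff(before: str, after: str) -> dict:
    """Report keys dropped or introduced by an edit; both lists empty means preserved."""
    old, new = cite_keys(before), cite_keys(after)
    return {"dropped": sorted(old - new), "added": sorted(new - old)}


before = ("Quotation errors are common "
          "[@doi:10.7717/peerj.1364; @doi:10.1136/bmj.291.6499.884].")
after = "Quotation errors are common [@doi:10.7717/peerj.1364]."

diff = citation_diff(before, after)
```

In this example the compressed sentence has lost the de Lacey reference, so `diff["dropped"]` contains `doi:10.1136/bmj.291.6499.884` and `diff["added"]` is empty. A silently replaced reference would show up as one dropped key plus one added key.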

  • Most accuracy studies are biomedical. Engineering, physics, and the humanities are less well-characterized. The numbers above should not be extrapolated to those literatures without caveat.
  • “Major” vs “minor” error categorization is judgment-dependent. Jergas and Baethge (2015) acknowledge heterogeneity; pooled estimates carry wide confidence intervals.
  • The fraction of errors that change scientific conclusions (vs. merely misattribute who said what) is not well quantified. Greenberg’s distortion analysis in [[citation-overreach-research]] is the closest existing work and is qualitative.

This document is rendered from citation-accuracy-evidence.qmd via Quarto with a quartobot pre-render step. Cite keys in prose use persistent identifiers directly (@pmid:, @doi:); quartobot resolves them through manubot into references.json before Quarto’s citeproc renders the final citations and the References section. No hand-curated .bib file is needed — the cite key is the source of truth, and the bibliography is recomputed on every build. This is the first knowledge-layer document on the quartobot pipeline; the preprocess script (docs/scripts/preprocess.py) handles the resolve-then-render dance.

Jergas, Hannah, and Christopher Baethge. 2015. “Quotation Accuracy in Medical Journal Articles: A Systematic Review and Meta-Analysis.” PeerJ 3 (October): e1364. https://doi.org/10.7717/peerj.1364.

Lacey, G de, C Record, and J Wade. 1985. “How Accurate Are Quotations and References in Medical Journals?” British Medical Journal (Clinical Research Ed.) 291 (6499): 884–86. https://doi.org/10.1136/bmj.291.6499.884.

Pavlovic, Vedrana, Tracey Weissgerber, Dejana Stanisavljevic, et al. 2021. “How Accurate Are Citations of Frequently Cited Papers in Biomedical Literature?” Clinical Science 135 (5): 671–81. https://doi.org/10.1042/cs20201573.

Sauder, Matthew, Kevin Newsome, Israel Zagales, et al. 2022. “Evaluation of Citation Inaccuracies in Surgical Literature by Journal Type, Study Design, and Level of Evidence: Towards Safeguarding the Peer-Review Process.” The American Surgeon 88 (7): 1590–600. https://doi.org/10.1177/00031348211067993.

Wager, Elizabeth, and Philippa Middleton. 2008. “Technical Editing of Research Reports in Biomedical Journals.” Cochrane Database of Systematic Reviews 2010 (1). https://doi.org/10.1002/14651858.mr000002.pub3.