Grant writing evidence: what's actually known about what wins

Last updated: 2026-05-17

Synthesis

A scriptorium grant skill should not pretend that the evidence base for “what wins a grant” is anywhere near as clean as the evidence base for “what wins a published paper.” The grant literature splits into three strata, each with a different epistemic status. First, the empirical peer-review-quality literature (studies of inter-reviewer agreement, predictive validity of scores, and institutional bias) provides strong evidence that grant review is noisier and less discriminating than reviewers themselves believe. Second, funding-outcomes data (NIH RePORT statistics, success-rate trends, ESI-specific paylines) establish the structural difficulty of getting funded and the importance of resubmission. Third, practitioner wisdom (Russell & Morrison, Robert Porter’s RD-specialist materials, agency-specific writing guides) codifies craft norms that, for the most part, have not been subjected to large-N empirical testing.

Two findings from the empirical literature should anchor any honest grant skill. Pier et al. (2018) showed that two reviewers evaluating the same NIH application produce ratings about as similar to each other as ratings of two different applications; that is, reviewers showed almost no agreement. Fang et al. (2016) showed that within the funded range, percentile scores barely predict subsequent citation productivity (AUC ≈ 0.54). The implication is uncomfortable: peer review can usually identify clearly weak applications and clearly strong ones, but discrimination in the middle of the distribution is closer to luck than to skill.

This does not mean “grant writing doesn’t matter.” It means the signal-to-noise ratio is low, and writing exists to maximize the signal applicants can control — clarity, structural fit to review criteria, demonstrated feasibility, anticipation of reviewer objections — while accepting that outcomes will retain a stochastic component beyond any author’s control.

Evidence

Pier et al. (2018), PNAS 115(12):2952–2957. “Low agreement among reviewers evaluating the same NIH grant applications.” A controlled experiment: 43 experienced NIH peer reviewers rated and critiqued applications drawn from the same group of 25 R01 applications, all eventually funded (some on first submission, others only after resubmission). Headline finding: two randomly selected ratings for the same application were, on average, just as similar to each other as two randomly selected ratings for different applications. The outcome of review depended more on which reviewer evaluated the grant than on the grant itself. The authors present this as a fundamental challenge to the validity of grant peer review as currently practiced.¹
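To make the “same application vs. different applications” comparison concrete, here is a minimal simulation sketch. It is not Pier et al.’s analysis code or their exact agreement statistic; it assumes a toy model in which each application has a latent quality and each rating adds independent reviewer noise on a 1 to 9 scale, and the function names and parameters (`simulate_ratings`, `within_vs_between`, `quality_sd`, `noise_sd`) are hypothetical. It compares the mean absolute gap between two ratings of the same application with the gap between ratings of two different applications.

```python
import random
import statistics


def simulate_ratings(n_apps, n_raters, quality_sd, noise_sd, seed=0):
    """Toy model (assumption): rating = latent quality + reviewer noise, clipped to 1..9."""
    rng = random.Random(seed)
    ratings = []
    for _ in range(n_apps):
        quality = rng.gauss(5.0, quality_sd)  # latent quality of this application
        row = [min(9.0, max(1.0, quality + rng.gauss(0.0, noise_sd)))
               for _ in range(n_raters)]
        ratings.append(row)
    return ratings


def within_vs_between(ratings, n_pairs=20000, seed=1):
    """Mean |difference| for two ratings of the same application vs. two different ones."""
    rng = random.Random(seed)
    within, between = [], []
    n = len(ratings)
    for _ in range(n_pairs):
        a = rng.randrange(n)
        r1, r2 = rng.sample(ratings[a], 2)  # two distinct reviewers, same application
        within.append(abs(r1 - r2))
        b = rng.randrange(n - 1)
        if b >= a:
            b += 1  # shift so that b is always a different application from a
        between.append(abs(rng.choice(ratings[a]) - rng.choice(ratings[b])))
    return statistics.mean(within), statistics.mean(between)


if __name__ == "__main__":
    for q_sd, n_sd in [(2.0, 0.5), (0.5, 2.0)]:
        w, b = within_vs_between(simulate_ratings(500, 4, q_sd, n_sd))
        print(f"quality_sd={q_sd}, noise_sd={n_sd}: within={w:.2f}, between={b:.2f}")
```

When `noise_sd` is small relative to `quality_sd`, the within-application gap is far smaller than the between-application gap; when reviewer noise dominates, the two converge, which is the pattern Pier et al. describe.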

Fang et al. (2016), eLife 5:e13323. “NIH peer review percentile scores are poorly predictive of grant productivity.” Analyzing 102,740 funded grants, the authors found that within the funded range, percentile rank was a poor predictor of citation productivity: AUC = 0.54 (95% CI 0.53–0.54) for predicting “above-median citation productivity.” An AUC of 0.5 corresponds to chance, so within the funded range, scores barely beat a coin flip at predicting which grants will be most productive.²
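The AUC figure can be read as a probability: pick one grant that turned out above-median in productivity and one that did not; AUC is the chance that the better-scored grant is the productive one (ties count half), so 0.5 is chance and 1.0 is perfect ranking. The sketch below computes it directly from that definition on toy inputs. The helper names are hypothetical, and the quadratic pairwise loop is for clarity only; at the scale of 100,000 grants, a rank-based (Mann–Whitney) computation would be the practical choice.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5).

    scores: higher means "predicted more productive".
    labels: True if the grant turned out above-median in productivity.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_from_percentiles(percentiles, above_median):
    """NIH percentiles are better when lower, so negate them to make higher = better."""
    return auc([-p for p in percentiles], above_median)


if __name__ == "__main__":
    # Toy data, not from any study: 4 grants, 2 of which were above-median.
    print(auc_from_percentiles([2, 5, 12, 18], [True, False, True, False]))
```

Applied to real percentiles and outcomes, a value near 0.54 means that for a randomly chosen productive/unproductive pair, the better-scored grant is the productive one only about 54% of the time.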

Lauer (2015), NEJM 373:1925–1927. “Reviewing peer review at the NIH.” A senior NIH administrator’s framing of the predictive-validity problem: peer review must discriminate at the percentile-payline boundary even if it cannot discriminate within the funded pool. The decision-quality question is different from the predictive-validity question.³

Eblen et al. (2016). See nih-significance-patterns for full detail. The 123,700-application analysis established that Approach correlates most strongly with Overall Impact among the five criteria, followed by Significance and Innovation. This is the empirical backbone for the practitioner adage that an aims page must demonstrate feasibility, not merely assert importance.

Halo effect and institutional prestige bias. Multiple studies document that reviewers’ impressions are influenced by institutional prestige and prior recognition. A 2024 eLife meta-research paper found that blinding the initial review of Beckman Young Investigator applications measurably reduced institutional prestige bias, though it did not change outcome rates by gender. Before blinding, applicants from prestigious institutions were more likely to attract reviewer attention and to progress through the application process.⁴ In computer science conferences, prestige bias has been quantified using natural experiments with double-blind review (2021 ICLR data; PLOS ONE 2022).⁵ NIH has cited reducing reputational bias as a motivation for its Simplified Review Framework (NIH press release, 2023).

Success rates and ESI dynamics. NIH R01 success rates have trended downward in recent years; ESI-specific rates have been volatile, with reports of drops from ~30% to ~18% over recent two-year windows at some institutes.⁶ Most institutes maintain preferential paylines or pickup policies for ESIs, partially buffering the headline trend. Resubmission outcomes are uneven: the NIH FAQs make explicit that “some who were not discussed on first submission are funded on resubmission; some who were discussed on first submission do worse on resubmission,” and there is no reliable a priori prediction of resubmission outcome from initial score alone.

The “score → fund” simplification. Practitioner narrative sometimes treats funding as determined by score alone: “if your percentile is below the payline, you’re funded.” The actual decision process is messier. Program staff have substantial discretion: priority scores feed into institute-level decisions that also weigh portfolio balance, ESI policies, and programmatic priorities, and awards can still change after review (e.g., budget adjustments, just-in-time documentation). Scriptorium should not present score thresholds as funding guarantees.

Practitioner wisdom (label clearly). The dominant practitioner sources are:

Russell & Morrison, Grant Application Writer’s Workbook — step-by-step prescriptive workbook. Practitioner consensus, not empirical study.
Robert Porter (RD Specialist materials) — widely-referenced practitioner writing on Approach sections, reviewer psychology, and resubmission strategy. Drawn from experience as a research development professional; not empirical research.
Friedman, Career Development Awards — guidance on K-series awards.
Agency-specific guides (NIAID, NINDS, NIDDK practitioner pages; NSF program officer essays).

These are valuable as craft guides but should not be cited as evidence-based in the same register as Pier, Lauer, or Eblen.

How this informs scriptorium

Honesty about noise. Any grant skill — specific-aims, grant-reviewer-simulation, aims-page-critique — should produce outputs that acknowledge the empirical finding that grant outcomes are noisier than reviewers themselves believe. This is not defeatism; it is calibration. Outputs should help authors maximize controllable signal, not promise outcome guarantees.

Approach over Significance. Given Eblen’s finding that Approach is the strongest single predictor of Overall Impact, a grant-reviewer-simulation persona must include a rigorous “Approach scrutinizer” — a reviewer who probes feasibility, preliminary data, pitfalls, alternative strategies, statistical power, and timeline. Aims-page critique should treat the approach-related sentences of the page (typically the 2–4 sentence descriptions under each aim) as the highest-leverage content.
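One way to keep the “Approach scrutinizer” inspectable is to define it as data: a fixed set of probes, one per dimension named above, instantiated for each aim. The sketch below assumes that design; `Probe`, `ReviewerPersona`, `APPROACH_PROBES`, and the question wording are all hypothetical, not an existing scriptorium interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    dimension: str
    template: str  # "{aim}" is replaced with the aim's title


# Hypothetical probe set: one probe per Approach dimension named in the text.
APPROACH_PROBES = (
    Probe("feasibility", "Can {aim} be completed with the stated team, resources, and budget?"),
    Probe("preliminary data", "Which preliminary data show the methods in {aim} work in this lab?"),
    Probe("pitfalls", "Which steps in {aim} are most likely to fail, and are they acknowledged?"),
    Probe("alternative strategies", "If the primary strategy for {aim} fails, what is the fallback?"),
    Probe("statistical power", "Is the sample size for {aim} justified by a power calculation?"),
    Probe("timeline", "Is the time allotted to {aim} realistic relative to the other aims?"),
)


@dataclass(frozen=True)
class ReviewerPersona:
    name: str
    probes: tuple = APPROACH_PROBES

    def questions_for(self, aim_title):
        """Return (dimension, question) pairs for one aim."""
        return [(p.dimension, p.template.format(aim=aim_title)) for p in self.probes]


if __name__ == "__main__":
    persona = ReviewerPersona("Approach scrutinizer")
    for dimension, question in persona.questions_for("Aim 1"):
        print(f"[{dimension}] {question}")
```

Keeping the probes in a tuple rather than in free-form prompt text makes it easy to check that every aim was interrogated on every dimension.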

Bias-aware critique. Scriptorium should resist amplifying the institutional-prestige bias that empirical work has documented. A grant-reviewer-simulation persona should not be encouraged to react more favorably to “investigator at MIT” framing than to “investigator at [less-recognized institution]” framing — the bias is documented as a real-world reviewer behavior, but reproducing it in scriptorium critiques would compound the problem rather than help the author counteract it.
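One concrete mitigation, analogous to the blinded initial review described in the evidence section, is to mask institution names before an application reaches a reviewer persona. This is a minimal sketch under stated assumptions: the name list is a tiny hypothetical placeholder, plain regex matching will miss variants and affiliations that a real de-identifier would need to catch, and masking has a cost because the Environment criterion legitimately concerns institutional resources.

```python
import re

# Hypothetical, deliberately tiny list for illustration only.
INSTITUTIONS = (
    "Massachusetts Institute of Technology",
    "MIT",
    "Stanford University",
    "Stanford",
    "Harvard University",
)


def mask_institutions(text, names=INSTITUTIONS, placeholder="[INSTITUTION]"):
    """Replace whole-word, case-sensitive occurrences of known institution names."""
    # Try longer names first so that when one name is a prefix of another
    # ("Stanford" vs. "Stanford University"), the full name is masked as a unit.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b")
    return pattern.sub(placeholder, text)


if __name__ == "__main__":
    print(mask_institutions("The PI is at MIT and trained at Stanford University."))
```

Word boundaries keep the abbreviation from matching inside ordinary words (for example, “MIT” is not masked inside “SMITH”).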

Resubmission posture. A future resubmission-strategy skill should be evidence-aware: because, per the NIH FAQs, resubmission outcomes are unpredictable from the initial score alone, the skill should not base its recommendations on the score; it must focus on substantive responses to specific reviewer critiques.

Practitioner-wisdom labeling in outputs. When a critique skill draws on practitioner conventions (the four-paragraph aims page, the bulleted-aims pattern, the “long-term goal → objective → central hypothesis” cadence), the output should signal that the recommendation is drawn from craft tradition, not from peer-reviewed evidence. The DESIGN.md principle of inspectability extends to provenance.
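A minimal way to carry provenance through to output is to attach it to every recommendation as structured data rather than free text. The sketch assumes a two-valued provenance label; the names `Provenance` and `Recommendation` and the rendered format are hypothetical, not part of DESIGN.md or any existing scriptorium code.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    EMPIRICAL = "peer-reviewed evidence"
    PRACTITIONER = "practitioner craft tradition"


@dataclass(frozen=True)
class Recommendation:
    text: str
    provenance: Provenance
    source: str  # short citation or guide name

    def render(self):
        """Append a visible provenance label so readers can see the basis of the advice."""
        return f"{self.text} [basis: {self.provenance.value}; source: {self.source}]"


if __name__ == "__main__":
    recs = [
        Recommendation("Use the four-paragraph aims page structure.",
                       Provenance.PRACTITIONER, "Russell & Morrison"),
        Recommendation("Prioritize feasibility evidence in the Approach.",
                       Provenance.EMPIRICAL, "Eblen et al. (2016)"),
    ]
    for r in recs:
        print(r.render())
```

Because the label is a required field, a critique skill cannot emit a recommendation without declaring its basis.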

Open questions / weak evidence

Pier et al. used a constructed experimental setting (offline re-rating of previously reviewed grants). The extent to which real-time study-section dynamics (discussion, anchoring, primary-reviewer influence) improve or worsen inter-rater consistency has been partially studied (see reviewer-archetypes-grants) but is not fully resolved.
Most peer-review evidence comes from NIH. NSF, DOE, other agencies with NSF-style review, and private foundations have less-studied review dynamics; the Pier et al. and percentile-score findings should not be extrapolated to them wholesale.
Practitioner-wisdom interventions (e.g., “follow the four-paragraph aims structure”) have not been subjected to randomized comparison against alternatives. The craft norms may or may not be causally helpful; we know only that successful applicants tend to follow them.

References

Pier EL, Brauer M, Filut A, et al. Low agreement among reviewers evaluating the same NIH grant applications. PNAS. 2018;115(12):2952–2957. doi:10.1073/pnas.1714379115. PMID: 29507248. ↩
Fang FC, Bowen A, Casadevall A. NIH peer review percentile scores are poorly predictive of grant productivity. eLife. 2016;5:e13323. doi:10.7554/eLife.13323. PMID: 26880623. ↩
Lauer MS, Nakamura R. Reviewing peer review at the NIH. N Engl J Med. 2015;373(20):1925–1927. doi:10.1056/NEJMp1507427. ↩
Severin A, Strinzel M, Egger M, Domingo M, Barros T. Blinding reduces institutional prestige bias during initial review of applications for a young investigator award. eLife. 2024;12:RP92339. doi:10.7554/eLife.92339. ↩
Manzoor A, Shah NB. Uncovering latent biases in text: Method and application to peer review. PLOS ONE. 2022;17(2):e0264131. doi:10.1371/journal.pone.0264131. ↩
NIH RePORT. Success rates statistics. https://report.nih.gov/funding/nih-budget-and-spending-data-past-fiscal-years/success-rates . Institute-specific ESI paylines are published on each institute’s funding page (e.g., NHLBI, NIDDK, NCI, NINDS); recent trend data are aggregated by practitioner trackers, including the WriteDIT NIH Paylines & Resources page (https://writedit.wordpress.com/nih-paylines-resources/). The ~30% → ~18% ESI two-year trend referenced in the body is consistent with FY2023–FY2025 figures reported in those sources, but specific institute attribution should be checked against the institute’s own current page at the time of skill execution. ↩