Lifecycle-aware evaluation of biomedical data ecosystems

Authors

Affiliations

Ethan M. Lange

Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus

Vikram Adithya Ganesh

Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus

Adriana Ivich

Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus

Vince Rubinetti

Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus

Casey Greene

Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus

Sean Davis

Departments of Biomedical Informatics and Medicine, University of Colorado Anschutz Medical Campus

Published

May 20, 2026

Abstract

Public biomedical data resources have become central scientific infrastructure, but evaluation of these investments is fragmented: each funded program tracks different metrics in different ways, preventing portfolio-level comparison and evidence-driven funding decisions. We construct a structured approach built on three complementary prioritization frameworks: public value (user engagement, citations, funded grants), scientific data quality (metadata completeness, FAIR compliance, data dictionaries), and operations and finance (infrastructure reliability, cost sustainability). The relative weight of each framework shifts predictably across a resource’s lifecycle, and uniform metric panels across phases misallocate funding as a result. Drawing on the NIH Common Fund Data Ecosystem, we describe a portfolio-wide dashboard that operationalizes the framework across 19 programs in near real time.

Keywords

biomedical data ecosystems, research evaluation, FAIR, bibliometrics, NIH Common Fund, data sharing

Public biomedical data resources have become central infrastructure for biology, medicine, and the data sciences. NIH alone has invested billions over the past decade in initiatives that include The Cancer Genome Atlas [1], the Genotype-Tissue Expression project [2], the All of Us Research Program [3], and the 19 programs of the Common Fund. Whether any given resource is succeeding goes mostly unanswered. Each funded program tracks its own metrics in its own way, which blocks portfolio-level comparison and the evidence-driven funding decisions that comparison enables.

We argue that three prioritization frameworks long familiar in product strategy — public value, scientific quality, and operations and finance — map cleanly onto research data resources. Their relative weight shifts predictably across a resource’s lifecycle. Treating the frameworks as uniformly important across all phases is the reason portfolio decisions go wrong: early-phase investment in scientific quality looks expensive only when the comparison metric is mid-phase public value.

The NIH Common Fund Data Ecosystem (CFDE) (see Glossary) [4] is a cross-cutting initiative that harmonizes metadata and tooling across the 19 Common Fund programs. The Council of Councils Working Group identified “facilitation of new scientific discovery” as the chief metric of CFDE success ⁱ. We use the CFDE as motivating case study and operational testbed, and describe the Integration and Coordination Center’s (ICC) first portfolio-wide dashboard, which operationalizes the model in near real time.

Three frameworks, one lifecycle

The frameworks overlap and sometimes conflict (more scientifically impactful datasets are typically more expensive to create and maintain), and weighted composite scoring such as Multi-Criteria Decision Analysis (MCDA) can balance them when needed [5,6]. Their relative weight shifts predictably across a resource’s lifecycle. In the early term of infrastructure-building and data collection, scientific quality and operations and finance dominate. In the mid term of active use within a funded period, all three matter, with public value taking on increasing importance. In the late, post-funding term, operations and finance and public value become the primary lenses (Figure 1). Uniform metric panels across phases misallocate funding in predictable ways; lifecycle-aware weighting is the operating principle that fixes this, not a refinement on top.

Figure 1: Schematic relative priority of the three metric and evaluation frameworks across the project lifecycle. Lower priority does **not** imply unimportant.

The public value framework captures outputs and impacts: user engagement, downloads, funded grants, publications, and downstream scientific influence. The scientific quality framework measures reusability and long-term value, guided by the FAIR principles (Findable, Accessible, Interoperable, Reusable) [7,8]. The operations and finance framework covers infrastructure reliability and cost sustainability, including robustness as funding sunsets.

Public value

The public value framework covers user behavior and scientific impact. Website engagement metrics (page views, time on page, actions/triggers, new vs. returning users) are easy to collect via platforms such as Google Analytics or Matomo, but they capture interest rather than actual data use. More direct evidence comes from downloads and compute jobs; cloud-hosted datasets let analysis activity be measured rather than inferred. For scientific software, the same framing extends to GitHub-style engagement markers such as stars, forks, and issues. Limitations are persistent: bots skew counts, VPN use confounds new-versus-returning tracking, and downloads cannot distinguish active use from passive acquisition. As AI-driven and agent-based traffic on data portals grows — Cloudflare data show training-purpose AI crawler traffic rising rapidly across 2024 and 2025 [9/] — the bot signal will need to be partitioned into noise to remove and demand to interpret.

Citations from PubMed, Scopus, Google Scholar, Clarivate, and OpenAlex give stronger proxies for scientific impact. The original GTEx paper [2] had accumulated 8,630 citations as of January 2026. Composite scores weighted by secondary citations capture cumulative downstream impact that simple counts miss, and citation networks reveal cross-disciplinary collaborations enabled by data sharing; successful sharing also produces a citation boost for the original data generators [10]. Citation counts lag actual use by months to years [11], and journal-scoring alternatives that weight by citing-paper properties better capture subject-specific repository impact than raw counts [12]. The deeper problem is the citation gap: fewer than 30% of secondary analyses formally cite their source data [13]. Standard bibliometrics therefore systematically understate reuse, and programs that lean on citation counts as their primary impact key performance indicator (KPI) are measuring reach rather than influence. Closing the gap requires data ecosystems to mine accession numbers (e.g., GEO accession numbers, NCBI BioProject IDs) from publication full text rather than rely on citation databases. A bibliometric analysis of NeuroMorpho.Org found that shared neural morphology data sustained a reuse rate constant for sixteen years and effectively doubled peer-reviewed discoveries in the field [13].

User-centered measures of perceived value offer complementary insight. Work integrating the Information System Success Model, the Technology Acceptance Model, and Consumer Perceived Value Theory identifies four entropy-weighted metrics that explain the majority of perceived value: number of datasets (breadth of coverage), data timeliness (currency for emergent challenges), search comprehensiveness (ability to return all relevant records across federated sources), and system responsiveness (loading times and API stability) [14]. These can be collected through structured user surveys.

The long-term measure of public value is practice change — a shift from hypothesis-driven research toward data-driven discovery, and from evidence-based practice toward practice-based evidence [15]. Concrete indicators include the cloud-versus-local analysis ratio, cross-disciplinary co-authorship across Common Fund programs, and the geographic and institutional breadth of the user base, especially participation by early-career investigators from under-resourced institutions. Until practice change is measurable, every other public-value metric is at best a leading indicator.

Scientific quality

The scientific quality framework addresses whether data can be found, understood, and reused. Poor data quality and poor annotation are the leading barriers to reuse [16]. Metadata spans three grain levels: study-level (name, contributors, dates, funding), data-level (storage format, data types, measurement platforms, biospecimen handling and collection conditions), and dataset-level (sample sizes, variable counts, demographic summaries, QC statistics). Summary metrics across these levels — metadata completeness, standardized metadata, QC compliance, available data dictionary, quality data dictionary — are more tractable for portfolio evaluation than cataloging every study-specific variable. The Standard Evaluation Framework for Large Health Care Data identifies 49 attributes across three domains (collection process, data, and data use) for assessing fitness for purpose [17]. Two of those attributes are particularly load-bearing for federated ecosystems: representativeness, the extent to which a dataset generalizes beyond its source population (critical for avoiding both Type 1 and Type 2 errors in scientific conclusions drawn from the data, as the European-ancestry bias in human-genetics studies has repeatedly demonstrated [18,19]), and linkability, the extent to which data integrate with external sources — a paramount concern for an ecosystem whose mission is cross-program query.

For scientific software, quality follows from open-source development practices: README and LICENSE files, automated testing [20], and inclusion in curated ecosystems like Bioconductor [21]. The ELIXIR “Top 10 Metrics for Life Science Software Good Practices” provides a prioritized scorecard spanning version control, discoverability, automated builds, test data, and documentation [22]. “Minimal Reimplementation” — preferring established libraries over custom code — is particularly relevant for federated ecosystems, where ad hoc code multiplies technical debt across the portfolio. PIsCO (Performance Indicators framework for Collection of bioinformatics resource metrics) automates server-side collection and visualization of these metrics so they can flow directly into evaluation dashboards [22].

A structured assessment of overall quality comes from FAIR maturity models. The shift from first-generation FAIR metrics (manually assessed) to second-generation maturity indicators (automated and machine-measurable) [8] is essential for the CFDE, where the volume of metadata prevents manual audit. The FAIRplus Dataset Maturity (FAIRplus-DSM) model ⁱⁱ, developed for the life sciences, defines six levels: Level 0 (no identifiers, local storage) through Level 5 (provenance in cross-community languages, enterprise-grade lifecycle management), with intermediate levels covering persistent identifiers, indexed metadata, community reporting standards such as MIAME, and linked-data semantic representation. The Research Data Alliance (RDA) further classifies FAIR indicators as Essential (globally unique identifiers, open protocols), Important (machine-understandable knowledge representation), or Useful (automated agent access), giving evaluators a tiered roadmap. Automated tools such as the FAIR Evaluator test resources at scale via Compliance Tests, making continuous FAIR compliance monitoring feasible for large portfolios [8]. Manual FAIR audits do not scale to CFDE-sized portfolios; automated second-generation indicators are the difference between a stated FAIR commitment and an inspectable one.

Operations and finance

The operations and finance framework addresses infrastructure reliability and cost sustainability. Server-level metrics (uptime, page load time, latency, server errors, client errors) and compute-level metrics (download speed, cpu/gpu usage, memory usage, client queue times, job run times) are easy to capture but carry day-to-day variability from load, job size, and client-side network conditions, so trend monitoring is more informative than point-in-time values. For open-source projects, GitHub repository analytics provide an additional operational window: development velocity (commit frequency, pull-request review latency, merge frequency), repository-health composites, accumulating signals of technical debt (stale pull requests, abandoned branches, unresolved issues), and contributor dynamics that separate core-team commits from external community contributions and so register whether training and outreach are broadening adoption.

The Council of Councils identified sustainability as the “critical issue” for the CFDE, primarily because the programs that generate data are finite in duration ⁱ. Funding cost is the most accessible metric, but a complete picture decomposes it into sample collection, sample measurement, QC, data storage, data preservation (beyond the grant period), computing hardware, server maintenance, cloud-based computing, and personnel. The National Academies six-step framework for forecasting biomedical information-resource costs identifies cost-driver categories — data content, metadata, labor, infrastructure, compliance, governance — and notes that labor (curation and user support) consistently represents the single largest cost element ⁱⁱⁱ. Compliance costs in particular (HIPAA, FISMA, regulatory governance for human-subjects data such as Kids First and GTEx) are often underestimated in initial budgeting, and cloud-computing costs add provider-specific, usage-dependent volatility that resists multi-year forecasting [23].

The harder sustainability question is the break-even point for data sharing. Curation costs are borne by data generators while reuse benefits accrue elsewhere — an asymmetry that means aggregate gains can fail to cover aggregate costs even when both are positive. A recent scoping review of open-science economics catalogs substantial measured value across efficiency, innovation, and growth from open data and software, though most studies estimate aggregate return on investment rather than the marginal break-even at the dataset level [24]. This argues for selective investment in high-value datasets rather than uniform support across the portfolio. Cost per data reuse event — total curation and hosting cost divided by documented reuse instances — gives a concrete, comparable KPI for portfolio-level decisions, separating direct benefits (economies of scale in data generation and curation) from extended benefits (downstream scientific discovery and clinical outcomes enabled by reuse).

Table 1 catalogs the metrics across the three frameworks, with collection difficulty, evaluation value, lifecycle phase, and interpretive notes.

Table 1: Summary of metrics across the three frameworks, organized by difficulty of collection, value for evaluation, relevant lifecycle phase, and notes on interpretation.

Metric	Difficulty of Collection	How Collected	Value	Lifecycle Phase	Notes
User Behavior Framework
page views	Easy	Google Analytics or Matomo or similar platforms	Low	Mid	A minimum indicator of interest.
time on page	Easy	Google Analytics or Matomo or similar platforms	Low	Mid	A minimum indicator of interest.
actions/triggers	Easy	Google Analytics or Matomo or similar platforms	Medium	Mid	A moderate indicator of interest.
priority link clicks	Easy	Google Analytics or Matomo or similar platforms	Medium	Mid	A moderate indicator of interest.
new vs returning users	Easy	Google Analytics or Matomo or similar platforms	Medium	Mid	A moderate indicator of continuing interest.
downloads	Easy	Google Analytics or Matomo or similar platforms or Server-side logs	Medium-High	Mid	A strong indicator of interest. Downloading not equivalent to usage.
compute jobs	Easy	Compute scheduler logs or cloud environment logs	High	Mid	A direct indicator of interest.
tool downloads	Easy	Google Analytics or Matomo or similar platforms or server-side logs	Medium-High	Mid	A strong indicator of interest.
tool usage	Easy	Google Analytics or Matomo or similar platformsor API logs	High	Mid	A direct indicator of interest.
trend downloads	Medium	Google Analytics or Matomo or similar platforms or server-side logs, BI tools for visualization	Medium-High	Mid–Late	A strong indicator of change in interest over time.
trend compute jobs	Medium	Compute scheduler logs or cloud environment logs, BI tools for visualization	High	Mid–Late	A direct indicator of change in interest over time.
number grants	Hard	NIH RePORTER, Self-reporting	High	Mid–Late	Relevant grant data not publicly available.
number presentations	Hard	Self-reporting	Medium	Mid–Late	Data difficult to track and often not available in sufficient detail.
number citations	Easy	Google Scholar, PubMed, Scopus, Clarivate, OpenAlex, or other similar platforms	High	Mid–Late	A direct indicator of data value. Can miss downstream cumulative value.
number secondary citations	Medium	Google Scholar, PubMed, Scopus, Clarivate, OpenAlex, or other similar platforms	High	Late	An indirect measure of “downstream” value of initial publications.
citations + secondary citations	Medium	Google Scholar, PubMed, Scopus, Clarivate, OpenAlex, or citation network analysis	High	Late	A measure of cumulative impact over time of study data.
trend citations	Medium	Google Scholar, PubMed, Scopus, Clarivate, OpenAlex, or other similar platforms, BI tools	High	Late	Measures change in impact of data over time. Subject to time lag in publications.
data citation rate	Hard	Full-text search for accession numbers, Methods section mining	High	Mid–Late	<30% of secondary analyses formally cite datasets. Multi-pronged tracking required.
perceived value (user surveys)	Medium	Structured surveys (ISSM/TAM framework)	High	Mid–Late	Entropy-weighted: dataset breadth, timeliness, search comprehensiveness, responsiveness.
cloud adoption rate	Medium	Cloud workspace logs vs. download logs	Medium-High	Mid–Late	Proxy for computational paradigm shift (cloud vs. local analysis).
cross-disciplinary collaboration	Hard	Co-authorship network analysis	High	Late	Measures new partnerships fostered across Common Fund programs.
user diversity (geographic/institutional)	Medium	User registration data, IAM logs	Medium-High	Mid–Late	Measures democratization of access, especially for under-resourced institutions.

Scientific Quality Framework
metadata completeness	Medium	Metadata validator, Schema validator	Medium-High	Early	Critical for drawing initial interest in study.
standardized metadata	Hard	Requires human-in-the-loop evaluation.	High	Early	Study-specific requirements.
quality-control (QC) compliance	Hard	Requires human-in-the-loop evaluation.	High	Early	Study-specific requirements.
data uniqueness	Hard	Requires human-in-the-loop evaluation.	Medium	Early	Subjective measure.
FAIR compliance	Hard	Requires human-in-the-loop evaluation.	High	All	Subjective measure.
available data dictionary	Medium	Checks on repository or documentation	Medium	Early	An essential component for a study to have. Does not guarantee quality.
quality data dictionary	Hard	Requires human-in-the-loop evaluation.	High	Early	A metric of quality of the data dictionary.
representativeness	Hard	Demographic and population metadata analysis	High	Early	Whether dataset generalizes beyond source population. Critical for avoiding false conclusions.
linkability	Hard	Schema and identifier compatibility assessment	High	All	Extent to which data integrates with external sources. Paramount for federated ecosystems.
software good practices score	Medium	ELIXIR Top 10 framework; automated collection via PIsCO	High	All	Covers version control, discoverability, automated builds, test data, documentation.

*Operations and Finance* Framework
uptime	Easy	Application Performance Monitoring (APM) tools	Medium-Low	Mid–Late	Measure of server stability.
page load time	Easy	Application Performance Monitoring (APM) tools	Medium	Mid–Late	Measure of server performance.
latency	Easy	Application Performance Monitoring (APM) tools or Google Analytics ( not real-time)	Medium	Mid–Late	Measure of server performance.
server errors	Easy	Server-side logs or Application Performance Monitoring (APM) tools	Medium	Mid–Late	Measure of server stability and performance.
client errors	Medium	Application logs or Application Performance Monitoring (APM) tools	Low	Mid–Late	Measure of errors on client end. Possible not addressable.
download speed	Easy	Server-side logs or Application Performance Monitoring (APM) tools	Low-Medium	Mid–Late	Download times can be function of client connection speeds.
cpu/gpu usage	Easy	Application Performance Monitoring (APM) tools	High	Mid–Late	Important to evaluate performance and additional needs.
memory usage	Easy	Application Performance Monitoring (APM) tools	High	Mid–Late	Important to evaluate performance and additional needs.
number of users	Easy	Google Analytics or IAM logs for granular details (if available)	Medium	Mid–Late	Indirect measure of computational load/burden.
client queue times	Easy	Application Performance Monitoring (APM) tools or Cloud/infrastructure logs	Medium-High	Mid–Late	Long queue times can result in less usage.
job run times	Easy	Application Performance Monitoring (APM) tools or Cloud/infrastructure compute logs	Medium	Mid–Late	Excessive long job run times can result in less usage.
funding cost	Easy	NIH RePORTER	High	All	Total cost of study. Critical element in evaluating study feasibility and sustainability.
sample collection costs	Medium		Medium	Early	Cost typically associated with early study period. Cost does not impact sustainability.
sample measurement costs	Medium		Medium	Early	Cost typically associated with early study period. Cost does not impact sustainability.
QC costs	Medium		Medium	Early	Cost typically associated with early study period. Cost does not impact sustainability.
data storage costs	Hard		High	All	Costs and requirements can fluctuate over study.
data preservation costs	Hard		High	Late	Costs and requirements can fluctuate over study. Difficult to assess years in advance.
computing hardware costs	Medium		Medium	Early	Costs often associated with initial study period, but replacement costs are possible.
server maintenance costs	Hard		High	Mid–Late	Costs can fluctuate. Major source of cost in sustainability period.
cloud-based computing costs	Hard	Cloud Cost Explorer tools or	High	Mid–Late	Costs are highly variable and subject to constant change (for both host and clients). Especially difficult to assess years in advance.
personnel costs	Medium		High	All	Can be measured in money or time. Requires committed time with relevant expertise.

Implementation in practice

The CFDE Integration and Coordination Center (ICC) operates a portfolio-wide dashboard that demonstrates this evaluation model is implementable. The ICC currently collects a prioritized subset of the metrics described above (Table 2), drawing on NIH-provided structural data (NIH CFDE Website, NIH RePORTER) for funding and project context, PubMed and NIH iCite (including the Relative Citation Ratio (RCR)) for publication and citation impact, GitHub for repository use, quality, and operations across the open-source software components, and Google Analytics for website engagement. Project teams onboard by labeling GitHub repositories with their NIH core project number (for example, “R01AG123456”) and granting the ICC a Google Analytics service account, with NIH program staff supporting the trust and access conversation.

Table 2: Data sources and metrics collected by the CFDE ICC dashboard as of January 2026. This is the subset of the catalog above selected for feasibility and informativeness across the portfolio.

Source	Metrics collected
NIH CFDE Website	Funding Opportunities, ID, Activity Codes
NIH RePORTER	Project name, funding amounts, project descriptions, project start/end dates, PI names and affiliations, project keywords
NIH RePORTER (self-reported)	Publications, presentations, grants
PubMed	Publication details, citations
NIH iCite	Citation details and metrics (e.g., Relative Citation Ratio)
GitHub	Repository use (stars, forks, etc.)
GitHub	Repository quality (README, LICENSE, etc.)
GitHub	Repository operations (commits, pull requests, issues, etc.)
GitHub	Repository narrative summaries via AI
Google Analytics	Website engagement metrics (page views, new vs. returning users, top countries, etc.)

The dashboard supports project- and portfolio-level views with ORCID-based authentication and role-based access, so teams see their own data while program managers see the full portfolio (Figure 2). A cross-center Metrics Working Group, including representatives from CFDE centers and NIH program staff, reviews coverage and plans new metrics as the portfolio grows.

Figure 2: Screenshot of the CFDE ICC dashboard showing project- and portfolio-level metrics. Authentication and authorization controls follow CFDE community consensus on private versus public access.

Pilots underway target the gaps the frameworks identify. Full-text mining of accession numbers in publications addresses the data-citation gap directly. Event-organizer surveys, AI-generated narrative summaries of GitHub repositories, and structured-interview augmentation via generative AI [25] add the qualitative dimension that usage statistics cannot capture. The implementation demonstrates that lifecycle-aware portfolio evaluation is operationally tractable at NIH scale, in near real time, with manageable burden on individual projects. The metric set itself is not final, and the working-group structure described below is the mechanism by which it continues to evolve.

Concluding remarks

No framework operates in isolation, and quantitative metrics alone cannot capture the full picture. The NIH Common Fund’s “360-degree view” of program performance combines quantitative metrics with the contextual understanding only qualitative inquiry can provide ^iv. Mixed-method designs — usage statistics and citation data alongside structured user interviews, focus groups, and surveys — are essential for understanding why resources succeed or fail, not just what is happening [26]. Two designs suit ecosystem evaluation: sequential explanatory (quantitative data first, then qualitative inquiry to explain observed patterns) and convergent (both collected simultaneously and triangulated at interpretation) [27]. The ICC dashboard already integrates the quantitative side; pilot user surveys and AI-augmented interviews are starting to add the qualitative.

Logic models that connect inputs (funding, data, personnel) to activities (harmonization, training, curation), outputs (portals, trained users, tools), and outcomes (publications, funded grants, practice change) ^v give evaluators the scaffolding needed to identify which elements work. For the CFDE, two questions matter: to what extent are the elements of the logic model being implemented? and what are the critical paths through the model that lead to success? [28]. These models should be living documents, revised as programs transition from pilot to full implementation.

The harder challenge is closing the “know-do gap” — the distance between the information metrics provide and actual decision-making [29]. Evidence from healthcare quality improvement suggests that repeated, structured measurement itself drives behavioral change among data producers and managers, a Hawthorne effect [30]. Making metrics visible and regularly updated, as the ICC dashboard does, creates the sustained attention that incrementally improves data quality and FAIR compliance across the portfolio. NIH already requires regular FAIRness assessments and usage reporting ^vi, and the 2023 Data Management and Sharing Policy (DMS Policy) creates a structural opportunity to establish pre-policy baselines for sharing and reuse — a rare chance to test rigorously whether mandates translate into measurable ecosystem improvements. This window will not last; without pre-policy baselines captured now, the policy’s effect on sharing and reuse can be inferred but not measured.

The CFDE’s Metrics Working Group plays the role that GA4GH’s “Driver Projects” model has demonstrated elsewhere ^vii: iterating on a small priority set of metrics with NIH program staff and cross-center representatives before expanding portfolio-wide. What this paper proposes is not a final metric set but a shared evaluation language for federated biomedical data ecosystems — one that is lifecycle-aware, mixed-method by design, and operationally tractable at NIH scale. Open questions for the field are catalogued in the box (see “Outstanding questions”).

Acknowledgments

This work was supported by the National Institutes of Health under grant U54OD036472, the CFDE Integration and Coordination Center (ICC). We thank the CFDE ICC Metrics Working Group for their ongoing contributions to the development and implementation of these evaluation frameworks and metrics, including representatives from CFDE programs and centers as well as NIH program staff. We also thank the CFDE project teams for their collaboration in sharing data and providing feedback on the dashboard and metrics collection processes. Thank you to David Mayer and Faisal Alquaddoomi for software engineering support.

Declaration of interests

The authors declare no competing interests.

Resources

ⁱ Council of Councils CFDE Working Group, Final Report: https://dpcpsi.nih.gov/sites/default/files/Day2-1225PM-Final-Report-CFDE-CoC-WG-HorwitzWilder_.pdf

ⁱⁱ FAIRplus Dataset Maturity (FAIRplus-DSM) Indicators: https://fairplus.github.io/Data-Maturity/docs/Indicators

ⁱⁱⁱ National Academies, Open Science by Design: Realizing a Vision for 21st Century Research: https://www.ncbi.nlm.nih.gov/books/NBK562708

^iv NIH Common Fund Data Ecosystem: https://commonfund.nih.gov/dataecosystem

^v W. K. Kellogg Foundation, Logic Model Development Guide: https://www.wkkf.org/resource-directory/resources/2004/01/logic-model-development-guide

^vi NIH NOT-OD-21-013 (Final NIH Policy for Data Management and Sharing): https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html

^vii GA4GH Strategic Roadmap: https://www.ga4gh.org/about-us/strategic-road-map

Author contributions

S.D., E.L., and V.G. jointly developed the conceptualization for this Opinion and wrote the original draft of the manuscript. All authors contributed to the critical review and editing of the final version. All authors have read and approved the final manuscript.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work the author(s) used Claude from Anthropic in order to improve readability and clarity. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the published article.

Glossary

CFDE (Common Fund Data Ecosystem): NIH cross-cutting initiative that harmonizes metadata and tooling across 19 Common Fund programs so researchers can find and reuse data across the portfolio.

DMS Policy (NIH Data Management and Sharing Policy): The 2023 NIH policy requiring grantees to plan for data sharing, which establishes a pre-policy baseline against which sharing and reuse improvements can be measured.

ELIXIR: European life-sciences research infrastructure that publishes a ten-metric framework for evaluating bioinformatics software, covering version control, discoverability, automated builds, test data, and documentation.

FAIR: Findable, Accessible, Interoperable, Reusable — a widely adopted set of principles for making research data reliably reusable.

FAIRplus-DSM (FAIRplus Dataset Maturity model): Six-level maturity scale that operationalizes the FAIR principles for life-sciences data, from Level 0 (no identifiers, local storage) to Level 5 (enterprise-grade lifecycle management with cross-community provenance).

GA4GH (Global Alliance for Genomics and Health): International standards organization whose “Driver Projects” model — small pilots that road-test new standards before portfolio-wide adoption — informs the CFDE’s Metrics Working Group.

ICC (Integration and Coordination Center): CFDE center responsible for cross-portfolio coordination, evaluation, and the centralized metrics dashboard described here.

MCDA (Multi-Criteria Decision Analysis): A decision-making approach that combines multiple weighted performance dimensions into a single composite score.

PIsCO (Performance Indicators framework for Collection of bioinformatics resource metrics): Server-side framework for automated collection and visualization of software-quality metrics, allowing software health data to feed an evaluation dashboard directly.

RCR (Relative Citation Ratio): NIH iCite metric that benchmarks an article’s citation rate against field-specific expectations.

RDA (Research Data Alliance): International community that classifies FAIR indicators by priority (Essential, Important, Useful), providing a tiered roadmap for incremental FAIR improvement.

Significance statement

Public biomedical data ecosystems are now core scientific infrastructure, but each funded program tracks its own metrics differently, so portfolio-level comparison is effectively impossible. We argue that three product-strategy frameworks — public value, scientific quality, operations and finance — map onto research data resources, that their weight shifts predictably across a resource’s lifecycle, and that ignoring this shift misallocates funding. The NIH Common Fund Data Ecosystem operationalizes the model across 19 programs in real time.

Outstanding Questions

At what break-even point does the curation cost of a shared dataset get recovered by downstream reuse, and how should that point inform the decision to invest in further sharing versus letting a resource sunset?
As the data-citation gap persists, what KPI should replace raw citation counts as the primary measure of data-resource impact, and how do we collect it at portfolio scale without burdening individual projects?
How should cross-disciplinary co-authorship arising from a shared dataset be weighted relative to direct citation when evaluating a resource’s scientific influence?
What is the right cadence for re-assessing the FAIR maturity of an active data resource, who owns the re-assessment, and how do we keep the process tractable as portfolios grow?
As AI-driven and agent-based traffic on data portals continues to grow, how do we partition it into noise to remove versus signal to interpret as evidence of resource value?
What governance structures keep federated evaluation frameworks honest as they evolve, so that metrics continue to serve funders’ strategic questions rather than drifting toward data managers’ reporting convenience?

Key References

Wilkinson MD et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 2016 [7]. The foundational definition of the principles that anchor the scientific-quality framework throughout this paper.
Wilkinson MD et al. A design framework and exemplar metrics for FAIRness. Scientific Data, 2018 [8]. Operationalizes FAIR as machine-measurable indicators, which is what makes automated FAIR-compliance monitoring tractable at the portfolio scale we describe.
Tedersoo L et al. Data sharing practices and data availability upon request differ across scientific disciplines [31]. Provides the bibliometric evidence that fewer than 30% of secondary analyses formally cite source data, the core empirical basis for our argument that citation-count KPIs systematically understate reuse.
The GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nature Genetics, 2013 [2]. The paradigmatic example of a high-reuse public biomedical data resource (8,630 citations as of January 2026), used here to anchor the public-value framework with a concrete case.
Centers for Disease Control and Prevention. Standard Evaluation Framework for Large Health Care Data [17]. The 49-attribute fitness-for-purpose framework whose representativeness and linkability attributes are particularly load-bearing for federated ecosystems like the CFDE.

References

et al. (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–20

Lonsdale, J. et al. (2013) The Genotype-Tissue Expression (GTEx) project. Nat Genet 45, 580–585

(2019) The “All of Us” Research Program. N Engl J Med 381, 668–676

Charbonneau, A.L. et al. (2022) Making Common Fund data more findable: catalyzing a data ecosystem. GigaScience 11

Keeney, R.L. and Raiffa, H. (1976) Decisions with multiple objectives: preferences and value tradeoffs, Wiley

Eloff, J.H.P., ed. (2001) Advances in information security management & small systems security: IFIP TC11 WG11.1/WG11.2 Eighth Annual Working Conference on Information Security Management & Small Systems Security, September 27-28, 2001, Las Vegas, Nevada, USA, Kluwer Academic Publishers

Wilkinson, M.D. et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3

Wilkinson, M.D. et al. (2018) A design framework and exemplar metrics for FAIRness. Sci Data 5

(2025) A deeper look at AI crawlers: breaking down traffic by purpose and industry. The Cloudflare Blog. [Online]. Available: https://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry/. [Accessed: 20-May-2026]

10.

Colavizza, G. et al. (2024) An analysis of the effects of sharing research data, code, and preprints on citations. PLoS ONE 19, e0311493

11.

Hicks, D. et al. (2015) Bibliometrics: The Leiden Manifesto for research metrics. Nature 520, 429–431

12.

Agarwal, A. et al. (2016) Bibliometrics: tracking research impact by selecting the appropriate metrics. Asian J Androl 18, 296

13.

Emissah, H. et al. (2023) Bibliometric analysis of neuroscience publications quantifies the impact of data sharing. Bioinformatics 39

14.

Wang, D. and Liu, Y. (2025) Medical Science Data Value Evaluation Model: Mixed Methods Study. JMIR Med Inform 13, e63544–e63544

15.

Verspoor, K. and Martin-Sanchez, F. (2014) Big Data in Medicine Is Driving Big Changes. Yearb Med Inform 23, 14–20

16.

Tenopir, C. et al. (2020) Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide. PLoS ONE 15, e0229003

17.

El Burai Felix, S. et al. (2024) A Standard Framework for Evaluating Large Health Care Data and Related Resources. MMWR Suppl. 73, 1–13

18.

Sirugo, G. et al. (2019) The Missing Diversity in Human Genetic Studies. Cell 177, 26–31

19.

Popejoy, A.B. and Fullerton, S.M. (2016) Genomics is failing on diversity. Nature 538, 161–164

20.

Beaulieu-Jones, B.K. and Greene, C.S. (2017) Reproducibility of computational workflows is automated using continuous analysis. Nat Biotechnol 35, 342–346

21.

Gentleman, R.C. et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5

22.

Artaza, H. et al. (2016) Top 10 metrics for life science software good practices. F1000Res 5, 2000

23.

Kahn, M.G. et al. (2021) Migrating a research data warehouse to a public cloud: challenges and opportunities. Journal of the American Medical Informatics Association 29, 592–600

24.

Tsipouri, L. et al. (2025) The economic impact of open science: a scoping review. R. Soc. Open Sci. 12

25.

Zhang, H. et al. (2025) Harnessing the Power of AI in Qualitative Research: Role Assignment, Engagement, and User Perceptions of AI-Generated Follow-Up Questions in Semi-Structured InterviewsarXiv

26.

Palinkas, L.A. et al. (2013) Purposeful Sampling for Qualitative Data Collection and Analysis in Mixed Method Implementation Research. Adm Policy Ment Health 42, 533–544

27.

Creswell, J.W. and Inoue, M. (2024) A process for conducting mixed methods data analysis. J of Gen and Family Med 26, 4–11

28.

Wu, H. et al. (2019) Using logic model and visualization to conduct portfolio evaluation. Eval Program Plann 74, 69–75

29.

Pablos-Mendez, A. et al. (2005) Knowledge translation in global health. Bull World Health Organ 83, 723

30.

Leonard, K.L. and Masatu, M.C. (2017) Changing health care provider performance through measurement. Social Science & Medicine 181, 54–65

31.

Emissah, H. et al. (2023) Bibliometric analysis of neuroscience publications quantifies the impact of data sharingopenRxiv