A public genomic metadata index

OmicIDX is an ecosystem that treats metadata from public genomics repositories as data. Metadata are collected from the source repositories on a regular schedule, and a purpose-built parsing library converts them into standard JSON representations. The OmicIDX builder automates BigQuery and web-based API generation and updates. The OpenAPI-based web API enables performant, language-agnostic search, retrieval, and analysis of OmicIDX data. Finally, data are augmented with natural language processing to produce Medical Subject Headings (MeSH) mappings and with heuristic text matching to map terms to ontologies.
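To make the parsing step concrete, here is a minimal sketch of turning one XML metadata record into a JSON representation. It is not the OmicIDX parsing library: the XML layout, the accession `EXAMPLE0001`, and the function `record_to_dict` are hypothetical, chosen only to illustrate the idea of flattening repository XML into JSON.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML record. The element and attribute names are illustrative
# only and do not reproduce any repository's real schema.
RAW_XML = """
<Record accession="EXAMPLE0001">
  <Title>Liver RNA-seq</Title>
  <Attribute name="tissue">liver</Attribute>
  <Attribute name="organism">Homo sapiens</Attribute>
</Record>
"""


def record_to_dict(xml_text):
    """Flatten one XML record into a plain dict suitable for JSON output."""
    root = ET.fromstring(xml_text.strip())
    return {
        "accession": root.get("accession"),
        "title": root.findtext("Title"),
        # Collect name/value attribute pairs into a single mapping.
        "attributes": {
            attr.get("name"): (attr.text or "").strip()
            for attr in root.findall("Attribute")
        },
    }


record = record_to_dict(RAW_XML)
json_line = json.dumps(record, sort_keys=True)  # one JSON document per record
print(json_line)
```

Emitting one JSON document per record keeps the output easy to load into a warehouse or to serve from an API, which is the general shape of pipeline described above.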

OmicIDX currently includes the following datasets:

  • SRA
    • studies
    • samples
    • experiments
    • runs
  • dbGaP mapping from SRA
  • Biosample
    • samples
    • projects
  • GEO
    • platforms
    • series
    • samples

Data are augmented with:

  • MeSH
  • ~200 Ontologies
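
As a rough illustration of heuristic text matching against ontology labels, the sketch below normalizes free text and looks for whole-word label or synonym matches. The matching rule, the tiny `ONTOLOGY` table, and its `EXAMPLE:` term IDs are assumptions made for this example and are not taken from OmicIDX.

```python
import re

# Hypothetical mini-ontology: labels and synonyms keyed by made-up term IDs.
ONTOLOGY = {
    "EXAMPLE:0001": ["liver", "hepatic tissue"],
    "EXAMPLE:0002": ["lung", "pulmonary tissue"],
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


# Reverse lookup from normalized label to term ID.
LABEL_INDEX = {
    normalize(label): term_id
    for term_id, labels in ONTOLOGY.items()
    for label in labels
}


def match_terms(free_text):
    """Return sorted IDs of terms whose label occurs as a whole-word phrase."""
    padded = f" {normalize(free_text)} "
    return sorted(
        {term_id for label, term_id in LABEL_INDEX.items() if f" {label} " in padded}
    )


print(match_terms("Tissue: Liver (adult)"))
```

Padding the normalized text with spaces keeps a label such as "liver" from matching inside an unrelated word such as "delivery". A production matcher would likely also need stemming, abbreviation handling, and disambiguation.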
