OmicIDX is an ecosystem that treats metadata from public genomics repositories as data. Metadata are collected regularly from the source repositories and processed with a purpose-built parsing library into standard JSON representations. The OmicIDX builder automates BigQuery and web-based API generation and updates. The OpenAPI-based web API enables performant, language-agnostic search, retrieval, and analysis of OmicIDX data. Finally, the data are augmented using natural language processing to produce Medical Subject Headings (MeSH) mappings and using heuristic text matching to map terms to ontologies.
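As a rough illustration of the parsing step, the sketch below turns one XML metadata record into a JSON representation. It is not the OmicIDX parsing library: the record layout, the `EXAMPLE0001` accession, and the `study_to_dict` helper are all hypothetical and far simpler than real repository metadata.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified record. Real repository XML is much richer,
# and this layout is an assumption made only for illustration.
RAW_XML = """
<STUDY accession="EXAMPLE0001">
  <TITLE>Example study</TITLE>
  <ATTRIBUTE tag="organism">Homo sapiens</ATTRIBUTE>
  <ATTRIBUTE tag="tissue">liver</ATTRIBUTE>
</STUDY>
"""


def study_to_dict(xml_text: str) -> dict:
    """Convert one (hypothetical) study record into a JSON-serializable dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        "accession": root.get("accession"),
        "title": root.findtext("TITLE"),
        # Flatten tag/value attribute elements into a plain mapping.
        "attributes": {a.get("tag"): a.text for a in root.findall("ATTRIBUTE")},
    }


record = study_to_dict(RAW_XML)
print(json.dumps(record, indent=2, sort_keys=True))
```

Writing one such JSON object per record keeps the output easy to load into a table store or to serve from an API, which is the kind of downstream use described above.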
OmicIDX currently includes the following datasets:
- SRA
  - studies
  - samples
  - experiments
  - runs
  - dbGaP mapping from SRA
- Biosample
  - samples
  - projects
- GEO
  - platforms
  - series
  - samples
Data are augmented with:
- MeSH
- ~200 ontologies
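To make the idea of heuristic text matching against ontologies concrete, here is a minimal sketch. It is an assumption about how such matching could work, not the method OmicIDX uses: the `ONTOLOGY` dictionary, the `EX:` identifiers, and the `normalize` and `match_terms` helpers are hypothetical.

```python
import re

# Hypothetical mini-ontology: term ID -> preferred label and synonyms.
ONTOLOGY = {
    "EX:0001": ["liver", "hepatic tissue"],
    "EX:0002": ["lung"],
    "EX:0003": ["breast cancer", "breast carcinoma"],
}


def normalize(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_terms(text: str, ontology: dict) -> list:
    """Return sorted IDs of terms whose label occurs as whole words in text."""
    # Padding with spaces makes the substring test respect word boundaries.
    padded = f" {normalize(text)} "
    hits = []
    for term_id, labels in ontology.items():
        if any(f" {normalize(label)} " in padded for label in labels):
            hits.append(term_id)
    return sorted(hits)


matches = match_terms("Primary Breast-Carcinoma, hepatic tissue metastasis", ONTOLOGY)
print(matches)
```

Normalizing both the free text and the labels lets "Breast-Carcinoma" match the synonym "breast carcinoma", while the whole-word check avoids spurious hits on substrings. A production matcher would likely also need to handle plurals, abbreviations, and ambiguous labels.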