GEOquery is the bridge between the NCBI Gene Expression Omnibus (GEO) and Bioconductor: it downloads and parses GEO records into Bioconductor objects. This page is a short quick-start and an index; the in-depth, narrative documentation lives in the articles listed below.
Install
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("GEOquery")

Quick start
library(GEOquery)
library(SummarizedExperiment) # provides assay(), colData(), rowData()
# A GSE via the fast Series Matrix path -> a list of SummarizedExperiment,
# one per platform. (Pass returnType = "ExpressionSet" for the legacy class.)
gse <- getGEO("GSE2553")
se <- gse[[1]]
assay(se) # expression matrix
colData(se) # sample metadata
rowData(se) # feature annotation
# Other entity types parse to GEOquery's S4 classes:
getGEO("GSM11805") # a sample
getGEO("GPL96") # a platform
getGEO("GDS507") # a curated dataset
# See what supplementary files a study has, without downloading:
getGEOSuppFiles("GSE63137", fetch_files = FALSE)

In-depth articles
The articles go beyond the how and explain the why. They cover the structure of GEO, its file formats, and how a GEOquery object connects to downstream Bioconductor workflows:
- Understanding GEO data formats: the four entity types, SOFT vs. Series Matrix, why getGEO() returns different classes, and ExpressionSet vs. SummarizedExperiment.
- RNA-seq quantifications from GEO: NCBI’s uniformly computed counts and how to retrieve them.
- Single-cell data from GEO: why single-cell data lives in supplementary files, and the inspect → decide → load workflow into a SingleCellExperiment.
- From GEO to downstream analysis: taking a GEOquery object into limma / DESeq2 / edgeR / the single-cell ecosystem, with links to the relevant packages.
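The class differences named in the first article can be checked directly in an R session. The following is a minimal sketch rather than the package's documented workflow: it assumes network access to GEO, reuses accessions from the quick start, and relies on the returnType = "ExpressionSet" argument mentioned in the quick-start comment. The accessors exprs(), pData(), and fData() come from Biobase; Meta() is GEOquery's accessor for header metadata.

```r
library(GEOquery)
library(Biobase)  # exprs(), pData(), fData() for ExpressionSet objects

# Legacy path: request ExpressionSet objects for a series.
# The returnType argument is taken from the quick-start comment.
gse_legacy <- getGEO("GSE2553", returnType = "ExpressionSet")
eset <- gse_legacy[[1]]

class(eset)        # class of the first element in the returned list
dim(exprs(eset))   # expression matrix: features x samples
head(pData(eset))  # sample metadata
head(fData(eset))  # feature annotation

# Non-series entities parse to GEOquery's own S4 classes.
gsm <- getGEO("GSM11805")
class(gsm)
names(Meta(gsm))   # names of the header metadata fields
```

The ExpressionSet accessors play the same roles as assay(), colData(), and rowData() in the quick start, so the two return types can be compared side by side.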
Getting help
- Usage questions: the Bioconductor support site, tagged geoquery.
- Bugs and feature requests: the issue tracker. Please include a GEO accession and sessionInfo().
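A report is easiest to act on when it can be reproduced. As a rough sketch (the accession is only a placeholder borrowed from the quick start, and the variable names are illustrative), a short script to paste into an issue might look like this:

```r
library(GEOquery)

# Placeholder accession: replace with the one that triggers the problem.
accession <- "GSE2553"

# Capture an error or warning instead of stopping, so the message can be
# copied into the report together with the session details.
result <- tryCatch(
  getGEO(accession),
  error = function(e) e,
  warning = function(w) w
)
print(result)

# Package versions and platform details.
sessionInfo()
```

Pasting the printed output along with the script gives maintainers both the failing call and the environment it ran in.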