GEOquery’s job ends once your data is a Bioconductor object. This article is a map of where to go next — how the object you got from getGEO() connects to the rest of the Bioconductor ecosystem — rather than a tutorial for any one method. The goal is to save you the “I have an ExpressionSet, now what?” moment.
Know your object
What you do next depends on what GEOquery returned (see Understanding GEO data formats):
| You have | Typical source | Downstream entry point |
|---|---|---|
| SummarizedExperiment (list) | GSE Series Matrix (default), RNA-seq counts | limma / DESeq2 / edgeR |
| ExpressionSet (list) | GSE Series Matrix with returnType = "ExpressionSet" | limma |
| SingleCellExperiment | single-cell readers | scater / scran / OSCA |
| GSE/GSM/GPL/GDS S4 | SOFT parsing | accessors, then convert |
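
If you are unsure which row applies, checking the class of the result is usually enough. The sketch below restates the table as a small helper; `next_step()` is a hypothetical name, not a GEOquery function, and the mock object only stands in for a real download so the example runs offline.

```r
# Hypothetical helper (not part of GEOquery): suggest an entry point by class.
next_step <- function(x) {
  # GSE Series Matrix results arrive as a plain list, one element per platform.
  if (is.list(x) && !is.object(x)) x <- x[[1]]
  # Test SingleCellExperiment first, because it extends SummarizedExperiment.
  if (inherits(x, "SingleCellExperiment")) {
    "scater / scran / OSCA"
  } else if (inherits(x, "SummarizedExperiment")) {
    "limma / DESeq2 / edgeR"
  } else if (inherits(x, "ExpressionSet")) {
    "limma"
  } else if (inherits(x, c("GSE", "GSM", "GPL", "GDS"))) {
    "accessors, then convert"
  } else {
    "unknown: inspect class(x)"
  }
}

# Stand-in for getGEO(..., returnType = "ExpressionSet"); no download involved.
mock_result <- list(structure(list(), class = "ExpressionSet"))
next_step(mock_result)
```

With a real result you would call `next_step(getGEO("GSE2553"))` instead of passing the mock.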
Microarray: limma
Microarray Series Matrix data arrives already processed (typically log-transformed, normalized intensities). The standard path is a linear model with limma, which accepts a matrix:
library(GEOquery)
library(limma)
library(SummarizedExperiment)
se <- getGEO("GSE2553")[[1]] # SummarizedExperiment (default)
# `group` is a placeholder: derive it from the sample metadata first
design <- model.matrix(~ group, data = colData(se))
fit <- eBayes(lmFit(assay(se), design))
topTable(fit, coef = 2)

The hardest part is usually not the model but extracting clean grouping variables from colData(se): GEO sample metadata is free text, so expect to parse characteristics_ch1 fields.
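
As an illustration of that parsing step, here is a minimal base-R sketch. It assumes the characteristics columns hold "key: value" strings with one key per column, which is common but not guaranteed. The `pheno` data frame is mock data standing in for `as.data.frame(colData(se))`, and `parse_characteristics()` is a hypothetical helper, not a GEOquery function.

```r
# Mock sample metadata standing in for as.data.frame(colData(se)).
pheno <- data.frame(
  title = c("s1", "s2", "s3", "s4"),
  characteristics_ch1 = rep("tissue: liver", 4),
  characteristics_ch1.1 = c("treatment: control", "treatment: drug",
                            "treatment: control", "treatment: drug"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each "key: value" column into a column named by its key.
parse_characteristics <- function(pheno) {
  char_cols <- grep("^characteristics_ch1", names(pheno), value = TRUE)
  out <- pheno
  for (col in char_cols) {
    field <- as.character(pheno[[col]])
    key <- trimws(sub(":.*$", "", field))       # text before the first colon
    value <- trimws(sub("^[^:]*:", "", field))  # text after the first colon
    # Assumption: one key per column. Real records can mix keys, so fail loudly.
    stopifnot(length(unique(key)) == 1)
    out[[make.names(unique(key))]] <- value
  }
  out
}

clean <- parse_characteristics(pheno)
# Fix the level order explicitly so the reference group is not left to chance.
clean$group <- factor(clean$treatment, levels = c("control", "drug"))
table(clean$group)
```

Setting the factor levels by hand matters because `coef = 2` in `topTable()` is interpreted relative to the first level.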
RNA-seq counts: DESeq2 / edgeR / limma-voom
For NCBI-computed RNA-seq counts (see the RNA-seq article), use a count-based model. The SummarizedExperiment GEOquery returns plugs directly into DESeq2:
library(GEOquery)
library(DESeq2)
se <- getRNASeqData("GSE164073")
# `condition` is a placeholder for a grouping column you have added to colData(se)
dds <- DESeqDataSet(se, design = ~ condition)
dds <- DESeq(dds)
results(dds)

edgeR and limma-voom are equally good choices and consume the same count matrix.
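
For comparison, a limma-voom version of the same analysis might look like the sketch below. The counts are simulated so the example runs without a download; with real data you would start from `assay(se)` and a grouping variable taken from `colData(se)`. The `condition` factor and the matrix dimensions are assumptions made for illustration.

```r
library(edgeR)
library(limma)

set.seed(1)
# Simulated stand-in for assay(se): 200 genes x 6 samples of raw counts.
counts <- matrix(
  rnbinom(200 * 6, mu = 100, size = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("s", 1:6))
)
condition <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ condition)

# Wrap the counts, drop low-count genes, and compute normalization factors.
dge <- DGEList(counts = counts)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# voom converts counts to log-CPM with precision weights for the linear model.
v <- voom(dge, design)
fit <- eBayes(lmFit(v, design))
tt <- topTable(fit, coef = 2, number = 5)
tt
```

Because the simulated groups are drawn from the same distribution, no gene is expected to show a real effect; the point is only the shape of the workflow.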
Single cell: the OSCA stack
A SingleCellExperiment from the single-cell readers is the entry point to Bioconductor’s single-cell ecosystem — scater, scran, and the OSCA book for the full quality-control → normalization → clustering → annotation arc.
Converting and annotating
Two recurring needs:
- Modernize a legacy result. getGEO() returns SummarizedExperiment by default now, but if you have an older ExpressionSet (or asked for one with returnType = "ExpressionSet"), convert it without re-downloading:
  se <- as_SummarizedExperiment(getGEO("GSE2553", returnType = "ExpressionSet")[[1]])
- Re-annotate features. GEO platform annotation can be dated. For mapping probe/gene identifiers, reach for the Bioconductor annotation infrastructure: AnnotationDbi, organism packages such as org.Hs.eg.db, and biomaRt.
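
As a sketch of the re-annotation step, the example below maps identifiers to gene symbols with `AnnotationDbi::mapIds()`. It assumes the identifiers are human Entrez Gene IDs; the two IDs are example input, and with real data you would pass your own feature identifiers (for instance `rownames(se)`, if they are Entrez IDs) and the matching `keytype`.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Assumed example input: human Entrez Gene IDs.
ids <- c("7157", "672")

# multiVals = "first" keeps one symbol per ID when several mappings exist.
symbols <- mapIds(
  org.Hs.eg.db,
  keys = ids,
  column = "SYMBOL",
  keytype = "ENTREZID",
  multiVals = "first"
)
symbols
```

`keytypes(org.Hs.eg.db)` lists the identifier types the package can map from, which helps when the platform uses something other than Entrez IDs.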
Reproducibility note
GEO records can change, and large downloads are slow, so cache deliberately: pass a stable destdir= to getGEO()/getGEOSuppFiles() and record the GEOquery and data versions (sessionInfo()) alongside your results.
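
One way to put this into practice is sketched below. The directory name `geo-cache` and the two file names are arbitrary choices, and the download call is commented out so the snippet runs offline.

```r
# A stable, project-local cache directory (the name is an arbitrary choice).
cache_dir <- "geo-cache"
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

# Downloads reuse files already present in destdir. Left commented so the
# sketch runs without network access:
# gse <- GEOquery::getGEO("GSE2553", destdir = cache_dir)

# Record package versions next to the cached files.
session_file <- file.path(cache_dir, "sessionInfo.txt")
writeLines(capture.output(sessionInfo()), session_file)

# Record when the snapshot was taken, since GEO records can change.
stamp_file <- file.path(cache_dir, "downloaded-at.txt")
writeLines(format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z"), stamp_file)
```

Keeping these files alongside the cached data means a later reader can tell which GEOquery version and which snapshot of the record produced the results.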