
GEOquery’s job ends once your data is a Bioconductor object. This article is a map of where to go next — how the object you got from getGEO() connects to the rest of the Bioconductor ecosystem — rather than a tutorial for any one method. The goal is to save you the “I have an ExpressionSet, now what?” moment.

Know your object

What you do next depends on what GEOquery returned (see Understanding GEO data formats):

You have                    | Typical source                                      | Downstream entry point
----------------------------|-----------------------------------------------------|------------------------
SummarizedExperiment (list) | GSE Series Matrix (default), RNA-seq counts         | limma / DESeq2 / edgeR
ExpressionSet (list)        | GSE Series Matrix with returnType = "ExpressionSet" | limma
SingleCellExperiment        | single-cell readers                                 | scater / scran / OSCA
GSE/GSM/GPL/GDS S4          | SOFT parsing                                        | accessors, then convert
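
The table above can be read as a dispatch on the object's class. The following is a minimal sketch in base R, assuming only the class names listed in the table. `next_step()` is a hypothetical helper, not part of GEOquery, and the mock objects carry nothing but a class attribute so that the sketch runs without a download.

```r
# Hypothetical helper (not a GEOquery function): map an object's class to the
# downstream entry point listed in the table above.
next_step <- function(x) {
  # A plain list (the [[1]] in the examples picks its first element):
  # report on every element
  if (is.list(x) && !is.object(x)) {
    return(vapply(x, next_step, character(1)))
  }
  # SingleCellExperiment extends SummarizedExperiment, so test it first
  if (inherits(x, "SingleCellExperiment")) {
    "scater / scran / OSCA"
  } else if (inherits(x, "SummarizedExperiment")) {
    "limma / DESeq2 / edgeR"
  } else if (inherits(x, "ExpressionSet")) {
    "limma"
  } else if (inherits(x, c("GSE", "GSM", "GPL", "GDS"))) {
    "accessors, then convert"
  } else {
    "unrecognized: inspect class(x)"
  }
}

# Mock objects standing in for real downloads (assumption for illustration)
mock_eset <- structure(list(), class = "ExpressionSet")
mock_gse  <- list(structure(list(), class = "SummarizedExperiment"))

next_step(mock_eset)
next_step(mock_gse)
```

The order of the checks matters: a SingleCellExperiment would also match the SummarizedExperiment branch, so the more specific class is tested first.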

Microarray: limma

Microarray Series Matrix data arrives already processed (typically log-transformed, normalized intensities). The standard path is a linear model with limma, which accepts a matrix:

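library(GEOquery)
library(limma)
library(SummarizedExperiment)
se <- getGEO("GSE2553")[[1]]                 # SummarizedExperiment (default)
# `group` is a placeholder: substitute a cleaned-up column of colData(se)
design <- model.matrix(~ group, data = as.data.frame(colData(se)))
fit <- eBayes(lmFit(assay(se), design))      # linear model, then moderated statistics
topTable(fit, coef = 2)                      # top features for the second design column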

The hardest part is usually not the model but extracting clean grouping variables from colData(se) — GEO sample metadata is free text, so expect to parse characteristics_ch1 fields.
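
To make the parsing step concrete, here is a minimal base-R sketch. It assumes the characteristics_ch1 columns hold "key: value" strings with one key per column. The `meta` data frame and its values are made up for illustration, and `parse_characteristic()` is a hypothetical helper, not a GEOquery function. With real data you would start from as.data.frame(colData(se)).

```r
# Made-up sample metadata shaped like GEO "key: value" characteristics
meta <- data.frame(
  characteristics_ch1   = c("tissue: liver", "tissue: liver",
                            "tissue: kidney", "tissue: kidney"),
  characteristics_ch1.1 = c("treatment: control", "treatment: drug",
                            "treatment: control", "treatment: drug"),
  row.names = c("S1", "S2", "S3", "S4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one "key: value" column into key and values
parse_characteristic <- function(x) {
  key   <- trimws(sub(":.*$", "", x))       # text before the first colon
  value <- trimws(sub("^[^:]*:", "", x))    # text after the first colon
  stopifnot(length(unique(key)) == 1)       # assumption: one key per column
  list(key = unique(key), value = value)
}

# Add one factor column per characteristics column, named after its key
char_cols <- grep("^characteristics_ch1", names(meta), value = TRUE)
for (col in char_cols) {
  parsed <- parse_characteristic(meta[[col]])
  meta[[parsed$key]] <- factor(parsed$value)
}

# Make "control" the reference level so the second coefficient is drug vs control
meta$treatment <- relevel(meta$treatment, ref = "control")
design <- model.matrix(~ treatment, data = meta)
colnames(design)
```

The stopifnot() check is deliberate: real GEO records sometimes mix keys within one column, and failing loudly is safer than silently building a wrong grouping variable.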

RNA-seq counts: DESeq2 / edgeR / limma-voom

For NCBI-computed RNA-seq counts (see the RNA-seq article), use a count-based model. The SummarizedExperiment GEOquery returns plugs directly into DESeq2:

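library(GEOquery)
library(DESeq2)
se <- getRNASeqData("GSE164073")             # SummarizedExperiment of counts
# `condition` is a placeholder: it must name a column of colData(se)
dds <- DESeqDataSet(se, design = ~ condition)
dds <- DESeq(dds)                            # fit the count model
results(dds)                                 # differential expression table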

edgeR and limma-voom are equally good choices and consume the same count matrix.

Single cell: the OSCA stack

A SingleCellExperiment from the single-cell readers is the entry point to Bioconductor’s single-cell ecosystem — scater, scran, and the OSCA book for the full quality-control → normalization → clustering → annotation arc.

Converting and annotating

Two recurring needs:

  • Modernize a legacy result. getGEO() returns SummarizedExperiment by default now, but if you have an older ExpressionSet (or asked for one with returnType = "ExpressionSet"), convert it without re-downloading:

    se <- as_SummarizedExperiment(getGEO("GSE2553", returnType = "ExpressionSet")[[1]])

  • Re-annotate features. GEO platform annotation can be dated. For mapping probe/gene identifiers, reach for the Bioconductor annotation infrastructure: AnnotationDbi, organism packages such as org.Hs.eg.db, and biomaRt.

Reproducibility note

GEO records can change, and large downloads are slow, so cache deliberately: pass a stable destdir= to getGEO()/getGEOSuppFiles(), and record the package versions (sessionInfo()) and the date you retrieved the data alongside your results.
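
One way to put this into practice is sketched below, under stated assumptions: the cache path and the provenance file name are hypothetical choices, and the download itself is left commented out so the sketch runs offline. Only the destdir argument of getGEO() comes from the text above.

```r
# Hypothetical project-relative cache location; pick any stable path
cache_dir <- file.path("data", "geo_cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

accession <- "GSE2553"

# Network call, commented out so this sketch runs without a connection:
# library(GEOquery)
# gse <- getGEO(accession, destdir = cache_dir)

# Record what was fetched, when, and with which package versions
provenance <- c(
  paste("accession:", accession),
  paste("retrieved:", format(Sys.Date())),
  capture.output(sessionInfo())
)
provenance_file <- file.path(cache_dir, paste0(accession, "_provenance.txt"))
writeLines(provenance, provenance_file)
```

Keeping the provenance file next to the cached download means that anyone who copies the cache directory also gets the record of how it was produced.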