In some cases, a GEO Series stores sample phenotype information in data tables attached to the series record rather than in the individual sample (GSM) records. An example is GSE3494, which has two data tables containing important information. Calling getGEO with the standard parameters downloads the GSEMatrix file, which unfortunately does not contain the information in those data tables. This function downloads the "header" information from the GSE record and parses the data tables into R data.frames.

Usage

getGSEDataTables(GSE)

Arguments

GSE

The GSE identifier, such as "GSE3494".

Value

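A list of data.frames, one for each data table parsed from the GSE record. In the example output on this page they print as tibbles.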

See also

getGEO

Author

Sean Davis <sdavis2@mail.nih.gov>

Examples


library(GEOquery)

# Download the GSE3494 header and parse its attached data tables
dfl <- getGSEDataTables("GSE3494")
#> Rows: 251 Columns: 12
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (6): X1, X2, X5, X6, X7, X10
#> dbl (6): X3, X4, X8, X9, X11, X12
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 502 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (3): X1, X2, X3
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
lapply(dfl,head)
#> [[1]]
#> # A tibble: 6 × 12
#>   `INDEX (ID)` p53 seq…¹ p53 D…² DLDA …³ Elsto…⁴ ER st…⁵ PgR s…⁶ age a…⁷ tumor…⁸
#>   <chr>        <chr>       <dbl>   <dbl> <chr>   <chr>   <chr>     <dbl>   <dbl>
#> 1 X101B88      p53+            1       0 G3      ER-     PgR-         40      12
#> 2 X102B06      p53+            1       0 G3      ER+     PgR+         51      26
#> 3 X104B91      p53+            0       1 G3      ER+     PgR+         80      24
#> 4 X110B34      p53+            1       0 G2      ER+     PgR+         74      20
#> 5 X111B51      p53+            1       0 G3      ER+     PgR+         41      33
#> 6 X127B00      p53+            1       0 G3      ER+     PgR+         57      22
#> # … with 3 more variables: `Lymph node status` <chr>,
#> #   `DSS TIME (Disease-Specific Survival Time in years)` <dbl>,
#> #   `DSS EVENT (Disease-Specific Survival EVENT; 1=death from breast cancer, 0=alive or censored )` <dbl>,
#> #   and abbreviated variable names
#> #   ¹​`p53 seq mut status (p53+=mutant; p53-=wt)`,
#> #   ²​`p53 DLDA classifier result (0=wt-like, 1=mt-like)`,
#> #   ³​`DLDA error (1=yes, 0=no)`, ⁴​`Elston histologic grade`, ⁵​`ER status`, …
#> # ℹ Use `colnames()` to see all variable names
#> 
#> [[2]]
#> # A tibble: 6 × 3
#>   `GEO Sample Accession #` `Patient ID` `Affy platform`
#>   <chr>                    <chr>        <chr>          
#> 1 GSM79114                 X100B08      HG-U133A       
#> 2 GSM79115                 X101B88      HG-U133A       
#> 3 GSM79116                 X102B06      HG-U133A       
#> 4 GSM79117                 X103B41      HG-U133A       
#> 5 GSM79118                 X104B91      HG-U133A       
#> 6 GSM79119                 X105B13      HG-U133A       
#>
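
The two tables for GSE3494 share a patient identifier: `INDEX (ID)` in the first table and `Patient ID` in the second. One possible way to attach the phenotype data to GEO sample accessions is a base R merge on those columns. The sketch below is an illustration, not part of the package. It uses small hand-copied data frames (a few rows and columns taken from the output shown on this page) so that it runs without a network connection; the object names `pheno`, `samples`, and `merged` are hypothetical.

```r
# Toy copies of a few rows from the two tables printed in the example output.
# Only three of the twelve phenotype columns are kept here.
pheno <- data.frame(
  `INDEX (ID)` = c("X101B88", "X102B06", "X104B91"),
  `Elston histologic grade` = c("G3", "G3", "G3"),
  `ER status` = c("ER-", "ER+", "ER+"),
  check.names = FALSE  # keep the original, non-syntactic column names
)
samples <- data.frame(
  `GEO Sample Accession #` = c("GSM79115", "GSM79116", "GSM79118"),
  `Patient ID` = c("X101B88", "X102B06", "X104B91"),
  check.names = FALSE
)

# Join the sample table to the phenotype table on the shared patient identifier.
merged <- merge(samples, pheno, by.x = "Patient ID", by.y = "INDEX (ID)")

# Index the result by GSM accession so rows can be looked up by sample.
rownames(merged) <- merged[["GEO Sample Accession #"]]
merged
```

With the real data, the same `merge()` call should apply to `dfl[[2]]` and `dfl[[1]]`, assuming the column names match those printed in the example output.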