Abstract

BigRNA is a large compendium of homogeneously processed public genomics datasets and accompanying metadata collected from the source repositories. BigRNAr connects R to the available data and provides R-based metadata search and retrieval.

Background

Overview of BigRNA information flow and connection from R. Data are collected from public repositories such as the NCBI Sequence Read Archive (SRA) and processed using salmon. Metadata, including experimental protocols, sample details, and study descriptions and abstracts, are mined at the same time. The BigRNA API exposes the metadata via a GraphQL endpoint and the data as individual files, one per processed sample. This package, BigRNAr, connects R to these resources.

Getting started

BiocManager::install("seandavi/BigRNA")

Create BigRNA connection object

## An object of class "BigRNAConnection"
## Slot "url":
## [1] "http://bigrna.cancerdatasci.org/"
## 
## Slot "bfc":
## class: BiocFileCache
## bfccache: /Users/sdavis2/Library/Caches/BigRNAr
## bfccount: 10
## For more information see: bfcinfo() or bfcquery()

Use Cases

Getting data

##                                                 node.key node.accession
## 1 results/10090/gencode/M19/SRX4147707/quant.genes.sf.gz     SRX4147707
## 2 results/10090/gencode/vM19/DRX000988/quant.genes.sf.gz      DRX000988
## 3 results/10090/gencode/vM19/DRX000989/quant.genes.sf.gz      DRX000989
## 4 results/10090/gencode/vM19/DRX000990/quant.genes.sf.gz      DRX000990
## 5 results/10090/gencode/vM19/DRX000991/quant.genes.sf.gz      DRX000991
## 6 results/10090/gencode/vM19/DRX001048/quant.genes.sf.gz      DRX001048
##       node.filename
## 1 quant.genes.sf.gz
## 2 quant.genes.sf.gz
## 3 quant.genes.sf.gz
## 4 quant.genes.sf.gz
## 5 quant.genes.sf.gz
## 6 quant.genes.sf.gz
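
The node.key values above appear to follow a fixed layout: a results prefix, an NCBI taxonomy ID (10090 is mouse), the annotation source and release, the sample accession, and the file name. Assuming that layout holds (it is inferred from the output shown here, not documented, and may change), a minimal base-R sketch for splitting keys into columns could look like the following. The helper name parse_node_key is hypothetical and not part of BigRNAr.

```r
# Hypothetical helper (not part of BigRNAr): split node.key paths into fields.
# Assumes the layout results/<taxid>/<annotation>/<release>/<accession>/<file>,
# which is inferred from the example output and may not hold in general.
parse_node_key <- function(keys) {
  keys <- sub("^/", "", keys)                  # drop a leading slash, if any
  parts <- strsplit(keys, "/", fixed = TRUE)   # one character vector per key
  ok <- lengths(parts) == 6L
  if (!all(ok)) {
    stop("unexpected key layout: ", paste(keys[!ok], collapse = ", "))
  }
  m <- do.call(rbind, parts)                   # character matrix, one row per key
  data.frame(
    taxid      = m[, 2],
    annotation = m[, 3],
    release    = m[, 4],
    accession  = m[, 5],
    filename   = m[, 6],
    stringsAsFactors = FALSE
  )
}

# Two keys copied from the output above
keys <- c(
  "results/10090/gencode/M19/SRX4147707/quant.genes.sf.gz",
  "results/10090/gencode/vM19/DRX000988/quant.genes.sf.gz"
)
key_info <- parse_node_key(keys)
key_info
```

Keeping the accession in its own column makes it easier to join downloaded files back to sample metadata later.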

Download data

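# Fetch the first 10 quantification files and collect their local paths.
# The result is a character vector named by the original node.key values.
fnames = sapply(df$node.key[1:10], function(path) {
  path = sub('^/','',path)  # drop any leading slash from the key
  datafile(bigrna, path)    # local path of the file in the BigRNAr cache
})
head(fnames)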
##                 results/10090/gencode/M19/SRX4147707/quant.genes.sf.gz 
## "/Users/sdavis2/Library/Caches/BigRNAr/c92253157fcb_quant.genes.sf.gz" 
##                 results/10090/gencode/vM19/DRX000988/quant.genes.sf.gz 
## "/Users/sdavis2/Library/Caches/BigRNAr/c9222879130a_quant.genes.sf.gz" 
##                 results/10090/gencode/vM19/DRX000989/quant.genes.sf.gz 
## "/Users/sdavis2/Library/Caches/BigRNAr/c92224d10a48_quant.genes.sf.gz" 
##                 results/10090/gencode/vM19/DRX000990/quant.genes.sf.gz 
## "/Users/sdavis2/Library/Caches/BigRNAr/c92217fa0fda_quant.genes.sf.gz" 
##                 results/10090/gencode/vM19/DRX000991/quant.genes.sf.gz 
## "/Users/sdavis2/Library/Caches/BigRNAr/c9222226bd82_quant.genes.sf.gz" 
##                 results/10090/gencode/vM19/DRX001048/quant.genes.sf.gz 
## "/Users/sdavis2/Library/Caches/BigRNAr/c9221d63b352_quant.genes.sf.gz"

From here, import the files with tximport or simply read them as tab-separated (TSV) files. The file format will remain the same, but details such as paths are likely to change.
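
As a rough illustration of the read-as-TSV route: salmon gene-level output (quant.genes.sf) is a tab-separated table with the columns Name, Length, EffectiveLength, TPM, and NumReads. The sketch below writes a tiny mock file with made-up values so that it runs without access to the BigRNA server, then reads it with base R. In practice, the cached paths in fnames would take the place of the mock path. The helper read_quant and the sample names are hypothetical.

```r
# Hypothetical helper (not part of BigRNAr): read one gzipped salmon
# quant.genes.sf file into a data frame.
read_quant <- function(path) {
  read.delim(gzfile(path), stringsAsFactors = FALSE)
}

# Mock file with made-up values, standing in for a cached download.
mock_path <- tempfile(fileext = "_quant.genes.sf.gz")
con <- gzfile(mock_path, "w")
writeLines(c(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "geneA\t1500\t1320.5\t12.5\t200",
  "geneB\t900\t720.0\t0\t0"
), con)
close(con)

quant <- read_quant(mock_path)
str(quant)

# Combine several files into a genes x samples TPM matrix.
# Here the same mock file is used twice under made-up sample names;
# this assumes every file lists the same genes in the same order.
paths <- c(sampleA = mock_path, sampleB = mock_path)
tpm <- sapply(paths, function(p) {
  q <- read_quant(p)
  setNames(q$TPM, q$Name)
})
dim(tpm)
```

For anything beyond a quick look, tximport is likely the better choice, since it is designed to import salmon output and can feed common downstream packages.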

Session info

## R Under development (unstable) (2019-01-14 r75992)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.2
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] DT_0.5           BigRNAr_0.1.5    knitr_1.22       DiagrammeR_1.0.1
## [5] BiocStyle_2.11.0
## 
## loaded via a namespace (and not attached):
##  [1] viridis_0.5.1       httr_1.4.0          tidyr_0.8.3        
##  [4] bit64_0.9-7         jsonlite_1.6        viridisLite_0.3.0  
##  [7] shiny_1.3.0         assertthat_0.2.1    BiocManager_1.30.4 
## [10] BiocFileCache_1.7.7 blob_1.1.1          yaml_2.2.0         
## [13] pillar_1.3.1        RSQLite_2.1.1       backports_1.1.3    
## [16] glue_1.3.1          downloader_0.4      digest_0.6.18      
## [19] RColorBrewer_1.1-2  promises_1.0.1      colorspace_1.4-1   
## [22] htmltools_0.3.6     httpuv_1.5.1        plyr_1.8.4         
## [25] XML_3.98-1.19       pkgconfig_2.0.2     bookdown_0.9       
## [28] xtable_1.8-3        purrr_0.3.2         scales_1.0.0       
## [31] brew_1.0-6          later_0.8.0         tibble_2.1.1       
## [34] ggplot2_3.1.1       influenceR_0.1.0    lazyeval_0.2.2     
## [37] rgexf_0.15.3        mime_0.6            magrittr_1.5       
## [40] crayon_1.3.4        memoise_1.1.0       evaluate_0.13      
## [43] fs_1.2.7            MASS_7.3-51.4       xml2_1.2.0         
## [46] Rook_1.1-1          tools_3.6.0         hms_0.4.2          
## [49] stringr_1.4.0       munsell_0.5.0       compiler_3.6.0     
## [52] pkgdown_1.3.0       rlang_0.3.4         grid_3.6.0         
## [55] rstudioapi_0.10     rappdirs_0.3.1      htmlwidgets_1.3    
## [58] visNetwork_2.0.6    crosstalk_1.0.0     igraph_1.2.4       
## [61] rmarkdown_1.12      gtable_0.3.0        DBI_1.0.0          
## [64] roxygen2_6.1.1      curl_3.3            R6_2.4.0           
## [67] gridExtra_2.3       dplyr_0.8.0.1       bit_1.1-14         
## [70] commonmark_1.7      rprojroot_1.3-2     readr_1.3.1        
## [73] desc_1.2.0          stringi_1.4.3       Rcpp_1.0.1         
## [76] dbplyr_1.3.0        tidyselect_0.2.5    xfun_0.6