vignettes/getting_started.Rmd
Abstract
BigRNA is a large compendium of homogeneously processed public genomics datasets and accompanying metadata collected from the source repositories. BigRNAr connects R to the available data and provides R-based metadata search and retrieval.
Overview of BigRNA information flow and connection from R. Data are collected from public repositories such as the NCBI Sequence Read Archive (SRA) and processed using salmon. Metadata, including experimental protocols, sample details, and study descriptions and abstracts, are mined at the same time. The BigRNA API exposes the metadata via a GraphQL endpoint and the data as individual files, one per processed sample. This package, BigRNAr, connects R to these resources.
library(knitr)
opts_chunk$set(message=FALSE, cache=FALSE)
library(BigRNAr)
bigrna = BigRNAConnection()
bigrna
## An object of class "BigRNAConnection"
## Slot "url":
## [1] "http://bigrna.cancerdatasci.org/"
##
## Slot "bfc":
## class: BiocFileCache
## bfccache: /Users/sdavis2/Library/Caches/BigRNAr
## bfccount: 10
## For more information see: bfcinfo() or bfcquery()
studyFullTextSearch = '
query studyFullTextSearch (
  $match: String! = ""
  $cursor: Cursor = null
) {
  allStudies(
    filter: {textsearchableIndexCol: {matches: $match}}
    after: $cursor
  ) {
    edges {
      node {
        accession
        bioproject
        gse
        abstract
        alias
        attributes
        brokerName
        centerName
        description
        identifiers
        studyType
        title
        xrefs
        status
        updated
        published
        received
        visibility
        replacedBy
        metadataByExptStudyAccession {
          nodes {
            sampAccession
            exptAccession
            sampTitle
            exptLibraryStrategy
            exptLibrarySelection
          }
          totalCount
        }
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}
'
x = gqlQuery(bigrna, studyFullTextSearch,
             variables=list(match='colon & cancer'),
             handler=dataframe_handler)
names(x)
## [1] "edges" "pageInfo" "totalCount"
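The `match` value used above, `'colon & cancer'`, joins two search terms with `&`. A minimal sketch of a helper that builds such a string from a character vector follows. `build_match` is a hypothetical name and not part of BigRNAr, and the reading of `&` as "all terms must match" is an assumption inferred from the example rather than something the API documentation confirms here.

```r
# Hypothetical helper (not part of BigRNAr): join search terms with " & ".
# Assumption: the `matches` filter treats "&" as a logical AND between terms.
build_match <- function(terms) {
  terms <- trimws(terms)
  terms <- terms[nzchar(terms)]   # drop empty strings
  if (length(terms) == 0L) {
    stop("at least one non-empty search term is required")
  }
  paste(terms, collapse = " & ")
}

match_string <- build_match(c("colon", "cancer"))
match_string
```

The resulting string could then be supplied as `variables=list(match=match_string)` in a `gqlQuery` call like the one above.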
fileListQuery = '
query getFiles($cursor: Cursor = null) {
  allBigrnaFiles(
    filter: {filename: {equalTo: "quant.genes.sf.gz"}}
    after: $cursor
  ) {
    edges {
      node {
        key
        accession
        filename
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}
'
x = gqlQuery(bigrna, fileListQuery,
             handler=dataframe_handler)
df = x$edges
maxN = 10000
# Fetch further pages until about maxN file records are collected
# or the server reports that no pages remain.
while (x$pageInfo$hasNextPage && nrow(df) < maxN) {
  x = gqlQuery(bigrna, fileListQuery,
               variables=list(cursor=x$pageInfo$endCursor),
               handler=dataframe_handler)
  df = dplyr::bind_rows(df, x$edges)
}
head(df)
## node.key node.accession
## 1 results/10090/gencode/M19/SRX4147707/quant.genes.sf.gz SRX4147707
## 2 results/10090/gencode/vM19/DRX000988/quant.genes.sf.gz DRX000988
## 3 results/10090/gencode/vM19/DRX000989/quant.genes.sf.gz DRX000989
## 4 results/10090/gencode/vM19/DRX000990/quant.genes.sf.gz DRX000990
## 5 results/10090/gencode/vM19/DRX000991/quant.genes.sf.gz DRX000991
## 6 results/10090/gencode/vM19/DRX001048/quant.genes.sf.gz DRX001048
## node.filename
## 1 quant.genes.sf.gz
## 2 quant.genes.sf.gz
## 3 quant.genes.sf.gz
## 4 quant.genes.sf.gz
## 5 quant.genes.sf.gz
## 6 quant.genes.sf.gz
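The loop above follows the cursor-based pagination pattern: request a page, append its `edges`, and pass `pageInfo$endCursor` back as `cursor` while `hasNextPage` is true. The sketch below reproduces that pattern against a small in-memory mock so it runs without network access. `mock_files`, `fetch_page`, and `fetch_all` are hypothetical names, and the integer-as-string cursor is an assumption made for illustration; real cursors should be treated as opaque.

```r
# Mock data standing in for the server: 25 fake file records.
# All names here (mock_files, fetch_page, fetch_all) are hypothetical.
mock_files <- data.frame(
  node.accession = sprintf("SRX%07d", 1:25),
  node.filename  = "quant.genes.sf.gz",
  stringsAsFactors = FALSE
)

# Return one page shaped like the gqlQuery() result used above:
# a list with `edges` (data.frame) and `pageInfo` (hasNextPage, endCursor).
# In this mock the cursor is the row index of the last record returned.
fetch_page <- function(cursor = NULL, page_size = 10L) {
  start <- if (is.null(cursor)) 1L else as.integer(cursor) + 1L
  end   <- min(start + page_size - 1L, nrow(mock_files))
  list(
    edges    = mock_files[start:end, , drop = FALSE],
    pageInfo = list(hasNextPage = end < nrow(mock_files),
                    endCursor   = as.character(end))
  )
}

# Follow cursors until no pages remain or at least max_n rows are held.
fetch_all <- function(max_n = Inf) {
  page  <- fetch_page()
  pages <- list(page$edges)
  n     <- nrow(page$edges)
  while (isTRUE(page$pageInfo$hasNextPage) && n < max_n) {
    page  <- fetch_page(cursor = page$pageInfo$endCursor)
    pages[[length(pages) + 1L]] <- page$edges
    n     <- n + nrow(page$edges)
  }
  do.call(rbind, pages)   # bind once at the end instead of growing a data.frame
}

all_files  <- fetch_all()
some_files <- fetch_all(max_n = 15)
nrow(all_files)
nrow(some_files)
```

Because the limit is checked only between pages, the result can exceed `max_n` by up to one page, which is why the limit is best read as approximate. Collecting pages in a list and binding them once avoids repeatedly copying a growing data frame.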
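# Retrieve the first 10 files; each call returns the path of the locally cached copy.
fnames = sapply(df$node.key[1:10], function(path) {
  path = sub('^/', '', path)  # strip any leading slash from the key
  datafile(bigrna, path)
})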
## results/10090/gencode/M19/SRX4147707/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c92253157fcb_quant.genes.sf.gz"
## results/10090/gencode/vM19/DRX000988/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c9222879130a_quant.genes.sf.gz"
## results/10090/gencode/vM19/DRX000989/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c92224d10a48_quant.genes.sf.gz"
## results/10090/gencode/vM19/DRX000990/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c92217fa0fda_quant.genes.sf.gz"
## results/10090/gencode/vM19/DRX000991/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c9222226bd82_quant.genes.sf.gz"
## results/10090/gencode/vM19/DRX001048/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c9221d63b352_quant.genes.sf.gz"
From here, import the files with tximport or simply read them as tab-separated (TSV) files. The file format will remain stable, but details such as paths are likely to change.
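One way to read the files as tab-separated text is sketched below, using base R only. It assumes the files follow salmon's usual quantification layout with the columns `Name`, `Length`, `EffectiveLength`, `TPM`, and `NumReads`. The example writes a small mock file first so that it runs offline; the gene names and values are made up, and `read_quant` and `tpm_matrix` are hypothetical helper names, not BigRNAr functions.

```r
# Write a tiny mock file in salmon's quant column layout so the example
# runs without network access. The gene IDs and values are made up.
mock <- data.frame(
  Name            = c("geneA", "geneB", "geneC"),
  Length          = c(1500, 900, 2100),
  EffectiveLength = c(1350.5, 750.2, 1950.8),
  TPM             = c(12.5, 0, 3.25),
  NumReads        = c(100, 0, 40),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".sf.gz")
con <- gzfile(path, "w")
write.table(mock, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Read one quantification file; gzip-compressed files are decompressed
# transparently when read by file name.
read_quant <- function(file) {
  read.delim(file, stringsAsFactors = FALSE)
}

# Combine several files into a genes x samples matrix of TPM values.
# Assumes every file lists the same genes in the same order.
tpm_matrix <- function(files) {
  quants <- lapply(files, read_quant)
  m <- vapply(quants, function(q) q$TPM, numeric(nrow(quants[[1]])))
  dimnames(m) <- list(quants[[1]]$Name, names(files))
  m
}

# A named vector of local paths, as returned by the sapply() call above.
files <- c(sample1 = path, sample2 = path)
tpm <- tpm_matrix(files)
tpm
```

With real data, `files` would be the vector of cached paths, ideally renamed by sample accession so the matrix columns can be matched to the metadata.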
## R Under development (unstable) (2019-01-14 r75992)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] DT_0.5 BigRNAr_0.1.5 knitr_1.22 DiagrammeR_1.0.1
## [5] BiocStyle_2.11.0
##
## loaded via a namespace (and not attached):
## [1] viridis_0.5.1 httr_1.4.0 tidyr_0.8.3
## [4] bit64_0.9-7 jsonlite_1.6 viridisLite_0.3.0
## [7] shiny_1.3.0 assertthat_0.2.1 BiocManager_1.30.4
## [10] BiocFileCache_1.7.7 blob_1.1.1 yaml_2.2.0
## [13] pillar_1.3.1 RSQLite_2.1.1 backports_1.1.3
## [16] glue_1.3.1 downloader_0.4 digest_0.6.18
## [19] RColorBrewer_1.1-2 promises_1.0.1 colorspace_1.4-1
## [22] htmltools_0.3.6 httpuv_1.5.1 plyr_1.8.4
## [25] XML_3.98-1.19 pkgconfig_2.0.2 bookdown_0.9
## [28] xtable_1.8-3 purrr_0.3.2 scales_1.0.0
## [31] brew_1.0-6 later_0.8.0 tibble_2.1.1
## [34] ggplot2_3.1.1 influenceR_0.1.0 lazyeval_0.2.2
## [37] rgexf_0.15.3 mime_0.6 magrittr_1.5
## [40] crayon_1.3.4 memoise_1.1.0 evaluate_0.13
## [43] fs_1.2.7 MASS_7.3-51.4 xml2_1.2.0
## [46] Rook_1.1-1 tools_3.6.0 hms_0.4.2
## [49] stringr_1.4.0 munsell_0.5.0 compiler_3.6.0
## [52] pkgdown_1.3.0 rlang_0.3.4 grid_3.6.0
## [55] rstudioapi_0.10 rappdirs_0.3.1 htmlwidgets_1.3
## [58] visNetwork_2.0.6 crosstalk_1.0.0 igraph_1.2.4
## [61] rmarkdown_1.12 gtable_0.3.0 DBI_1.0.0
## [64] roxygen2_6.1.1 curl_3.3 R6_2.4.0
## [67] gridExtra_2.3 dplyr_0.8.0.1 bit_1.1-14
## [70] commonmark_1.7 rprojroot_1.3-2 readr_1.3.1
## [73] desc_1.2.0 stringi_1.4.3 Rcpp_1.0.1
## [76] dbplyr_1.3.0 tidyselect_0.2.5 xfun_0.6