vignettes/getting_started.Rmd
Abstract
BigRNA is a large compendium of homogeneously processed public genomics datasets and accompanying metadata collected from the source repositories. BigRNAr connects R to the available data and provides R-based metadata search and retrieval.
```r
library(knitr)
opts_chunk$set(message = FALSE, cache = FALSE)
```

```r
library(BigRNAr)
bigrna = BigRNAConnection()
bigrna
## An object of class "BigRNAConnection"
## Slot "url":
## [1] "http://bigrna.cancerdatasci.org/"
##
## Slot "bfc":
## class: BiocFileCache
## bfccache: /Users/sdavis2/Library/Caches/BigRNAr
## bfccount: 10
## For more information see: bfcinfo() or bfcquery()
```
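The `bfc` slot printed above shows that the connection keeps files in a local BiocFileCache directory, which suggests each remote file is downloaded once and then reused. The sketch below illustrates only that download-once idea in base R; `cached_fetch()` and its `fetch` argument are hypothetical names for illustration and are not part of BigRNAr or BiocFileCache.

```r
# Hypothetical sketch of a download-once file cache (not BigRNAr's code).
# `fetch` is any function(key, dest) that writes the file for `key` to `dest`.
cached_fetch <- function(key, cache_dir, fetch) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  # Derive a flat local file name from the remote key
  dest <- file.path(cache_dir, gsub("/", "_", key, fixed = TRUE))
  if (!file.exists(dest)) {
    fetch(key, dest)  # only reached on a cache miss
  }
  dest
}

# Usage with a fake fetcher that counts how often it is called
calls <- 0
fake_fetch <- function(key, dest) {
  calls <<- calls + 1
  writeLines(paste("contents of", key), dest)
}
cache <- tempfile("demo_cache")  # fresh directory for this demo
p1 <- cached_fetch("results/demo/file.txt", cache, fake_fetch)
p2 <- cached_fetch("results/demo/file.txt", cache, fake_fetch)  # cache hit
```

The second call returns the same local path without invoking `fake_fetch` again. A production cache such as BiocFileCache also records metadata about each entry, which this sketch omits.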
```r
studyFullTextSearch = '
query studyFullTextSearch (
  $match: String! = ""
  $cursor: Cursor = null
) {
  allStudies(
    filter: {textsearchableIndexCol: {matches: $match}}
    after: $cursor
  ) {
    edges {
      node {
        accession
        bioproject
        gse
        abstract
        alias
        attributes
        brokerName
        centerName
        description
        identifiers
        studyType
        title
        xrefs
        status
        updated
        published
        received
        visibility
        replacedBy
        metadataByExptStudyAccession {
          nodes {
            sampAccession
            exptAccession
            sampTitle
            exptLibraryStrategy
            exptLibrarySelection
          }
          totalCount
        }
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}
'
x = gqlQuery(bigrna, studyFullTextSearch,
             variables = list(match = 'colon & cancer'),
             handler = dataframe_handler)
names(x)
## [1] "edges" "pageInfo" "totalCount"
```
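The `match` value `'colon & cancer'` resembles PostgreSQL full-text search (`to_tsquery`) syntax, in which `&` means AND and `|` means OR. Assuming the server's `matches` filter accepts that syntax (not confirmed here against the BigRNA server), a small helper can build match strings from a vector of single-word terms. `make_match()` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: join single-word search terms into a tsquery-style string.
# Assumes the server's `matches` filter understands & (AND) and | (OR).
make_match <- function(terms, op = c("and", "or")) {
  op <- match.arg(op)
  sep <- if (op == "and") " & " else " | "
  terms <- trimws(terms)
  terms <- terms[nzchar(terms)]  # drop empty terms
  paste(terms, collapse = sep)
}

make_match(c("colon", "cancer"))              # "colon & cancer"
make_match(c("colon", "rectal"), op = "or")   # "colon | rectal"
```

The result would be passed as `variables = list(match = make_match(...))` in the `gqlQuery()` call above. Multi-word phrases are not handled, because a bare space is not a valid tsquery operator.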
```r
fileListQuery = '
query getFiles($cursor: Cursor = null) {
  allBigrnaFiles(
    filter: {filename: {equalTo: "quant.genes.sf.gz"}}
    after: $cursor
  ) {
    edges {
      node {
        key
        accession
        filename
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}
'
# Fetch the first page of results
x = gqlQuery(bigrna, fileListQuery,
             handler = dataframe_handler)
df = x$edges
maxN = 10000
# Keep requesting pages, passing the previous page's endCursor, until there
# are no more pages or at least maxN file records have been collected.
while (x$pageInfo$hasNextPage && nrow(df) < maxN) {
  x = gqlQuery(bigrna, fileListQuery,
               variables = list(cursor = x$pageInfo$endCursor),
               handler = dataframe_handler)
  df = dplyr::bind_rows(df, x$edges)
}
head(df)
## node.key node.accession
## 1 results/10090/gencode/M19/SRX4147707/quant.genes.sf.gz SRX4147707
## 2 results/10090/gencode/vM19/DRX000988/quant.genes.sf.gz DRX000988
## 3 results/10090/gencode/vM19/DRX000989/quant.genes.sf.gz DRX000989
## 4 results/10090/gencode/vM19/DRX000990/quant.genes.sf.gz DRX000990
## 5 results/10090/gencode/vM19/DRX000991/quant.genes.sf.gz DRX000991
## 6 results/10090/gencode/vM19/DRX001048/quant.genes.sf.gz DRX001048
## node.filename
## 1 quant.genes.sf.gz
## 2 quant.genes.sf.gz
## 3 quant.genes.sf.gz
## 4 quant.genes.sf.gz
## 5 quant.genes.sf.gz
## 6 quant.genes.sf.gz
```
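The loop above follows the usual cursor-pagination pattern: request a page, append its `edges`, and pass `pageInfo$endCursor` as the next `cursor` until `hasNextPage` is false or enough rows have accumulated. The sketch below factors that pattern into a reusable function. `fetch_all_pages()` is a hypothetical name, and `fetch_page` stands in for a call such as `gqlQuery(bigrna, fileListQuery, variables = list(cursor = cursor), handler = dataframe_handler)`; here it is replaced by a fake in-memory source so the code runs offline. Base `rbind` is used instead of `dplyr::bind_rows` to avoid a dependency, at the cost of requiring every page to have the same columns.

```r
# Hypothetical helper for cursor-based pagination.
# `fetch_page(cursor)` must return a list with `edges` (a data.frame) and
# `pageInfo` (a list with `hasNextPage` and `endCursor`).
fetch_all_pages <- function(fetch_page, max_n = Inf) {
  page <- fetch_page(NULL)  # first page: no cursor
  pages <- list(page$edges)
  n <- nrow(page$edges)
  while (isTRUE(page$pageInfo$hasNextPage) && n < max_n) {
    page <- fetch_page(page$pageInfo$endCursor)
    pages[[length(pages) + 1]] <- page$edges
    n <- n + nrow(page$edges)
  }
  do.call(rbind, pages)  # stack all pages into one data.frame
}

# Offline fake source: 25 rows served in pages of 10.
# A real cursor is an opaque string; here it is simply the last row index.
fake_rows <- data.frame(node.accession = sprintf("SRX%03d", 1:25),
                        stringsAsFactors = FALSE)
fake_fetch_page <- function(cursor) {
  start <- if (is.null(cursor)) 1 else cursor + 1
  end <- min(start + 9, nrow(fake_rows))
  list(edges = fake_rows[start:end, , drop = FALSE],
       pageInfo = list(hasNextPage = end < nrow(fake_rows), endCursor = end))
}

all_rows <- fetch_all_pages(fake_fetch_page)             # all 25 rows
capped   <- fetch_all_pages(fake_fetch_page, max_n = 15) # stops after 20 rows
```

As in the original loop, `max_n` is a soft limit: the check runs between pages, so the result can overshoot by up to one page.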
```r
# Fetch the first 10 files; datafile() returns the path of each local copy
# in the cache directory shown earlier
fnames = sapply(df$node.key[1:10], function(path) {
  path = sub('^/', '', path)  # strip any leading slash from the key
  datafile(bigrna, path)
})
head(fnames)
## results/10090/gencode/M19/SRX4147707/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c92253157fcb_quant.genes.sf.gz"
## results/10090/gencode/vM19/DRX000988/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c9222879130a_quant.genes.sf.gz"
## results/10090/gencode/vM19/DRX000989/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c92224d10a48_quant.genes.sf.gz"
## results/10090/gencode/vM19/DRX000990/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c92217fa0fda_quant.genes.sf.gz"
## results/10090/gencode/vM19/DRX000991/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c9222226bd82_quant.genes.sf.gz"
## results/10090/gencode/vM19/DRX001048/quant.genes.sf.gz
## "/Users/sdavis2/Library/Caches/BigRNAr/c9221d63b352_quant.genes.sf.gz"
```
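The keys listed above appear to follow a `results/<taxon id>/<annotation>/<release>/<accession>/<file>` layout (10090 is the NCBI Taxonomy ID for mouse). Because path details are expected to change, the hypothetical `parse_bigrna_key()` below is only a sketch keyed to the layout visible in this output; it returns `NA` for keys that do not match rather than guessing. Note that one key above uses `M19` where the others use `vM19`, so downstream code should not assume a single release spelling.

```r
# Hypothetical parser for keys shaped like
#   results/<taxon>/<annotation>/<release>/<accession>/<filename>
# The layout is inferred from the example output and may change.
parse_bigrna_key <- function(keys) {
  keys <- sub("^/", "", keys)  # tolerate a leading slash
  parts <- strsplit(keys, "/", fixed = TRUE)
  fields <- c("taxon", "annotation", "release", "accession", "filename")
  m <- vapply(parts, function(p) {
    if (length(p) == 6 && p[1] == "results") p[2:6] else rep(NA_character_, 5)
  }, character(5))
  out <- as.data.frame(t(m), stringsAsFactors = FALSE)
  names(out) <- fields
  out
}

keys <- c("results/10090/gencode/vM19/DRX000988/quant.genes.sf.gz",
          "/results/10090/gencode/M19/SRX4147707/quant.genes.sf.gz",
          "unexpected/path.gz")
res <- parse_bigrna_key(keys)
```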
From here, import the downloaded files with tximport or simply read them as tab-separated (TSV) text. The file format will remain stable, but details such as the file paths are likely to change.
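For the TSV route, a gzipped quantification file can be read with base R alone. This sketch assumes the files use salmon's `quant.sf` column layout (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`), which the file name suggests but this vignette does not state. It writes two small made-up demo files so it runs offline, then collects `NumReads` into a gene-by-sample matrix. `read_quant()` and `quant_matrix()` are hypothetical helper names; with real data, the `fnames` vector returned by `datafile()` would replace the demo paths.

```r
# Hypothetical helpers; assumes salmon-style columns:
# Name, Length, EffectiveLength, TPM, NumReads
read_quant <- function(path) {
  # gzfile() transparently decompresses .gz input
  read.delim(gzfile(path), stringsAsFactors = FALSE)
}

quant_matrix <- function(paths, value = "NumReads") {
  tables <- lapply(paths, read_quant)
  genes <- tables[[1]]$Name
  # Align every sample to the gene order of the first file
  m <- vapply(tables,
              function(tab) as.numeric(tab[[value]][match(genes, tab$Name)]),
              numeric(length(genes)))
  dim(m) <- c(length(genes), length(tables))
  dimnames(m) <- list(genes, names(paths))
  m
}

# Made-up demo files (not real BigRNA data); gene order differs per file
demo_dir <- tempfile("quant_demo")
dir.create(demo_dir)
header <- "Name\tLength\tEffectiveLength\tTPM\tNumReads"
write_demo <- function(file, body) {
  path <- file.path(demo_dir, file)
  con <- gzfile(path, "w")
  writeLines(c(header, body), con)
  close(con)
  path
}
paths <- c(
  S1 = write_demo("s1.sf.gz", c("geneA\t1000\t850\t4.2\t10",
                                "geneB\t2000\t1850\t1.1\t5.5")),
  S2 = write_demo("s2.sf.gz", c("geneB\t2000\t1850\t2.0\t7",
                                "geneA\t1000\t850\t3.0\t3"))
)
counts <- quant_matrix(paths)
```

Matching on `Name` rather than relying on row order keeps samples aligned even if files list genes in different orders.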
```r
sessionInfo()
## R Under development (unstable) (2019-01-14 r75992)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] DT_0.5 BigRNAr_0.1.5 knitr_1.22 DiagrammeR_1.0.1
## [5] BiocStyle_2.11.0
##
## loaded via a namespace (and not attached):
## [1] viridis_0.5.1 httr_1.4.0 tidyr_0.8.3
## [4] bit64_0.9-7 jsonlite_1.6 viridisLite_0.3.0
## [7] shiny_1.3.0 assertthat_0.2.1 BiocManager_1.30.4
## [10] BiocFileCache_1.7.7 blob_1.1.1 yaml_2.2.0
## [13] pillar_1.3.1 RSQLite_2.1.1 backports_1.1.3
## [16] glue_1.3.1 downloader_0.4 digest_0.6.18
## [19] RColorBrewer_1.1-2 promises_1.0.1 colorspace_1.4-1
## [22] htmltools_0.3.6 httpuv_1.5.1 plyr_1.8.4
## [25] XML_3.98-1.19 pkgconfig_2.0.2 bookdown_0.9
## [28] xtable_1.8-3 purrr_0.3.2 scales_1.0.0
## [31] brew_1.0-6 later_0.8.0 tibble_2.1.1
## [34] ggplot2_3.1.1 influenceR_0.1.0 lazyeval_0.2.2
## [37] rgexf_0.15.3 mime_0.6 magrittr_1.5
## [40] crayon_1.3.4 memoise_1.1.0 evaluate_0.13
## [43] fs_1.2.7 MASS_7.3-51.4 xml2_1.2.0
## [46] Rook_1.1-1 tools_3.6.0 hms_0.4.2
## [49] stringr_1.4.0 munsell_0.5.0 compiler_3.6.0
## [52] pkgdown_1.3.0 rlang_0.3.4 grid_3.6.0
## [55] rstudioapi_0.10 rappdirs_0.3.1 htmlwidgets_1.3
## [58] visNetwork_2.0.6 crosstalk_1.0.0 igraph_1.2.4
## [61] rmarkdown_1.12 gtable_0.3.0 DBI_1.0.0
## [64] roxygen2_6.1.1 curl_3.3 R6_2.4.0
## [67] gridExtra_2.3 dplyr_0.8.0.1 bit_1.1-14
## [70] commonmark_1.7 rprojroot_1.3-2 readr_1.3.1
## [73] desc_1.2.0 stringi_1.4.3 Rcpp_1.0.1
## [76] dbplyr_1.3.0 tidyselect_0.2.5 xfun_0.6
```