General notes about using containers
This talk presents a very quick overview of the Bioconductor project, focusing on its values of reproducibility, reuse, and openness.
This talk compares and contrasts four formats for data science and informatics education. The discussion will highlight some approaches that I have found useful to facilitate the training process. I also present some practical and simple tips that I …
In this talk, I motivate the need for cloud-based cancer data resources. I provide an overview of the NCI Genomic Data Commons and how to interact with it both interactively through a web portal and programmatically using the …
Apache Spark in a few words
Apache Spark is a software and data science platform that is purpose-built for large- to massive-scale data processing. Spark supports processing of data in batch mode (run as a pipeline) or in interactive mode, using either a command-line style or the popular notebook style of coding. While Scala is the native language for Spark, language bindings exist for Python, R, and Java as well.
Introduction
The NCI Genomic Data Commons (GDC) is a reboot of the approach that NCI uses to manage and expose genomic and associated clinical and experimental metadata. I have been working on a Bioconductor package that interfaces with the GDC API to provide search and data retrieval from within R.
testing
In the first of what will likely be a set of use cases for the GenomicDataCommons, I am going to address a question that came up on Twitter from @sleight82.