Bioconductor

Bioconductor: Increasing the Value of Public Data with Software and Data Engineering

In this talk, I provide a high-level overview of the Bioconductor project and then give some examples of tooling that connects Bioconductor to several public genomic data resources, including NCBI and NCI cancer data.

Building R Binary Packages for Linux

Background: One of the challenges of producing a performant build environment for Linux, such as one that lets developers test software in identical environments, is the need to compile R packages from source. If, however, one had an identical set of installed libraries, kernel version, compiler, etc., one could use binary packages on Linux as well. Docker provides just such a shareable and identical environment for Linux.

Bioconductor: Tools for interpreting high-throughput biological data

In this talk, I give a high-level overview of the Bioconductor project.

ATAC-Seq workshop

This workshop introduces ATAC-Seq, quality control approaches, isolating nucleosome compartments, and profile plots and heatmaps.

Statistical Methods in Functional Genomics

In a series of talks and exercises, I cover an introduction to R, Bioconductor, genomic ranges, container classes, annotation of genes and regions, ATAC-Seq data analysis, and an introduction to machine learning.

Bioconductor: software for interpreting high-throughput biological data

This talk presents a very quick overview of the Bioconductor project, focusing on its values of reproducibility, reuse, and openness.

Orchestrating a community-developed computational workshop and accompanying training materials

The importance of bioinformatics, computational biology, and data science in biomedical research continues to grow, driving a need for effective instruction and education. A workshop setting, with lectures and guided hands-on tutorials, is a common …

Infrastructure-as-Code: Building the Bioconductor Conference AMI With Packer

One of the main features of the annual Bioconductor Conference is the proportion of time spent working with code in the form of workshops. To support these workshops, we ask workshop presenters to supply R Markdown materials, which we collate into workshop materials. Using literate programming approaches like R Markdown ensures that the workflows are self-consistent and work as expected. In addition to the R Markdown workshop materials, we also need a consistent computing environment that can support reasonably large computation, provide high-performance network and file system access, and scale essentially without limit (we expect to have >150 participants, each with their own machine).

A computable Bioconductor build report

Bioconductor spends a substantial amount of effort to build its catalog of software each day. Reporting these results is critical for developers, users, and project leaders to understand the software “health” of the project. The Bioconductor build reports are generally available as HTML pages that are navigable with bookmarks and link out to detailed reports of errors, etc. However, the build reports are not readily computable, so mining the reports, processing them automatically, and learning about failure modes are all challenging.
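
To illustrate what "computable" could mean here, the following is a minimal sketch in Python, not the actual build report format. It assumes each build outcome can be reduced to one record per package, build node, and stage; the `BuildResult` fields, the status labels, and the package and node names are all hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    """One row of a hypothetical tabular build report (assumed schema)."""
    package: str
    node: str
    stage: str    # e.g. "install", "build", "check" (illustrative labels)
    status: str   # e.g. "OK", "ERROR", "TIMEOUT" (illustrative labels)


def failing_packages(results):
    """Return the sorted names of packages with at least one failed stage."""
    failed = {"ERROR", "TIMEOUT"}
    return sorted({r.package for r in results if r.status in failed})


def status_counts(results):
    """Count how many records carry each status label."""
    return Counter(r.status for r in results)


# Made-up records, purely for illustration.
results = [
    BuildResult("pkgA", "node1", "install", "OK"),
    BuildResult("pkgA", "node1", "check", "ERROR"),
    BuildResult("pkgB", "node1", "check", "OK"),
    BuildResult("pkgC", "node2", "build", "TIMEOUT"),
]

print(failing_packages(results))
print(status_counts(results))
```

Once results are in a flat, record-per-row form like this, questions such as "which packages fail on a given node" become simple filters rather than HTML scraping.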

Matched tumor/normal pairs--a use case for the GenomicDataCommons Bioconductor package

Introduction: The NCI Genomic Data Commons (GDC) is a reboot of the approach that NCI uses to manage and expose genomic and associated clinical and experimental metadata. I have been working on a Bioconductor package that interfaces with the GDC API to provide search and data retrieval from within R. In the first of what will likely be a set of use cases for the GenomicDataCommons, I am going to address a question that came up on Twitter from @sleight82
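
The core idea of a matched tumor/normal lookup can be sketched independently of the GDC API. The Python example below is an illustration under stated assumptions, not the GenomicDataCommons package's interface: it assumes sample metadata has already been retrieved as a list of records, and the field names `case_id`, `sample_kind`, and `file_id`, along with the values shown, are hypothetical.

```python
from collections import defaultdict


def matched_pairs(samples):
    """Group file IDs by case and keep cases with both tumor and normal samples.

    `samples` is a list of dicts with hypothetical keys
    "case_id", "sample_kind" ("tumor" or "normal"), and "file_id".
    """
    by_case = defaultdict(lambda: {"tumor": [], "normal": []})
    for s in samples:
        kind = s["sample_kind"]
        if kind in ("tumor", "normal"):  # ignore any other sample kinds
            by_case[s["case_id"]][kind].append(s["file_id"])
    # A case is "matched" only if it has at least one file of each kind.
    return {case: g for case, g in by_case.items() if g["tumor"] and g["normal"]}


# Made-up records, purely for illustration.
samples = [
    {"case_id": "case-1", "sample_kind": "tumor", "file_id": "f1"},
    {"case_id": "case-1", "sample_kind": "normal", "file_id": "f2"},
    {"case_id": "case-2", "sample_kind": "tumor", "file_id": "f3"},
]

pairs = matched_pairs(samples)
print(pairs)
```

Grouping by case first keeps the pairing logic separate from however the metadata was fetched, so the same function would work on records from any source with equivalent fields.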