In this talk, I provide a high-level overview of the Bioconductor project and then give some examples of tooling that connects Bioconductor to several public genomic data resources, including NCBI and NCI cancer data.
Background: One of the challenges of producing a performant build environment for Linux, such as one that lets developers test software in identical environments, is the need to compile R packages from source. With an identical set of installed libraries, kernel version, compiler, and so on, however, binary packages could be used on Linux as well. Docker provides just such a shareable and identical environment for Linux.
In this talk, I give a high-level overview of the Bioconductor project.
This workshop introduces ATAC-Seq, quality control approaches, isolating nucleosome compartments, and profile plots and heatmaps.
In a series of talks and exercises, I cover an introduction to R, Bioconductor, genomic ranges, container classes, annotation of genes and regions, ATAC-Seq data analysis, and an introduction to machine learning.
This talk presents a very quick overview of the Bioconductor project, focusing on its values of reproducibility, reuse, and openness.
The importance of bioinformatics, computational biology, and data science in biomedical research continues to grow, driving a need for effective instruction and education. A workshop setting, with lectures and guided hands-on tutorials, is a common …
One of the main features of the annual Bioconductor Conference is the proportion of time spent working with code in the form of workshops. To support these workshops, we ask workshop presenters to supply Rmarkdown materials which we collate into workshop materials. Using literate programming approaches like Rmarkdown ensures that the workflows are self-consistent and work as expected.
In addition to the Rmarkdown workshop materials, we also need a consistent computing environment that can support reasonably large computation, provide high-performance network and file system access, and scale essentially without limit (we expect more than 150 participants, each with their own machine).
Bioconductor spends a substantial amount of effort to build its catalog of software each day. Reporting of these results is critical for developers, users, and project leaders to understand the software “health” of the project.
The Bioconductor build reports are generally available as HTML pages that are navigable with bookmarks and link out to detailed reports of errors and other results. However, the build reports are not readily computable, so mining the reports, processing them automatically, and learning about failure modes are all challenging.
Introduction: The NCI Genomic Data Commons (GDC) is a reboot of the approach that NCI uses to manage and expose genomic data and the associated clinical and experimental metadata. I have been working on a Bioconductor package that interfaces with the GDC API to provide search and data retrieval from within R.
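To give a sense of what "search" against the GDC API involves, here is a minimal sketch in Python of how a file search request might be assembled. It is not the Bioconductor package's implementation, only an illustration of the REST request shape. The helper name `build_files_query` is hypothetical, and the project ID and data type are example values chosen for illustration. No network call is made.

```python
import json
from urllib.parse import urlencode

# Base URL of the public GDC REST API.
GDC_API = "https://api.gdc.cancer.gov"


def build_files_query(project_id, data_type, size=10):
    """Build the URL for a GDC /files search (hypothetical helper).

    Only constructs the request URL; it does not contact the GDC.
    """
    # GDC filters are nested JSON: an operator plus its content.
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {
                "op": "in",
                "content": {"field": "data_type", "value": [data_type]},
            },
        ],
    }
    params = {
        "filters": json.dumps(filters),          # filters travel as a JSON string
        "fields": "file_id,file_name,data_type",  # columns to return
        "format": "JSON",
        "size": str(size),                        # maximum number of records
    }
    return f"{GDC_API}/files?{urlencode(params)}"


# Example values (assumptions for illustration only).
url = build_files_query("TCGA-OV", "Gene Expression Quantification")
print(url)
```

Fetching the resulting URL with any HTTP client would return matching file records as JSON; a client package wraps this kind of request construction so that users can compose filters without writing the JSON by hand.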
In the first of what will likely be a set of use cases for the GenomicDataCommons, I am going to address a question that came up on Twitter from @sleight82