Bioconductor: orchestrating high-throughput biological data analysis

Abstract

Progress in biotechnology is continually leading to new types of data, resulting in data sets that are rapidly increasing in volume, resolution and diversity. These data promise unprecedented advances in our understanding of biological systems and in medicine, but their complexity and volume also challenge scientists’ ability to analyze them. Meeting this challenge requires continuous improvements in analytical methods and capable, usable software tools implementing them. Bioconductor is a well-established open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 1473 interoperable packages contributed by a large, diverse community of scientists. These packages undergo formal initial review and continuous automated testing. Each package includes documentation and working example use cases. Bioconductor supports many types of high-throughput sequencing data (including DNA, RNA, chromatin immunoprecipitation, Hi-C, methylomes and ribosome profiling) and associated annotation resources; contains mature facilities for microarray analysis; and covers proteomic, metabolomic, flow cytometry, quantitative imaging, cheminformatic and other high-throughput data. Bioconductor package interoperability enables the rapid creation of workflows combining and integrating multiple data types and tools for statistical inference, regression, network analysis, machine learning and visualization at all stages of a project from data generation to publication. A large and growing community of researchers and users contributes to ongoing development, online support, and education. The influence of the project is evidenced by more than 250,000 downloads per year and tens of thousands of citations in the literature.
I will present an overview of the project for prospective users and contributors.

Date
Dec 4, 2017 12:00 AM
Event
NCI Center of Excellence in Cancer Biology and Genomics
Location
Shady Grove, MD, USA

I presented this talk as a representative of the Bioconductor community:

DAVIS SR (1), CAREY VJ (2), WALDRON L (3), CULHANE A (4), LAWRENCE M (5), IRIZARRY RA (6), HUBER W (7), GENTLEMAN R (8), HANSEN KD (9), AND MORGAN M (10)

  1. Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
  2. Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA.
  3. School of Urban Public Health at Hunter College, City University of New York, New York, NY, USA.
  4. Department of Biostatistics, Dana-Farber Cancer Institute, Boston, MA, USA.
  5. Genentech, South San Francisco, CA, USA.
  6. Harvard School of Public Health, Boston, MA, USA.
  7. European Molecular Biology Laboratory, Heidelberg, Germany.
  8. 23andMe, Mountain View, CA, USA.
  9. Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.
  10. Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, USA.
Professor of Medicine

My interests include biomedical data science, open data, genomics, and cancer research.
