Professor of Medicine

University of Colorado Anschutz School of Medicine

Biography

I received my B.S.E. in Mechanical and Aerospace Engineering from Princeton University and my M.D. and Ph.D. from the University of Pittsburgh Schools of Medicine and Public Health, respectively (thesis advisor: Dr. Daniel Weeks, Human Genetics). I completed my residency in Pediatrics at Children's Hospital and Regional Medical Center, followed by pediatric hematology/oncology training in the Johns Hopkins School of Medicine/National Cancer Institute joint fellowship.

Interests

Biomedical Informatics and Data Science
Software Development
Open Data
Reproducible Research
Education
Cloud and Distributed Computing

Education

MD, 1999
University of Pittsburgh, Pittsburgh, PA

PhD, Human Genetics, 1997
University of Pittsburgh, Pittsburgh, PA

BS, Mechanical and Aerospace Engineering, 1993
Princeton University, Princeton, NJ

Experience

Professor of Medicine

University of Colorado Anschutz School of Medicine

January 2021 – Present Aurora, Colorado

I am the Associate Director for Informatics and Data Science for the Comprehensive Cancer Center and the Deputy Director for the Center for Health AI.

Staff and Senior Associate Scientist

Center for Cancer Research, National Cancer Institute, National Institutes of Health

January 2008 – December 2020 Bethesda, MD

Clinical Fellow, Pediatric Hematology/Oncology

Johns Hopkins University & National Human Genome Research Institute, NIH

July 2002 – December 2007 Bethesda, MD

Resident, Pediatrics

Children's Hospital and Regional Medical Center

July 1999 – June 2002 Seattle, WA

Recent Posts

Build and deploy an NCBI GEO metadata fetch API

Build, containerize, and deploy a simple serverless web API that returns JSON-formatted metadata for any GEO accession.

Last updated on Jun 4, 2020 5 min read cloud, bioinformatics

Building R Binary Packages for Linux

Background: One of the challenges of producing a performant build environment for Linux, such as one that lets developers test software in identical environments, is the need to compile R packages from source on Linux. If, however, one had an identical set of installed libraries, kernel version, compiler, etc., one could use binary packages on Linux as well. Docker provides just such a shareable and identical environment for Linux.

Last updated on Jun 4, 2020 4 min read Bioconductor, R

Experimenting with GitHub Actions

GitHub Actions lets you combine flexible and potentially complicated actions into workflows that respond to events on GitHub. Templates for continuous integration, messaging Slack, greeting new contributors, deploying applications, and many other tasks are ready for customization and integration into any repo.

Last updated on Oct 11, 2019 5 min read IT, programming

OmicIDX on BigQuery

Availability: This ipython notebook is available at https://github.com/seandavi/omicidx_examples. OmicIDX is a project to democratize access to omics metadata. As the sizes of omics repositories have grown into the millions of available samples, thinking of the metadata themselves as Big Data seems reasonable. Additionally, by making the metadata more fit-for-use for text mining, natural language processing, ingestion into machine learning or search engines, OmicIDX aims to facilitate augmentation and analysis of these metadata.

Last updated on Oct 5, 2019 4 min read

Using directory-local variables to customize the emacs project experience

I use emacs for nearly all my editing and interactive analysis. As is typical, working on more than one project at a time is the norm, not the exception. Discovering projectile for project-specific buffers and controls, combined with helm for very fast, fuzzy completions, has made emacs a very convenient and efficient environment for most tasks. One challenge I ran into was the need to have multiple interactive python buffers, typically one per project. However, the out-of-the-box behavior of python-mode is to have only one python interactive buffer named “Python”.

Last updated on Jun 4, 2020 2 min read Notes

Publications

GenomicSuperSignature: interpretation of RNA-seq experiments through robust, efficient comparison to public databases

Millions of transcriptomic profiles have been deposited in public archives, yet remain underused for the interpretation of new …

Sehyun Oh, Ludwig Geistlinger, Marcel Ramos, Jaclyn N. Taroni, Vincent Carey, Casey Greene, Levi Waldron, Sean Davis

PDF DOI

HGNChelper: identification and correction of invalid gene symbols for human and mouse

Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and …

Sehyun Oh, Jasmine Abdelnabi, Ragheed Al-Dulaimi, Ayush Aggarwal, Marcel Ramos, Sean Davis, Markus Riester, Levi Waldron

PDF DOI

Toward a gold standard for benchmarking gene set enrichment analysis

Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of …

Ludwig Geistlinger, Gergely Csaba, Mara Santarelli, Marcel Ramos, Lucas Schiffer, Nitesh Turaga, Charity Law, Sean Davis, Vincent Carey, Martin Morgan, Ralf Zimmer, Levi Waldron

PDF DOI

BiocPkgTools: Toolkit for mining the Bioconductor package ecosystem

Motivation: The Bioconductor project, a large collection of open source software for the comprehension of large-scale biological data, …

Shian Su, Vincent J Carey, Lori Shepherd, Matthew Ritchie, Martin T Morgan, Sean Davis

PDF DOI

Recent & Upcoming Talks

Orchestra: A cloud platform for hosting hands-on computational workshop environments

Orchestra is a cloud platform for hosting hands-on computational workshop environments. In this talk, I review the detailed use case of Bioconductor Workshops and then take a shallow dive into the Kubernetes infrastructure that powers Orchestra.

Dec 2, 2021 8:00 AM — 8:30 AM Online

Slides

Bioinformatics, HPC and AI

These are just introductory slides for a panel discussion at the Supercomputer21 conference.

Nov 17, 2021 2:30 PM — 4:00 PM St. Louis, Missouri, USA and Remote

Slides

GenomicSuperSignature: Interpretation of RNA-Seq Experiments through Robust, Efficient Comparison to Public Databases

Millions of transcriptomic profiles have been deposited in public archives, yet remain underused for the interpretation of new experiments. Existing methods for leveraging these public resources have focused on the reanalysis of existing data or analysis of new datasets independently. We present a novel approach to interpreting new transcriptomic datasets by near-instantaneous comparison to public archives without high-performance computing requirements. All necessary data and functions to apply our approach to existing or new data are included in our software available as part of the Bioconductor project.

Nov 14, 2021 4:30 PM — 6:00 PM Supercomputer Conference

Code Slides

Bioconductor: Increasing the Value of Public Data with Software and Data Engineering

In this talk, I provide a high-level overview of Bioconductor and then give some examples of tooling that connects Bioconductor to …

Oct 26, 2021 12:00 PM Aurora, CO, USA

Sean Davis

Slides

Some quick thoughts on training, education, workforce development, and community

Oct 11, 2021 3:00 PM — 3:30 PM University of Colorado Anschutz Medical Campus

Slides

Contact

Research Complex 1 South, Room 8115, Aurora, CO 80230