I will come back to the rationale for this post at a later date, but consider this another notebook-style post that serves as part of my external brain. This post assumes that the reader has a passing understanding of Kubernetes and of container technologies such as Docker, and is interested in building or running distributed systems such as web applications, compute jobs, or data services.
Files for this post are available on GitHub.
In this post, I will quickly build a Docker image containing the sra-toolkit and a key for dbGaP downloads. Because the key file is private, I will use the secure Google Container Registry to store the image for later use in genomics workflows.
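As a rough sketch of the build-and-push step, the Python below only assembles the two Docker commands involved and prints them rather than running anything. The project ID `my-project`, the image name `sra-toolkit`, the tag `0.1`, and both helper functions are hypothetical placeholders, not names taken from this post. It assumes the usual Container Registry naming scheme `gcr.io/PROJECT/IMAGE:TAG` and that Docker has already been authorized against the registry (for example with `gcloud auth configure-docker`).

```python
import shlex


def gcr_image_ref(project_id, image, tag="latest", host="gcr.io"):
    """Return a Container Registry image reference: HOST/PROJECT/IMAGE:TAG."""
    return f"{host}/{project_id}/{image}:{tag}"


def build_and_push_commands(project_id, image, tag="latest", context="."):
    """Return the docker build and docker push commands as argument lists.

    Nothing is executed here; the caller decides whether to run them.
    """
    ref = gcr_image_ref(project_id, image, tag)
    return [
        ["docker", "build", "-t", ref, context],  # build from the Dockerfile in `context`
        ["docker", "push", ref],                  # upload the tagged image to the registry
    ]


if __name__ == "__main__":
    # Hypothetical project, image name, and tag, for illustration only.
    for cmd in build_and_push_commands("my-project", "sra-toolkit", "0.1"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to pass them to `subprocess.run` later without any shell quoting concerns. Since the image holds a private key, the registry's access controls should be checked before pushing.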
Background
Container technologies like Docker enable quick and easy encapsulation of software, dependencies, and operating systems. One or more containers can render entire software ecosystems portable, enhance reproducibility and reusability, and facilitate sharing of software, tools, and even infrastructure.
One of the main features of the annual Bioconductor Conference is the proportion of time spent working with code in workshops. To support these workshops, we ask presenters to supply Rmarkdown documents, which we collate into the workshop materials. Using literate programming approaches like Rmarkdown ensures that the workflows are self-consistent and work as expected.
In addition to the Rmarkdown workshop materials, we also need a consistent computing environment that supports reasonably large computation, provides high-performance network and file system access, and scales essentially without limit (we expect more than 150 participants, each with their own machine).