GitHub Actions lets you compose flexible and potentially complicated `workflows` out of individual `actions`; those workflows run in response to events on GitHub. Templates for continuous integration, messaging Slack, greeting new contributors, deploying applications, and many other tasks are ready for customization and integration into any repo.
General notes about using containers
In this post, I will quickly build a Docker image containing the sra-toolkit and a key for dbGaP downloads. Because the key file is private, I will store the image in the access-controlled Google Container Registry for later use in genomics workflows.
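As a rough illustration of the build-and-push step, the sketch below assembles the two Docker commands involved, assuming the standard `gcr.io/<project>/<image>:<tag>` naming scheme. The project ID `my-project` and image name `sra-toolkit-dbgap` are hypothetical placeholders, not names taken from the post, and the commands are only printed, not executed.

```python
import shlex
from typing import List


def gcr_image_ref(project_id: str, image: str, tag: str = "latest") -> str:
    """Return an image reference of the form gcr.io/<project>/<image>:<tag>."""
    return f"gcr.io/{project_id}/{image}:{tag}"


def build_and_push_commands(
    project_id: str, image: str, tag: str = "latest", context: str = "."
) -> List[List[str]]:
    """Return the docker commands that build an image and push it to the registry.

    The commands are returned as argument lists rather than run, so they can
    be inspected first or passed to subprocess.run later.
    """
    ref = gcr_image_ref(project_id, image, tag)
    return [
        ["docker", "build", "-t", ref, context],  # build from the Dockerfile in `context`
        ["docker", "push", ref],                  # upload the tagged image
    ]


if __name__ == "__main__":
    # "my-project" and "sra-toolkit-dbgap" are hypothetical placeholder names.
    for cmd in build_and_push_commands("my-project", "sra-toolkit-dbgap"):
        print(shlex.join(cmd))
```

Pushing would also require Docker to be authenticated against the registry, for example via `gcloud auth configure-docker`. That step is left out of the sketch.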
Background
Container technologies like Docker enable quick and easy encapsulation of software, dependencies, and operating systems. One or more containers can render entire software ecosystems portable, enhance reproducibility and reusability, and facilitate sharing of software, tools, and even infrastructure.
One of the main features of the annual Bioconductor Conference is the proportion of time spent working with code in the form of workshops. To support these workshops, we ask workshop presenters to supply Rmarkdown materials which we collate into workshop materials. Using literate programming approaches like Rmarkdown ensures that the workflows are self-consistent and work as expected.
In addition to the Rmarkdown workshop materials, we also need a consistent computing environment that can support reasonably large computation, provide high-performance network and file system access, and scale essentially without limit (we expect to have >150 participants, each with their own machine).
Apache Spark in a few words
Apache Spark is a data science platform purpose-built for large- to massive-scale data processing. Spark supports processing data in batch mode (run as a pipeline) or in interactive mode, either from a command-line shell or in the popular notebook style of coding. While Scala is the native language for Spark, language bindings exist for Python, R, and Java as well.