class: center, middle, inverse, title-slide # Reproducible environments for integrated computational workflows ## Bioconductor 2020 - Birds of a Feather (BOF) ###
Kevin Rue-Albrecht
and
Charlotte George
### University of Oxford ### 2020-07-31 (updated: 2020-07-31) --- layout: true <div class="my-header"><img src="img/BioC2020.png" alt="Bioconductor logo" align="right" height="90%"></div> <div class="my-footer"><span> Bioconductor 2020      Reproducible environments for integrated computational workflows </span></div> --- # Structure of the session .pull-left[ ## What <i class="far fa-clock"></i> This session is 55 minutes. <i class="fas fa-info-circle"></i> A handful of introduction slides. + Workflow managers + Environment managers <i class="fas fa-volume-up"></i> Open discussion + Your preferences + Opinionated best practices + Feature requests ] .pull-right[ ## How <i class="far fa-comment-alt"></i> [sli.do/mrqa5tmx](http://sli.do/mrqa5tmx) <i class="fas fa-microphone"></i> Take the mic! <i class="fas fa-chart-bar"></i> Polls (on the right) <i class="fab fa-google-drive"></i> Take notes on this collaborative [Google Docs](https://docs.google.com/document/d/1ok2H9HVoa8c-m-VcwdQM0ccOuMTmESKCzaaWHb6trwE/edit?usp=sharing) during (and after) the session! <i class="fab fa-slack"></i> Continue the discussion on the [#reproducibility](https://community-bioc.slack.com/archives/C0185MUQ8TA) channel in the [Bioconductor](community-bioc.slack.com) Slack workspace! <i class="fab fa-github"></i> Contribute to the [gitbook](https://github.com/kevinrue/ORCA)! ] --- # Reproducible workflows and environments ## Why do we care? > The [bio.tools](https://bio.tools/) registry of life science software tools includes over 17,000 entries from more than 1,000 contributors <a name=cite-mark2020pipelines></a>([Marx, 2020](https://www.ncbi.nlm.nih.gov/pubmed/32601427)). - We all write workflows that combine many of those tools to process and analyze data. + e.g., ChIP-seq, scRNA-seq, RNA-seq, proteomics + Some are published, others are "in-house" - For full reproducibility, collaboration and teaching relies on the capacity to share: + Code + Environment(s): e.g., software dependencies, versions + Input data: e.g., raw data, annotations --- # Workflow managers .pull-left[ <a href="https://www.nextflow.io/"><img src="https://www.nextflow.io/img/nextflow2014_no-bg.png", height="30px", align="middle"></a> <a href="https://nf-co.re/"><img src="https://nf-co.re/assets/img/logo/nf-core-logo.svg", height="40px" align="middle"></a> <a href="https://snakemake.readthedocs.io/en/stable/#"><img src="https://learning.cyverse.org/projects/container_camp_workshop_2019/en/latest/_images/snakemake.png", height="40px" align="middle"></a> <a href="http://www.ruffus.org.uk/"><img src="http://www.ruffus.org.uk/_images/logo.jpg", height="50px" align="middle"></a> <a href="https://github.com/cgat-developers"><img src="https://avatars3.githubusercontent.com/u/5339854?s=200&v=4", height="50px" align="middle"></a> <a href="https://github.com/cgat-developers/cgat-core"><img src="https://github.com/cgat-developers/cgat-core/raw/master/docs/img/CGAT_logo.png", height="50px" align="middle"></a> <a href="https://bcbio-nextgen.readthedocs.io/en/latest/"><img src="https://bcbio-nextgen.readthedocs.io/en/latest/_images/banner.png", height="50px", align="middle"></a> <a href="https://workflowhub.eu/"><img src="https://workflowhub.eu/assets/logos/workflowhub-20f8594d47fa27784d6a3c24010cf728e6b47753448dfa0cb6c7b7cc055c08bf.svg", height="50px", align="middle"></a> <a href="https://bioconductor.org/packages/systemPipeR/"><img src="https://github.com/Bioconductor/BiocStickers/blob/master/systemPipeR/systemPipeR.png?raw=true", height="80px" align="middle"></a> <a href="https://bioconductor.org/packages/scPipe/"><img src="https://github.com/Bioconductor/BiocStickers/blob/master/scPipe/scPipe.png?raw=true", height="80px" align="middle"></a> <a href="https://docs.ropensci.org/drake/"><img src="https://docs.ropensci.org/drake/reference/figures/logo.svg", height="80px" align="middle"></a> ## No logo <i class="far fa-sad-tear"></i> ... *[Rcwl](https://bioconductor.org/packages/3.11/Rcwl)* ] .pull-right[ ## Features <i class="far fa-thumbs-up"></i> Transform, merge, split, ... files <i class="far fa-thumbs-up"></i> Checkpoints, logs, reports, visualization <i class="far fa-thumbs-up"></i> Modular, flexible, partial runs <i class="far fa-thumbs-up"></i> HPC queue managers and [DRMAA](http://www.drmaa.org/) <i class="far fa-thumbs-up"></i> Parallelisation <i class="far fa-thumbs-up"></i> Portable, containerized, versioned <i class="far fa-thumbs-up"></i> Different environments for some steps <i class="far fa-thumbs-up"></i> Integration with <i class="fab fa-r-project"></i>, <i class="fab fa-python"></i>, <i class="fab fa-java"></i>, ... ] .center[ What is _your_ favorite? ] --- # Environment managers `$$Same\ workflow\ \times\ Different\ environment\ =\ Different\ result$$` <br/> <br/> <img src="https://i.imgur.com/k5B1CBd.jpg" height="300px" style="display: block; margin: auto;" /> --- # Environment managers .pull-left[ <a href="https://docs.conda.io/en/latest/"><img src="https://docs.conda.io/en/latest/_images/conda_logo.svg", height="30px", align="middle"></a> <a href="https://github.com/TheSnakePit/mamba"><img src="https://miro.medium.com/max/500/1*akFPXZ_b27Bz38cnQHO3yw.png", height="40px", align="middle"></a> <a href="https://www.docker.com/"><img src="https://bit2bitamericas.com/wp-content/uploads/2020/01/logo-docker.png", height="30px", align="middle"></a> <a href="https://sylabs.io/docs/"><img src="https://sylabs.io/assets/images/logos/singularity.png", height="50px", align="middle"></a> <a href="https://rstudio.github.io/renv/"><img src="https://rstudio.github.io/renv/reference/figures/logo.svg", height="80px", align="middle"></a> ## No logo <i class="far fa-sad-tear"></i> ... *[BiocManager](https://bioconductor.org/packages/3.11/BiocManager)* ] .pull-right[ ## Features <i class="far fa-thumbs-up"></i> Track software versions <i class="far fa-thumbs-up"></i> Track source software repositories <i class="far fa-thumbs-up"></i> Portable, versioned <i class="far fa-thumbs-up"></i> Check compatibility with other packages <i class="far fa-thumbs-up"></i> Support multiple programming languages <i class="far fa-thumbs-up"></i> Track / Manage system dependencies <i class="far fa-thumbs-up"></i> Upgrade and freeze <i class="far fa-thumbs-up"></i> Lightweight, fast (e.g., cache) ] .center[ What is _your_ favorite? ] --- # Use case: Snakemake .pull-left[ ```python rule bwa: input: "genome.fa" "reads.fq" output: "mapped.bam" singularity: "docker://somecontainer:v1.0" conda: "envs/bwa.yaml" envmodules: "bio/bwa/0.7.9", "bio/samtools/1.9" shell: "bwa mem {input} |" "samtools view -Sbh - > {output}" ``` ] .pull-right[ ## Usage ```bash snakemake --use-conda ``` ```bash snakemake --use-singularity ``` ```bash snakemake --use-conda --use-singularity ``` ```bash snakemake --use-envmodules ``` ```bash snakemake --use-singularity --kubernetes ``` ] --- # Paradox of choice - What do you choose? - How do you choose? + Keep sight of your objective: medical research? teaching? method development? + "Best" practice may depend on your environment and goals + What works for you? What can your sysadmin support? - The community may benefit from a resource (e.g., review, book) based on experience to provide guidance and promote adoption of workflow/environment managers adapted to various context. - Share or link (minimal) working examples for others to use as template --- # Discussion points .pull-left[ ## Past & Present <i class="fas fa-question"></i> Workflow managers <i class="fas fa-question"></i> Environment managers <i class="fas fa-question"></i> Challenges <i class="fas fa-question"></i> Opinionated best practices ] .pull-right[ ## Future <i class="fas fa-question"></i> Sharing templates and community efforts <i class="fas fa-question"></i> Feature requests <i class="fas fa-question"></i> Bioconductor interests and efforts ] .youreup[ .center[ You're up ! ] ] - message in the <i class="far fa-comment-alt"></i> Text chat - take the <i class="fas fa-microphone"></i> mic - make a <i class="fas fa-chart-bar"></i> poll. - edit the <i class="fab fa-google-drive"></i> [Google Docs](https://docs.google.com/document/d/1ok2H9HVoa8c-m-VcwdQM0ccOuMTmESKCzaaWHb6trwE/edit?usp=sharing) - join the [#reproducibility](https://community-bioc.slack.com/archives/C0185MUQ8TA) <i class="fab fa-slack"></i> Slack channel - <i class="fab fa-github"></i> Contribute to the [gitbook](https://github.com/kevinrue/ORCA) --- # References .small-p[ <a name=bib-mark2020pipelines></a>[Marx, V.](#cite-mark2020pipelines) (2020). "When computational pipelines go 'clank'". In: _Nat Methods_ 17.7, pp. 659-662. ISSN: 1548-7105 (Electronic) 1548-7091 (Linking). DOI: [10.1038/s41592-020-0886-9](https://doi.org/10.1038%2Fs41592-020-0886-9). URL: [https://www.ncbi.nlm.nih.gov/pubmed/32601427](https://www.ncbi.nlm.nih.gov/pubmed/32601427). ] .thanks[ .center[ Thank you! ] ] <img src="https://raw.githubusercontent.com/Bioconductor/BiocStickers/f792bce2617124b9e6c4fde55b04909125f8db3f/boards/CAB/CAB_logo_drawing.png" height="400px" style="display: block; margin: auto;" />