German Conference on Bioinformatics (GCB) 2020

14 - 17 September 2020,
Virtual Conference
 


WS3: Tutorial: Reproducibility with Bioconda and Snakemake

Instructors:

Johannes Köster, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany

Marcel Bargull, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany

Abstract:

The typical data analyst must juggle multiple projects simultaneously, each with its own duration and software requirements. As few analysts have any formal training in structuring or even writing the code necessary to perform an analysis, it is unsurprising that the iterative analytic process can produce a wide assortment of almost identically named files (e.g., “final_results.txt”, “final_results.version2.txt”, “final_results.really_final.txt”), all with unclear origins and produced by a hodgepodge of similarly poorly named scripts. The near impossibility of tracing a results file to the exact process that produced it creates serious difficulties, both when it comes time to publish results and when planning subsequent experiments months or years later (after all, which of the “final_results” files was really the “right one”?). These issues are further compounded by software paths and similar assumptions being hard-coded into scripts, which prevents easy replication of the analysis elsewhere. Performing analyses in a reproducible and traceable manner is clearly needed to combat such problems.

In this hands-on tutorial, we demonstrate how Conda can be used to deploy specific software versions easily, reproducibly, and without administrator credentials. Moreover, we demonstrate how Conda’s ability to create isolated software environments helps to avoid side effects between different analyses or between different steps of the same analysis. Attendees will also learn how to create Conda recipes themselves, so that they can contribute new packages to projects such as Bioconda. We further demonstrate how Snakemake can be used in combination with Conda and containers to create reproducible analysis workflows and execute them on any platform, from workstations to clusters and the cloud.
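To make the idea of pinning software versions in an isolated environment concrete, here is a minimal sketch in Python that renders the text of a Conda environment file. It is not taken from the tutorial material: the helper `render_environment`, the environment name `mapping`, and the chosen packages and versions are assumptions made purely for illustration.

```python
def render_environment(name, channels, pinned):
    """Return the text of a Conda environment file with exact version pins.

    name     -- environment name (hypothetical, chosen for this example)
    channels -- channel names in priority order
    pinned   -- mapping of package name to the exact version to install
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    # Sort packages so the generated file is stable across runs.
    lines += [f"  - {package}={version}" for package, version in sorted(pinned.items())]
    return "\n".join(lines) + "\n"


# Illustrative pins only; any packages and versions could be used here.
env_text = render_environment(
    name="mapping",
    channels=["conda-forge", "bioconda"],
    pinned={"samtools": "1.10", "bwa": "0.7.17"},
)
print(env_text)
```

Saved as `environment.yaml`, such a file can be turned into an isolated environment with `conda env create -f environment.yaml`. A Snakemake rule can also refer to a file like this through its `conda:` directive, so that each workflow step runs with its own pinned software.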

With over 6 million downloads, Bioconda is the leading platform for the sustainable distribution of bioinformatics software. With more than three new citations per week on average, Snakemake is one of the most widely used scientific workflow management systems.

Learning Objectives:

  • Creating and using Conda environments
  • Creating Conda recipes
  • Creating and running Snakemake workflows

Audience:

Beginners, Intermediates, Core-Facility Staff
Attendees should have basic familiarity with Python, Git, and the command line.

Requirements:

  • Laptops with Linux or macOS
  • Pre-installed Miniconda - install via: https://conda.io/miniconda.html

 
