German Conference on Bioinformatics (GCB) 2020

14 - 17 September 2020,
Virtual Conference


WS1: Modern epigenomic analysis: theory and practice

This website contains material for an epigenome analysis tutorial that covers ATAC-seq analysis and its integration with transcription factor (TF) motifs and gene expression.

Instructors and helpers:

Prof. Dr. Marcel Schulz, Leader of the Computational Epigenomics & Systems Cardiology group, Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt

Sivarajan Karunanithi, PhD student, Computational Epigenomics & Systems Cardiology, Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt

Nina Baumgarten, PhD student, Computational Epigenomics & Systems Cardiology, Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt

Dennis Hecker, PhD student, Computational Epigenomics & Systems Cardiology, Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt

Zhijian Li, PhD student, Institute for Computational Genomics, RWTH Aachen


Decades of research have improved our understanding of gene regulation. An open challenge in epigenomics is to unravel the role of non-coding regions in the transcriptional regulation of potentially distant target genes. Genome-wide association studies show that a large fraction of trait- and disease-associated variants lies in such non-coding regulatory elements, but the mechanisms by which these elements regulate genes are often unknown.

Because whole-genome assays that measure different aspects of the epigenome, in bulk or even in single cells, are constantly evolving, computational method development is a moving target. As a result, it remains difficult to perform epigenome analyses and to integrate them with other types of information, such as enhancer-gene interactions and gene expression data. In this tutorial, we will review modern technologies in epigenomics and discuss state-of-the-art methods for analyzing the resulting data. We will concentrate on the analysis of ATAC-seq data at both the bulk and the single-cell level to define regulatory elements (REMs). We will discuss standard and advanced computational tasks, including quality control, peak calling, footprinting, motif analysis and clustering of single cells. Attendees will analyze real datasets using workflows set up for the tutorial.

In addition to the challenges in REM annotation, linking a REM to the gene it regulates is an even more difficult task. Possible approaches include assigning each REM to its nearest gene, assigning to a gene all REMs located within a defined window around it, or inferring interactions from associations between epigenomic and expression data across samples. Each method has its advantages and drawbacks. For instance, nearest-gene approaches miss REMs that have been shown to target distant genes. On top of these challenges, each method performs differently depending on the data at hand and the characteristics of the region of interest.
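The three linking strategies described above can be illustrated with a small Python sketch. It is not taken from the tutorial workflows. The `Region` class, the function names, the 50 kb default window and the |r| >= 0.7 cutoff are all illustrative assumptions. Gene TSSs are simplified to the gene start (strand is ignored), and the association step uses a plain Pearson correlation across samples, restricted to window-based candidate pairs.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Region:
    """Hypothetical genomic interval; for genes, `start` is used as a simplified TSS (strand ignored)."""
    name: str
    chrom: str
    start: int
    end: int

    @property
    def center(self) -> int:
        return (self.start + self.end) // 2


def nearest_gene(rems, genes):
    """Assign each REM to the gene whose (simplified) TSS is closest to the REM center."""
    links = {}
    for rem in rems:
        candidates = [g for g in genes if g.chrom == rem.chrom]
        if candidates:
            best = min(candidates, key=lambda g: abs(g.start - rem.center))
            links[rem.name] = best.name
    return links


def window_links(rems, genes, window=50_000):
    """Assign to each gene all REMs whose center lies within +/- `window` bp of its TSS."""
    return {
        g.name: [r.name for r in rems
                 if r.chrom == g.chrom and abs(r.center - g.start) <= window]
        for g in genes
    }


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_links(rem_signal, gene_expr, candidates, min_abs_r=0.7):
    """Keep candidate (REM, gene) pairs whose signal and expression correlate across samples."""
    kept = []
    for gene, rem_names in candidates.items():
        for rem in rem_names:
            r = pearson(rem_signal[rem], gene_expr[gene])
            if abs(r) >= min_abs_r:
                kept.append((rem, gene, round(r, 3)))
    return kept


if __name__ == "__main__":
    # Toy data (made up for illustration only).
    genes = [Region("geneA", "chr1", 100_000, 110_000),
             Region("geneB", "chr1", 300_000, 320_000)]
    rems = [Region("rem1", "chr1", 95_000, 95_500),
            Region("rem2", "chr1", 140_000, 140_500),
            Region("rem3", "chr1", 290_000, 290_400)]
    rem_signal = {"rem1": [1, 2, 3, 4], "rem2": [4, 3, 2, 1], "rem3": [1, 3, 2, 4]}
    gene_expr = {"geneA": [2, 4, 6, 8], "geneB": [5, 5, 6, 5]}

    print(nearest_gene(rems, genes))
    candidates = window_links(rems, genes)
    print(candidates)
    print(correlation_links(rem_signal, gene_expr, candidates))
```

In this toy example, the nearest-gene rule links rem1 and rem2 to geneA and rem3 to geneB. The correlation step then drops rem3, because its signal barely tracks geneB expression. Restricting the correlation test to window-based candidates is only one possible design choice. It keeps the number of tested pairs small, but it cannot recover interactions beyond the window.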


One of the main molecular mechanisms controlling the temporal and spatial expression of genes is transcriptional regulation. In this process, transcription factors (TFs) bind to the promoter and enhancers of a gene to recruit (or block) the transcriptional machinery, thereby activating or repressing gene expression. Inference of gene regulatory networks, i.e., identifying the factors that control the expression of particular genes, is a key challenge when studying development and disease progression. The availability of experimental assays that map in vivo chromatin dynamics (histone ChIP-seq, DNase-seq, ATAC-seq, NOMe-seq, etc.) and gene expression (RNA-seq) has triggered the development of novel computational modelling approaches that integrate these diverse datasets to predict TF binding and activity accurately. In practice, however, researchers face the challenges of handling diverse assays, understanding the tools involved and building workflows tailored to the data they have.

Goals & Audience:

This tutorial is targeted at bioinformaticians with previous experience in gene expression and next-generation sequencing analysis. This intermediate-level tutorial will teach you how to use state-of-the-art tools to infer gene regulatory networks from chromatin and expression data. First, we will review tools for the following analyses: 1) predicting regulatory regions from ATAC-seq data using footprinting methods (HINT; Li et al., 2019) and determining cell-specific TF binding in these regions, and 2) associating regulatory regions with genes and integrating gene expression data (e.g., Schmidt et al., 2016; Durek et al., 2016). After introductory presentations, we will guide participants through hands-on practicals on real data for both parts.


We have created ReadTheDocs documentation that helps participants set up the required software for the tutorial and hosts the final working examples. It is accessible here.


A link to the schedule of the sessions can be found here:



