German Conference on Bioinformatics (GCB) 2020

14 - 17 September 2020,
Virtual Conference


WS1: Modern epigenomic analysis: theory and practice

Instructors and helpers:

Prof. Dr. Marcel Schulz, Leader of Computational Epigenomics & Systems Cardiology
group, Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt

Sivarajan Karunanithi, PhD-Student, Computational Epigenomics & Systems
Cardiology, Institute of Cardiovascular Regeneration, Uniklinikum and Goethe
University Frankfurt

Nina Baumgarten, PhD-Student, Computational Epigenomics & Systems Cardiology,
Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt

Dennis Hecker, PhD-Student, Computational Epigenomics & Systems Cardiology,
Institute of Cardiovascular Regeneration, Uniklinikum and Goethe University Frankfurt

Zhijian Li, PhD-Student, Institute for Computational Genomics, RWTH Aachen


Decades of research have improved our understanding of gene regulation. An open
challenge in epigenomics is to unravel the role of non-coding regions in the transcriptional
regulation of target genes that may be far away. Genome-wide association studies show that a
large part of genomic variation is found in these non-coding regulatory elements, but the
mechanisms by which they regulate genes are often unknown.

Because whole-genome assays that measure different parts of the epigenome, in many cells or
even in single cells, are under constant development, computational method development is a
moving target. Thus, it remains difficult to perform epigenome analysis and to integrate it
with other types of information, such as enhancer-gene interactions and gene expression data.
In this tutorial, we will review modern technologies in epigenomics and discuss
state-of-the-art methods for the analysis of the resulting data. We will concentrate on the
analysis of ATAC-seq data at both the bulk and the single-cell level to define regulatory
elements (REMs). We will discuss standard and advanced computational tasks including quality
analysis, peak calling, footprinting, motif analysis and clustering of single cells. The
attendees will analyze real datasets using workflows set up for the tutorial.

In addition to the challenges in REM annotation, linking a REM to the gene it regulates is an
even more difficult task. Possible approaches are linking a REM to its nearest gene, assigning
to a gene all REMs located within a defined window around it, or determining
interactions based on associations between epigenomics and expression data. Each method
has its advantages and drawbacks. For instance, nearest-gene approaches cannot capture
the REMs that were shown to target far-away genes. On top of these
challenges, every method performs differently depending on the data at hand and the
characteristics of the region of interest.
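
To make the difference between these three strategies concrete, the following minimal Python sketch applies each of them to a toy example. It is only an illustration and not part of the tutorial workflows: the gene and REM names, the coordinates, the signal values, and the function names (nearest_gene, genes_in_window, correlated_genes) are all hypothetical, positions are reduced to a single coordinate on one chromosome, and a plain Pearson correlation stands in for whatever association measure a real tool would use.

```python
from math import sqrt

# Hypothetical toy data on a single chromosome (all values are made up).
GENES = {"geneA": 1_000, "geneB": 50_000, "geneC": 120_000}  # name -> TSS position
REMS = {"rem1": 3_000, "rem2": 48_000, "rem3": 90_000}       # name -> REM midpoint


def nearest_gene(rem_pos, genes):
    """Strategy 1: assign the REM to the gene with the closest TSS."""
    return min(genes, key=lambda g: abs(genes[g] - rem_pos))


def genes_in_window(rem_pos, genes, window):
    """Strategy 2: assign the REM to every gene whose TSS lies within +/- window bp."""
    return sorted(g for g, tss in genes.items() if abs(tss - rem_pos) <= window)


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def correlated_genes(rem_signal, expression, candidates, min_r):
    """Strategy 3: keep candidate genes whose expression across samples
    correlates with the REM's accessibility signal (r >= min_r)."""
    return sorted(g for g in candidates if pearson(rem_signal, expression[g]) >= min_r)


# Accessibility of rem3 and expression of nearby genes in four hypothetical samples.
rem3_signal = [1.0, 2.0, 3.0, 4.0]
expression = {"geneB": [2.0, 4.0, 6.0, 8.0], "geneC": [5.0, 5.0, 5.0, 5.0]}

window_hits = genes_in_window(REMS["rem3"], GENES, window=50_000)
print("nearest gene:     ", nearest_gene(REMS["rem3"], GENES))
print("genes in window:  ", window_hits)
print("correlated genes: ", correlated_genes(rem3_signal, expression, window_hits, min_r=0.8))
```

In this toy setting the nearest-gene rule and the correlation-based rule disagree for rem3, which is the kind of discrepancy the paragraph above describes: the nearest gene is not necessarily the one whose expression follows the REM's activity.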


As epigenomics is a rapidly evolving field with many emerging techniques, it is hard to
keep track of all available data sources and analysis tools while staying aware of the
potential flaws that come with them. Without a proper overview, researchers might choose a
tool that does not fit their data or might not realize how a certain tool can affect the
interpretability of their results. Moreover, given the many different annotations of
regulatory elements, it is important to raise awareness of the underlying methods used. This
matters especially when using a novel database that catalogues experimentally measured or
computationally inferred REM-gene associations: some tools select the nearest gene as the REM
target, although it has been experimentally confirmed that a REM can be located several
kilobases away from its associated gene. We want to provide a guide for navigating the
complexity of epigenomics data, to help attendees obtain a deeper understanding of the
available tools, and to give them practice in performing appropriate analyses.

Goals & Audience:

This intermediate-level tutorial is designed for bioinformaticians who are interested in studying regulatory regions of the genome and who want to gain insight into the current status of the field and practice possible workflows. First, we will provide an overview of the current status of epigenomics, the state-of-the-art techniques and the respective data types. Subsequently, we will show how to analyze bulk and single-cell epigenomics data with a focus on open chromatin sequencing (ATAC-seq). The attendees will learn how to predict and compare transcription factor (TF) binding in regions that were defined from ATAC-seq data.

In the subsequent hands-on session, the attendees will perform a REM annotation analysis on a
bulk and a single-cell ATAC-seq dataset, followed by a transcription factor analysis in the
identified regions. The next section will discuss the possible approaches to identify target
genes of REMs. A concluding hands-on session will give the opportunity to try out different
approaches for determining REM-gene interactions and to get to know their characteristics
and potential drawbacks.


Participants should bring their own Wi-Fi-enabled laptop to be able to practice the presented
workflows. All software used in the tutorial will be provided as packaged workflows in a
containerized format (e.g. Docker) to ease installation and use during the hands-on sessions.
All necessary information will be hosted on a GitHub website containing the presentation
slides, the use cases and the materials for the hands-on sessions; it will be circulated to
registered participants for preparation.


