German Conference on Bioinformatics (GCB) 2020

14 - 17 September 2020,
Virtual Conference


WS2: Binding Site Prediction using KNIME Analytics Platform and its Keras Deep Learning Integration


Jeanette Prinz, PhD - Team Lead (Life Sciences), KNIME GmbH
Temesgen H. Dadi, PhD - Technical Data Scientist (Life Sciences), KNIME GmbH


The availability of massive amounts of sequence data enables powerful bioinformatics applications of machine learning, particularly deep learning, such as the prediction of motifs, protein binding sites, and secondary structures. Successful data science projects involve not only creating a model but also gathering, wrangling, and visualizing the data, as well as deploying and consuming the resulting models. The open source KNIME Analytics Platform offers an accessible tool based on the visual programming paradigm to accomplish all of this. Within KNIME, one can easily create reproducible workflows by choosing from a wide array of data transformations, machine learning algorithms, and visualizations.

In this hands-on tutorial, participants will produce a workflow involving these different stages using the concrete example of predicting DNA binding sites. We will start by importing and cleaning up the input data, which consists of short DNA sequencing reads in FASTA format. The DNA sequences are then converted into a numerical matrix using one-hot encoding. With the help of the KNIME Keras integration, we will then create a deep learning model that combines convolutional and recurrent neural networks. Finally, we will evaluate our trained model on a separate test dataset, visualize our results, and deploy the final model.
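The tutorial itself performs these steps with KNIME nodes and requires no code. For readers curious about what the one-hot encoding step does to a sequence, the following is a minimal Python sketch, not the workshop's implementation. The function names `read_fasta` and `one_hot`, the channel order `ACGT`, the all-zero row for ambiguous bases, and the two example reads are all assumptions made for illustration.

```python
from io import StringIO

# Assumed channel order; any fixed order works as long as it is used consistently.
ALPHABET = "ACGT"


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence lines may wrap, so collect them until the next header.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Bases outside ALPHABET (for example N) become an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in sequence]


# Hypothetical input: two short reads, the second containing an ambiguous base.
example = StringIO(">read_1\nACGT\n>read_2\nGGNA\n")
encoded = {name: one_hot(seq) for name, seq in read_fasta(example)}
```

Each read becomes a matrix with one row per base and one column per nucleotide, which is the kind of numerical input a convolutional layer can consume.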

Learning Objectives:

Participants will learn how to:

  • Use the open source KNIME Analytics Platform for importing, blending and transforming data from different sources
  • Configure, create, train and evaluate deep learning neural network models using the
    Keras machine learning framework in KNIME Analytics Platform without the need to
    write code
  • Evaluate and deploy the resulting models

Intended audience and level - Beginner

Students (grad/undergrad), researchers, and principal investigators with an interest in machine learning, bioinformatics, or data manipulation are welcome to attend the tutorial. No coding knowledge is needed. A little background in machine learning and sequencing data is a plus. We will provide a short introduction to the KNIME Analytics Platform, protein binding sites, and deep learning before starting the hands-on sessions.


For the hands-on tutorial, participants need to bring their own laptops. All the necessary software and data will be made available for download before the tutorial day.


