Jeanette Prinz, PhD - Team Lead (Life Sciences), KNIME GmbH
Temesgen H. Dadi, PhD - Technical Data Scientist (Life Sciences), KNIME GmbH
The availability of massive amounts of sequence data enables powerful bioinformatics applications of machine learning, particularly deep learning, such as the prediction of motifs, protein binding sites, and secondary structures. Successful data science projects of this kind involve not only creating a model, but also gathering, wrangling, and visualizing the data, as well as deploying and consuming the resulting models. The open source KNIME Analytics Platform, available from https://www.knime.com/downloads, offers an accessible tool based on the visual programming paradigm to accomplish all of this. Within KNIME, one can easily create reproducible workflows by choosing from a wide array of data transformations, machine learning algorithms, and visualizations.
In this hands-on tutorial, participants will produce a workflow involving these different stages, using the concrete example of predicting DNA binding sites. We will start by importing and cleaning up the input data, which consists of short DNA sequencing reads in FASTA format. The DNA sequences are then converted into a numerical matrix using one-hot encoding. With the help of the KNIME Keras integration, we will then create a deep learning model that combines convolutional and recurrent neural networks. Finally, we will evaluate our trained model on a separate test dataset, visualize our results, and deploy the final model.
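For readers who want to see what the encoding step does to the data, the following is a minimal Python sketch of reading FASTA records and one-hot encoding each sequence. It is an illustration only and is not part of the tutorial workflow, which performs these steps with KNIME nodes and requires no coding. The function names `read_fasta` and `one_hot`, the column order A, C, G, T, the all-zero row for ambiguous bases such as N, and the two example reads are all assumptions made for this sketch.

```python
from io import StringIO

# Column order of the encoding. The order A, C, G, T is an assumption
# made for this sketch; any fixed order works if used consistently.
ALPHABET = "ACGT"


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequences may span several lines; collect and join them.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def one_hot(sequence):
    """Return one row per base, with a single 1 in that base's column."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = []
    for base in sequence:
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        # Bases outside ALPHABET (for example N) stay all-zero (assumption).
        matrix.append(row)
    return matrix


# Two made-up reads, used only to demonstrate the functions above.
example = StringIO(">read_1\nACGT\n>read_2\nGGNA\n")
encoded = {name: one_hot(seq) for name, seq in read_fasta(example)}

for name, matrix in encoded.items():
    print(name, matrix)
```

Each read becomes a matrix with one row per position and four columns, which is the kind of numerical input a convolutional layer can slide over. Reads of different lengths would need padding or trimming to a common length before being stacked into one array; that step is left out here.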
Participants will learn how to:
Students (graduate and undergraduate), researchers, and principal investigators with an interest in machine learning, bioinformatics, or data manipulation are welcome to attend the tutorial. No coding knowledge is needed. Some background in machine learning and sequencing data is a plus. We will provide a short introduction to the KNIME Analytics Platform, protein binding sites, and deep learning before starting the hands-on sessions.
For the hands-on tutorial, participants need to bring their own laptops. All the necessary software and data will be made available for download before the tutorial day.