Machine Learning in Bioinformatics
Course on "Machine Learning in Bioinformatics" at the Machine Learning Summer School.
Abstract:
I will start with a general introduction to Bioinformatics, covering basic biology, typical data types (sequences, structures, expression data and networks) and established analysis tasks. In the second part, I will discuss predictive sequence analysis with Support Vector Machines (SVMs) and introduce a series of kernels suited to different analysis tasks. I will also discuss the basic data structures needed for large-scale learning and how to combine kernels for heterogeneous data. In the third part, I will focus on Hidden Markov Models and on discriminative alternatives, such as Conditional Random Fields and Hidden Markov SVMs, that are suitable for the segmentation tasks that frequently appear in Bioinformatics. In the last part, I will present three applications in greater detail: a large-margin alignment algorithm, computational gene finding, and the identification of polymorphisms from resequencing arrays.
Overview:
- Introduction to Bioinformatics (45 min)
  - Basic Biology and Central Dogma
  - Typical Data Types
  - Common Analysis Tasks
- Sequence Analysis with SVMs (105 min)
  - String Kernels
  - Large-Scale Data Structures
  - Heterogeneous Data
- Structured Output Learning (30 min)
  - Hidden Markov Models
  - Dynamic Programming
  - Discriminative Approaches
- Some Applications (90 min)
  - Spliced Alignments
  - Gene Finding
  - Analysis of Resequencing Arrays
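
As a rough illustration of the "String Kernels" topic, here is a minimal sketch of one well-known string kernel, the spectrum kernel. The abstract does not say which kernels the course covers, so the choice of kernel, the function names `kmer_counts` and `spectrum_kernel`, and the default `k=3` are all assumptions made for this example. The kernel value is the inner product of the two sequences' k-mer count vectors.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Spectrum kernel: inner product of the k-mer count vectors of x and y.

    Illustrative sketch only; names and default k are hypothetical.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Iterate over the smaller table; k-mers missing from the other count as 0.
    if len(cx) > len(cy):
        cx, cy = cy, cx
    return sum(n * cy[kmer] for kmer, n in cx.items())
```

For example, `spectrum_kernel("ACGT", "ACGT", k=2)` evaluates to 3, because the three 2-mers `AC`, `CG` and `GT` each occur once in both sequences. A kernel matrix built from such values can be passed to any SVM implementation that accepts precomputed kernels. Large-scale use would need more efficient data structures than per-pair counting.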

