Kernel Methods for Fusing Heterogeneous Data
Course by Gunnar Rätsch at Bio-IT 2010 in Hannover, October 4, 2010.
This presentation is part of the CHI Molecular Diagnostics pre-conference course Introduction to Biomedical Data Fusion.
Kernel methods, in particular support vector machines, have established themselves as a very powerful and versatile paradigm for learning from high-dimensional data. Kernels have been developed not only for numerical data but also for sequence information and even for graphs representing, e.g., protein-protein interaction data. Their widespread use in developing molecular signatures, together with the large number and diversity of their bioinformatics applications, testifies to the power of this approach.
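A kernel on sequence data can be made concrete with a minimal sketch of one well-known string kernel, the k-spectrum kernel. It compares two sequences by counting the substrings of length k that they share. The sketch is written in plain Python for illustration only. The function names and the default k = 3 are assumptions made here, and they are not the interface of the course software.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """k-spectrum kernel: inner product of the k-mer count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Only k-mers occurring in both sequences contribute; Counter returns 0 for absent keys.
    return sum(n * cy[kmer] for kmer, n in cx.items())


def normalized_spectrum_kernel(x, y, k=3):
    """Cosine-normalized kernel, so that each sequence has similarity 1 with itself."""
    denom = math.sqrt(spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k))
    # Sequences shorter than k have no k-mers; treat them as dissimilar to everything.
    return spectrum_kernel(x, y, k) / denom if denom > 0 else 0.0


def gram_matrix(seqs, k=3):
    """Pairwise kernel matrix, the input a kernel method such as an SVM works with."""
    return [[normalized_spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


print(spectrum_kernel("ACGTACGT", "ACGTTT", k=3))  # → 4 (shared 3-mers ACG and CGT, each 2 x 1)
```

Because the kernel only needs k-mer counts, the sequences never have to be turned into fixed-length numerical vectors by hand. This is the property that lets kernel methods handle non-vectorial data.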
In addition, various kernels can be combined irrespective of their underlying data types, and optimal combinations can be learned from the data itself. Kernel methods therefore provide a unique tool for achieving optimal prediction performance and a better understanding of the data through data fusion.
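Kernel combination can be sketched under a few stated assumptions. Suppose there are two views of the same three samples: hypothetical numerical measurements (e.g., expression values) compared with a Gaussian RBF kernel, and hypothetical DNA sequences compared with a 2-spectrum kernel. Each Gram matrix is first normalized to unit diagonal so that neither data type dominates by scale. The matrices are then summed with non-negative weights, which keeps the result a valid kernel. The weights are fixed by hand at 0.5 each, which is an assumption of this sketch. Learning them from the data, as described above, is the task of multiple kernel learning and is not attempted here.

```python
import math
from collections import Counter


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian RBF kernel on numerical feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def spectrum_kernel(x, y, k=2):
    """k-spectrum kernel on strings: inner product of k-mer count vectors."""
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    return sum(n * cy[kmer] for kmer, n in cx.items())


def gram(items, kernel):
    """Pairwise kernel (Gram) matrix for one data source."""
    return [[kernel(a, b) for b in items] for a in items]


def normalize(K):
    """Rescale to unit diagonal: K_ij / sqrt(K_ii * K_jj)."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)] for i in range(n)]


def combine(kernels, weights):
    """Weighted sum of Gram matrices computed on the same samples.

    A non-negative combination of valid kernels is again a valid kernel.
    """
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: two views of the same three samples.
expression = [[0.1, 1.2], [0.3, 0.9], [2.0, -0.5]]
sequences = ["ACGTAC", "ACGTTT", "GGGCCC"]

K_expr = normalize(gram(expression, rbf_kernel))  # RBF already has unit diagonal
K_seq = normalize(gram(sequences, spectrum_kernel))
K = combine([K_expr, K_seq], [0.5, 0.5])

for row in K:
    print([round(v, 3) for v in row])
```

The combined matrix `K` can be passed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. The underlying data types no longer matter at that point, because only pairwise similarities are used.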
This course will give a brief introduction to kernel methods and an overview of the various types of kernels relevant to biological data, and will discuss the use of kernel combination for data fusion. It will also present a corresponding machine learning toolbox that Dr. Rätsch's group has developed for unified large-scale learning from a broad range of data, including the fusion of data from very diverse sources.
The course will be structured as follows:
- Introduction to Support Vector Machines (SVMs)
- The kernel concept
- Kernels for non-vectorial data
- Integration of heterogeneous data
- Illustrative examples
- Shogun software
For the tutorial paper, we have developed a Galaxy-based web service and a toolbox (easySVM) that can easily be applied to most of the problems considered in the tutorial. Examples of using the software can be found here.
Further reading:
- What is a support vector machine? by William S. Noble (Nature Biotechnology 24(12), December 2006)
- Kernel methods in genomics and computational biology by Jean-Philippe Vert (in G. Camps-Valls, J.-L. Rojo-Alvarez and M. Martinez-Ramon (Eds.), Kernel Methods in Bioengineering, Signal and Image Processing, pp. 42-63, Idea Group, 2007)
- Support Vector Machines and Kernels for Computational Biology by Asa Ben-Hur, Cheng Soon Ong, Sören Sonnenburg, Bernhard Schölkopf, and Gunnar Rätsch (PLoS Computational Biology 4(10): e1000173, 2008)
I gratefully acknowledge the help of Sören Sonnenburg and Cheng Soon Ong in preparing an earlier version of this tutorial. Moreover, slides were contributed by Peter Gehler, Karsten Borgwardt and Petra Philips.
For comments, problems, or questions, feel free to contact Dr. Gunnar Rätsch.
Dr. Rätsch heads the research group for Machine Learning in Biology at the Friedrich Miescher Laboratory of the Max Planck Society. His earlier work on boosting and support vector machines led to his current interest in applying machine learning to real-world problems from computational biology. Besides its work on kernel methods for data fusion, his group focuses on topics such as novel analysis methods for next-generation sequencing data, the prediction of MHC binding, ab initio gene finding in nematode genomes, and the prediction and validation of transcriptional regulation (e.g., alternative splicing).