Accurate Splice Site Detection in C. elegans (Supplementary Material)
Supplementary Material to the paper "Accurate Splice Site Detection in C. elegans" by Gunnar Rätsch and Sören Sonnenburg.
Download paper: pdf gz
Appeared in Kernel Methods in Computational Biology, B. Schölkopf, K. Tsuda and J.-P. Vert (editors), MIT Press: link
This page contains additional material for the above-mentioned paper. We tried to document exactly
- which data sets were used,
- what the model selection results were, and
- how the Weighted Degree Kernel is implemented.
Section 1 provides the virtual gene list from which the
acceptor and donor sites were derived. The resulting data
can be found in Section 2. Model selection results for
splice site recognition are provided in Section 3.
Section 4 provides the data used to evaluate complete splice
forms, for which the model selection results can be found in
Section 5. The Weighted Degree Kernel implementation is
found in Section 6.
-
These genes were used to generate the splice data set and to perform the comparison with genscan.
Each file contains a gene string on one line, followed
by two lines of positions:
gene_start intron_end+1 intron_end+1
intron_start+1 intron_start+1 gene_end+2
That is, gene_start is on atg, intron_start on
gt, intron_end on agx and gene_end on
tagxx.
The data therefore look like this:
tccgaatatcaatgtga...
571 738 1287 2018
683 939 1449 2144
tccgaatatcaatgtg...
571 695 868
648 818 1031
...
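The three-line record layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the original material: the names `GeneRecord` and `parse_gene_records` are hypothetical, the sequences are the truncated strings from the example, and the two position lines are simply kept as lists of integers without interpreting them further.

```python
from typing import Iterator, List, NamedTuple


class GeneRecord(NamedTuple):
    """One gene: its sequence plus the two position lines (hypothetical type)."""
    sequence: str
    first_line: List[int]   # gene_start, intron_end+1, ...
    second_line: List[int]  # intron_start+1, ..., gene_end+2


def parse_gene_records(lines: List[str]) -> Iterator[GeneRecord]:
    """Yield one record per group of three non-empty lines."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected groups of three lines per gene")
    for i in range(0, len(rows), 3):
        first = [int(tok) for tok in rows[i + 1].split()]
        second = [int(tok) for tok in rows[i + 2].split()]
        if len(first) != len(second):
            raise ValueError("the two position lines differ in length")
        yield GeneRecord(rows[i], first, second)


# Truncated sequences taken from the example above (illustration only).
example = [
    "tccgaatatcaatgtga",
    "571 738 1287 2018",
    "683 939 1449 2144",
    "tccgaatatcaatgtg",
    "571 695 868",
    "648 818 1031",
]
records = list(parse_gene_records(example))
for rec in records:
    # Pair up the entries of the two position lines column by column.
    print(list(zip(rec.first_line, rec.second_line)))
```

Checking that both position lines have the same length catches records that were split or merged by accident.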
Download:
-
The data look like this:
-1 TTCTGAAGAAGACGATGACGAAGACGAAGGAGAAGCCGTTGCAGAACTTGTCACAAAGTG
-1 CCAACCTAATCGTTATACATATGTATTTACAGTCGCAAATGACAATTGAACAAATAAATG
....
+1 AATGTTTCAATTATAAAAATTGTTAATTACAGGGGGACACCTGTATCAGTGTGACATTTC
....
Each line starts with a label, where -1 means no splice site and +1 means splice site, followed by a space and the sequence.
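A line in this format can be split into label and sequence as sketched below. The function name `parse_splice_line` is hypothetical, and the sketch assumes exactly the layout shown above (label, whitespace, sequence).

```python
from typing import List, Tuple


def parse_splice_line(line: str) -> Tuple[int, str]:
    """Split one data line into (label, sequence); label is -1 or +1."""
    label_text, sequence = line.split(maxsplit=1)
    label = int(label_text)  # int() accepts both "-1" and "+1"
    if label not in (-1, 1):
        raise ValueError(f"unexpected label: {label_text!r}")
    return label, sequence.strip()


# The three example lines shown above.
example = [
    "-1 TTCTGAAGAAGACGATGACGAAGACGAAGGAGAAGCCGTTGCAGAACTTGTCACAAAGTG",
    "-1 CCAACCTAATCGTTATACATATGTATTTACAGTCGCAAATGACAATTGAACAAATAAATG",
    "+1 AATGTTTCAATTATAAAAATTGTTAATTACAGGGGGACACCTGTATCAGTGTGACATTTC",
]
parsed: List[Tuple[int, str]] = [parse_splice_line(ln) for ln in example]
labels = [label for label, _ in parsed]
print(labels)
```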
Download:
-
(selected for largest validation ROC)
All result files named *.{tst|dat} contain a
line stating the validation or test error,
followed by the classifier outputs, one per line.
validation error = 0.014181
-12.143139
-10.286769
...
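Such a result file can be read as sketched below. The function name `parse_result_file` is hypothetical, and the sketch assumes the first line always has the form `<name> = <number>`, as in the validation example above.

```python
from typing import List, Tuple


def parse_result_file(lines: List[str]) -> Tuple[str, float, List[float]]:
    """Return (error name, error value, classifier outputs)."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    name, _, value = rows[0].partition("=")
    if not value:
        raise ValueError("first line must look like '<name> = <number>'")
    outputs = [float(row) for row in rows[1:]]
    return name.strip(), float(value), outputs


# The example lines shown above, without the elided remainder.
example = ["validation error = 0.014181", "-12.143139", "-10.286769"]
kind, error, outputs = parse_result_file(example)
print(kind, error, outputs)
```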
Trained SVMs are saved in the following
format:
b=-3.577909
alphas=[
2 -1.000000
13 +0.373805
57 +1.000000
68 -0.332549
85 -1.000000
...
]
Here b is the bias term and alphas contains pairs of index and value, where
index is the index of a support vector (an example with a nonzero coefficient)
and value is the product of the Lagrange multiplier and the label of that
support vector.
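The saved format can be parsed and turned into a decision value as sketched below. The names `parse_svm` and `decision_value` are hypothetical. The sketch uses the standard SVM form f(x) = b + sum_i value_i * k(x_i, x). Whether the indices are 0- or 1-based is not stated here, so the caller supplies an explicit mapping from index to training example.

```python
from typing import Callable, List, Mapping, Tuple


def parse_svm(lines: List[str]) -> Tuple[float, List[Tuple[int, float]]]:
    """Return (bias, [(index, value), ...]) from the saved text format."""
    bias = None
    alphas: List[Tuple[int, float]] = []
    inside = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("b="):
            bias = float(line[2:])
        elif line.startswith("alphas=["):
            inside = True
        elif line == "]":
            inside = False
        elif inside and line:
            index_text, value_text = line.split()
            alphas.append((int(index_text), float(value_text)))
    if bias is None:
        raise ValueError("no bias line 'b=...' found")
    return bias, alphas


def decision_value(bias: float,
                   alphas: List[Tuple[int, float]],
                   kernel: Callable[[str, str], float],
                   training: Mapping[int, str],
                   x: str) -> float:
    """Standard SVM output: bias plus the weighted kernel sum."""
    return bias + sum(v * kernel(training[i], x) for i, v in alphas)


# The example shown above, without the elided remainder.
example = ["b=-3.577909", "alphas=[", "2 -1.000000", "13 +0.373805",
           "57 +1.000000", "68 -0.332549", "85 -1.000000", "]"]
bias, alphas = parse_svm(example)
print(bias, alphas)
```

Each value already includes the label, so no separate label file is needed to evaluate the classifier.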
Results:
Positional Weight Matrices
| | pseudo_p | pseudo_n | order | RSE | Err |
| acceptor | 1 | 1 | 2 | 98.88 | 1.54 |
| donor | 10 | 1e-4 | 2 | 98.23 | 1.85 |
Download result files:
Weighted Degree Kernel
| | C | degree | RSE | Err |
| acceptor | 1 | 4 | 99.06 | 1.42 |
| donor | 1 | 3 | 98.47 | 1.78 |
Download result files:
-
Locality Improved Kernel
| | C | degree | width | RSE | Err |
| acceptor | 0.75 | 4 | 15 | 99.08 | 1.44 |
| donor | 1 | 3 | 10 | 98.48 | 1.80 |
Download result files:
-
TOP-Linear Kernel
| | C | degree | RSE | Err |
| acceptor | 0.5 | 3 | 98.88 | 1.52 |
| donor | 0.5 | 2 | 98.35 | 1.82 |
Download result files:
SVM-Pairwise with 500 reference examples (trained on 20k examples, evaluated on only the first 10k test examples)
| | C | gapcost | RSE | Err |
| acceptor | 5 | 0.5 | 98.01 | 1.93 |
| donor | 50 | 0.5 | 97.60 | 2.03 |
Download result files:
Polynomial Kernel
| | C | degree | RSE | Err |
| acceptor | 2 | 6 | 98.94 | 1.80 |
| donor | 2 | 5 | 98.31 | 2.08 |
Download result files:
-
-
Positional Weight Matrices
| sigmoid_a | 0.45 |
| sigmoid_b | -0.9 |
| alpha | -3.75 |
Model parameters used (may differ from those above)
| | order | pseudo_p | pseudo_n |
| acceptor | 3 | 1 | 1e-6 |
| donor | 3 | 10 | 100 |
Weighted Degree Kernel
| sigmoid_a | 0.75 |
| sigmoid_b | -0.9375 |
| alpha | 1.7 |
Model parameters used (may differ from those above)
| | C | degree |
| acceptor | 2 | 3 |
| donor | 1 | 3 |
Locality Improved Kernel
| sigmoid_a | 0.75 |
| sigmoid_b | -0.75 |
| alpha | 1.0 |
Model parameters used (may differ from those above)
| | degree | width | C |
| acceptor | 4 | 15 | 2 |
| donor | 3 | 10 | 5 |
-
Download Implementation
wd_kernel.cpp
Please note that the Shogun toolbox contains an easy-to-use version of this kernel.
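For orientation, the Weighted Degree Kernel can be written in a few lines of Python. This is a naive reference sketch, not a transcription of wd_kernel.cpp. It assumes the commonly cited definition, in which matching substrings of length d = 1..D at the same position are counted and weighted by beta_d = 2(D - d + 1) / (D(D + 1)). The function name `wd_kernel` is hypothetical.

```python
def wd_kernel(x: str, y: str, degree: int) -> float:
    """Naive Weighted Degree Kernel between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if degree < 1:
        raise ValueError("degree must be at least 1")
    total = 0.0
    for d in range(1, degree + 1):
        # Assumed weighting: longer matches get smaller weights.
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        # Count positions where the length-d substrings agree.
        matches = sum(
            1 for pos in range(len(x) - d + 1)
            if x[pos:pos + d] == y[pos:pos + d]
        )
        total += beta * matches
    return total


print(wd_kernel("ACGT", "ACGA", 2))
```

This version costs on the order of L * D^2 character comparisons per kernel evaluation for sequences of length L, so it is meant for checking small cases rather than for training.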