Prediction server based on the adaptive double threading approach described in: Learning MHC-I Peptide Binding
Nebojsa Jojic, Manuel Reyes-Gomez, et al.
International Conference on Intelligent Systems for Molecular Biology
Fortaleza, Brazil


Abstract:

Motivated by the ability of a simple threading approach to predict MHC-peptide binding, we developed a new and improved structure-based model for which parameters can be estimated from additional sources of data about MHC-peptide binding. In addition to the known 3D structures of a small number of MHC-peptide complexes that were used in the original threading approach, we included three other sources of information on peptide-MHC binding: (1) MHC class I sequences; (2) known binding energies for a large number of MHC-peptide complexes; and (3) an even larger binary dataset that contains information about strong binders (epitopes) and non-binders (peptides that have a low affinity for a particular MHC molecule).
Our model significantly outperforms the standard threading approach in binding energy prediction. We used the resulting binding energy predictor to study viral infections in 246 HIV patients from the West Australian cohort and over 1000 HIV clade B sequences from the Los Alamos National Laboratory database, capturing the course of HIV evolution over the last 20 years. Finally, we illustrate short-, medium-, and long-term adaptation of HIV to the human immune system.


1 a) Type (or paste) allele(s) in a column and/or select allele(s) from the lists.
Then press the "capture alleles" button.

1 b) Additionally, you can upload a FASTA file with the alleles defined as amino acid sequences.
The sequences should be at least 175 amino acids long and should be aligned (and start at the same position) as in
the following file:




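As an illustration of the length requirement in step 1 b), the following Python sketch parses FASTA text and lists the alleles whose sequences are shorter than 175 residues. It is a minimal sketch, not the server's own validation code: the helper names `parse_fasta` and `too_short` are hypothetical, and the alignment against the reference file is not checked.

```python
# Minimal sketch (hypothetical helpers): parse FASTA text and flag allele
# sequences shorter than the 175-residue minimum. Alignment is not checked.

MIN_LENGTH = 175  # minimum allele sequence length required by the server


def parse_fasta(text: str) -> dict:
    """Return a {header: sequence} mapping built from FASTA-formatted text."""
    records = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())  # sequence lines may wrap
    if header is not None:
        records[header] = "".join(chunks)
    return records


def too_short(records: dict, min_length: int = MIN_LENGTH) -> list:
    """Names of alleles whose sequences have fewer than min_length characters."""
    return [name for name, seq in records.items() if len(seq) < min_length]


example = ">allele_1\n" + "G" * 180 + "\n>allele_2\nGSHSMRYF\n"
print(too_short(parse_fasta(example)))  # prints ['allele_2']
```

Sequence lines are concatenated so that files with wrapped lines are handled; every character, including any alignment gap, counts toward the length.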
2. Training database:




3. Select peptide length:

4. Enter Peptides/Sequences:
Peptides containing letters other than the valid amino acid codes 'ACDEFGHIKLMNPQRSTVWYBZ' ('B' is treated as 'D'; 'Z' as 'E' or 'Q') are discarded, with the exception of 'X' and '-'.
When 'X' is encountered, it is assumed to stand for any of the 20 amino acids, and the binding energy of each peptide that contains it is given
by the average over the 20 possible resulting peptides. Any '-' characters are skipped when the sequence is processed
(the assumption is that they are used for alignment purposes only). Within a sequence, only the peptides that contain an invalid letter are discarded,
and their energies are reported as "NA".
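The input rules above can be sketched in Python as follows. This is a hedged illustration, not the server's implementation: `score` stands for a hypothetical per-peptide energy function (the ADT predictor itself is not reproduced here), and averaging 'Z' over 'E' and 'Q' in the same way as 'X' is an assumption, since the page does not say how 'Z' is resolved.

```python
from itertools import product
from statistics import mean
from typing import Callable, List, Optional, Tuple, Union

# The 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Ambiguity codes: 'B' is read as 'D' and 'X' as any residue, as stated on
# this page; averaging 'Z' over 'E' and 'Q' is an assumption of this sketch.
AMBIGUOUS = {"B": "D", "Z": "EQ", "X": AMINO_ACIDS}
VALID = set(AMINO_ACIDS) | set(AMBIGUOUS)

Scorer = Callable[[str], float]  # hypothetical per-peptide energy function


def peptide_energy(peptide: str, score: Scorer) -> Optional[float]:
    """Average energy over all readings of a peptide; None if it is invalid."""
    peptide = peptide.upper().replace("-", "")  # '-' is for alignment only
    if any(ch not in VALID for ch in peptide):
        return None  # contains a letter outside the accepted alphabet
    # Each position expands to the residues it may stand for.
    options = [AMBIGUOUS.get(ch, ch) for ch in peptide]
    return mean(score("".join(combo)) for combo in product(*options))


def scan_sequence(sequence: str, length: int,
                  score: Scorer) -> List[Tuple[str, Union[float, str]]]:
    """Score every peptide of the given length; invalid ones get "NA"."""
    seq = sequence.upper().replace("-", "")
    results = []
    for start in range(len(seq) - length + 1):
        pep = seq[start:start + length]
        energy = peptide_energy(pep, score)
        results.append((pep, "NA" if energy is None else energy))
    return results


# Toy stand-in scorer (NOT the ADT model): counts leucines.
toy_score = lambda p: float(p.count("L"))
print(scan_sequence("SL-XJ", 3, toy_score))
```

With the toy scorer, the gap is skipped, 'SLX' receives the average over its 20 readings (1.05), and 'LXJ' is reported as "NA" because 'J' is not an accepted letter. A peptide with k ambiguous positions expands to up to 20^k readings, which is manageable for the short peptides scored here.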

Type or paste peptides or sequences in FASTA format, OR select a FASTA file:








Additional Information:

Some results for the adaptive double threading approach used as an epitope classifier.

ROC curves for binders/non-binders

For these MHC types, both the 3D structure and experimental binding energy values for a set of peptides are available. The bilinear model was trained with the peptides for which experimental binding energies are available and later tested on classifying peptides from the SYFPEITHI database.

For these MHC types, the 3D structure is available but experimental energy values are not, so we combined the peptides with binary (binder/non-binder) labels from the SYFPEITHI and MHCBN databases to run our experiments. We randomly partitioned the data 10 times into 70% training and 30% testing sets. Results are reported in terms of both the average performance and the standard deviation.

For these MHC types, neither a 3D structure nor experimental energy values are available, so we combined the peptides with binary (binder/non-binder) labels from the SYFPEITHI and MHCBN databases to run our experiments. We randomly partitioned the data 10 times into 70% training and 30% testing sets. Results are reported in terms of both the average performance and the standard deviation. The 3D structural data were estimated as described in the paper.
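The evaluation protocol described above (10 random 70%/30% training/testing splits, reporting the mean and standard deviation of performance) can be sketched as follows, with performance measured as the ROC AUC computed from the Mann-Whitney statistic. `fit_predict` is a hypothetical placeholder for training a classifier on the training split and scoring the test peptides; the seed, the rounding of the split point, and the absence of stratification are assumptions of this sketch.

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]  # (peptide, label) with 1 = binder, 0 = non-binder


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney statistic; tied scores count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one binder and one non-binder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_split_auc(data: List[Example],
                       fit_predict: Callable[[List[Example], List[str]], List[float]],
                       n_splits: int = 10, train_frac: float = 0.7,
                       seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of test AUC over random train/test splits."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        cut = round(train_frac * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        # fit_predict trains on `train` and returns one score per test peptide.
        scores = fit_predict(train, [pep for pep, _ in test])
        aucs.append(auc(scores, [label for _, label in test]))
    return mean(aucs), stdev(aucs)
```

The pairwise Mann-Whitney form is quadratic in the number of test peptides, which is adequate for sets of a few thousand peptides; a rank-based computation scales better for larger sets.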

For the examples where we do not have any binding data available (not even binary data), the bilinear model cannot be trained. However, just transferring (estimating) the 3D structure from one HLA type to another can greatly improve the binding classification results obtained with threading; this is equivalent to setting the HLA-dependent weights w_mj to unity while still using the learned global pairwise potentials f_si. Even though we do have binding data for these MHC molecules (otherwise we could not produce the ROC curves), we pretend we do not and therefore set their w_mj weights to unity. The plots present results using the correct 3D structure for each MHC molecule and results using a suboptimal 3D structure (not the closest one, sequence-wise). The plots also show the improvement in performance when the appropriate amino acid substitutions are made in the binding groove of the “borrowing” MHC molecule.
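To make the unit-weight idea concrete, here is a toy Python sketch. It assumes a simplified threading score in which the energy is a sum, over peptide-groove contacts taken from a (possibly borrowed) 3D structure, of a global pairwise potential, with each term optionally scaled by an HLA-dependent weight. This simplified form, the contact map, and the potential values are illustrative assumptions, not the exact model from the paper. Leaving `weights` unset corresponds to setting all w_mj to unity, and the hypothetical helper `substitute_groove` mimics replacing the borrowing molecule's groove residues with those of the target allele.

```python
from typing import Dict, List, Optional, Tuple

Contact = Tuple[int, int]  # (peptide position i, groove position j)
Potential = Dict[Tuple[str, str], float]  # (peptide residue, groove residue) -> energy


def threading_energy(peptide: str, groove: Dict[int, str],
                     contacts: List[Contact], potential: Potential,
                     weights: Optional[Dict[int, float]] = None) -> float:
    """Sum of pairwise potentials over contacts, optionally weighted.

    weights=None sets every HLA-dependent weight to 1, i.e. plain threading
    with the learned global pairwise potentials (assumed simplified form).
    """
    total = 0.0
    for i, j in contacts:
        w = 1.0 if weights is None else weights[j]
        total += w * potential[(peptide[i], groove[j])]
    return total


def substitute_groove(template_groove: Dict[int, str],
                      target_residues: Dict[int, str]) -> Dict[int, str]:
    """Borrow a template's groove and swap in the target allele's residues."""
    groove = dict(template_groove)  # copy so the template is left unchanged
    groove.update(target_residues)  # positions where the target differs
    return groove


# Toy example: two contacts borrowed from a template structure.
potential = {("L", "A"): -1.0, ("L", "V"): -2.0}
contacts = [(0, 10), (1, 20)]
template = {10: "A", 20: "A"}
target = substitute_groove(template, {20: "V"})
print(threading_energy("LL", template, contacts, potential))  # -2.0
print(threading_energy("LL", target, contacts, potential))    # -3.0
```

The contact geometry comes from the borrowed structure while the residues come from the target allele, so the substitution changes the energy without requiring a new structure.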

 

Overview of prediction performance, measured as AUC values on the 5-fold cross-validation sets of the HLA-peptide binding energy database presented in [1].
ADT: the adaptive double threading approach.
ANN: the approach described in [2], found to be the best predictor in [1].
Best Online: the AUC of the best external online predictor (external to the authors of [1]) tested in [1].
Tool: the name of that best online predictor.

HLA     Peptides  AUC ADT  AUC ANN  Best Online  Tool
A_0101  1158      0.9657   0.9798   0.955        hlaligand
A_0201  3090      0.9521   0.9564   0.922        hla_a2_smm
A_0202  1448      0.9033   0.8988   0.793        multipredann
A_0203  1444      0.9141   0.9203   0.788        multipredann
A_0206  1438      0.9191   0.9261   0.735        multipredann
A_0301  2095      0.9298   0.9366   0.851        multipredann
A_1101  1986      0.9442   0.9511   0.869        multipredann
A_2301  105       0.8044   0.8514   -            multipredann
A_2402  198       0.7852   0.822    0.77         syfpeithi
A_2403  255       0.8784   0.9175   -            -
A_2601  673       0.9224   0.9552   0.736        pepdist
A_2902  161       0.8866   0.9317   0.597        rankpep
A_3001  670       0.941    0.945    -            -
A_3002  93        0.7633   0.744    -            -
A_3101  1870      0.9313   0.9274   0.829        bimas
A_3301  1141      0.9363   0.9141   0.807        pepdist
A_6801  1142      0.8847   0.8823   0.772        syfpeithi
A_6802  1435      0.8963   0.8986   0.643        mhcpred
A_6901  834       0.8902   0.8803   -            -
B_0702  1263      0.9573   0.9636   0.942        hlaligand
B_0801  709       0.854    0.9533   0.766        pepdist
B_1501  979       0.9075   0.942    0.816        rankpep
B_1801  119       0.8687   0.838    0.779        pepdist
B_2705  970       0.9217   0.9371   0.926        bimas
B_3501  737       0.8691   0.8739   0.792        bimas
B_4001  1079      0.8933   0.9155   -            -
B_4002  119       0.8186   0.7524   0.775        rankpep
B_4402  120       0.6775   0.7785   0.783        syfpeithi
B_4403  120       0.6239   0.7634   0.628        rankpep
B_4501  115       0.8015   0.8609   -            -
B_5101  245       0.8474   0.8856   0.82         pepdist
B_5301  255       0.8934   0.8974   0.861        rankpep
B_5401  256       0.8457   0.9025   0.799        svmhc
B_5701  60        0.832    0.8246   0.767        pepdist
B_5801  989       0.94     0.96     0.899        bimas

 

References:
[1] B. Peters, HH Bui, S. Frankild, M. Nielsen, C. Lundegaard, et al., “A Community Resource Benchmarking Predictions of Peptide Binding to MHC-I Molecules,” PLoS Computational Biology (2006), in press. DOI 10.1371/journal.pcbi.0020065.eor
[2] M. Nielsen, C. Lundegaard, P. Worning, SL Lauemoller, K. Lamberth, et al., “Reliable prediction of T-cell epitopes using neural networks with novel sequence

©2006 Microsoft Corporation. All rights reserved.