QUT ePrints

Prediction of cis/trans isomerization in proteins using PSI-BLAST profiles and secondary structure information

Song, Jiangning , Burrage, Kevin, Yuan, Zheng , & Huber, Thomas (2006) Prediction of cis/trans isomerization in proteins using PSI-BLAST profiles and secondary structure information. BMC Bioinformatics, 7, pp. 1-13.

View at publisher (open access)

Abstract

Background

The majority of peptide bonds in proteins are found to occur in the trans conformation. However, for proline residues, a considerable fraction of Prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. Results

In this paper, we propose a new approach to predict the proline cis/trans isomerization in proteins using support vector machine (SVM). The preliminary results indicated that using Radial Basis Function (RBF) kernels could lead to better prediction performance than that of polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignment obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach was performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-Ray diffraction method using 5-fold cross-validation. Selecting the window size 11 provided the best performance for determining the proline cis/trans isomerization based on the single amino acid sequence. It was found that using multiple sequence alignments in the form of PSI-BLAST profiles could significantly improve the prediction performance, the prediction accuracy increased from 62.8% with single sequence to 69.8% and Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40. Furthermore, if coupled with the predicted secondary structure information by PSIPRED, our method yielded a prediction accuracy of 71.5% and MCC of 0.43, 9% and 0.17 higher than the accuracy achieved based on the singe sequence information, respectively. Conclusion

A new method has been developed to predict the proline cis/trans isomerization in proteins based on support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequence flanking centered proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of SVM approach in this study reinforced that SVM is a powerful tool in predicting proline cis/trans isomerization in proteins and biological sequence analysis.

Impact and interest:

46 citations in Scopus
Search Google Scholar™
42 citations in Web of Science®

Citation countsare sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

ID Code: 44827
Item Type: Journal Article
DOI: 10.1186/1471-2105-7-124
ISSN: 1471-2105
Subjects: Australian and New Zealand Standard Research Classification > BIOLOGICAL SCIENCES (060000) > BIOCHEMISTRY AND CELL BIOLOGY (060100)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > COMPUTATION THEORY AND MATHEMATICS (080200)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Past > Schools > Mathematical Sciences
Deposited On: 25 Aug 2011 08:14
Last Modified: 01 Mar 2012 00:36

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page