An ensemble method for predicting subnuclear localizations from primary protein structures

Han, Guo-Sheng, Yu, Zu-Guo, Anh, Vo, Krishnajith, Anaththa P.D., & Tian, Yu-Chu (2013) An ensemble method for predicting subnuclear localizations from primary protein structures. PLOS ONE, 8(2), e57225:1-e57225:14.

View at publisher

Abstract

Background

Predicting protein subnuclear localization is a challenging problem. Some previous works based on non-sequence information including Gene Ontology annotations and kernel fusion have respective limitations. The aim of this work is twofold: one is to propose a novel individual feature extraction method; another is to develop an ensemble method to improve prediction performance using comprehensive information represented in the form of high dimensional feature vector obtained by 11 feature extraction methods.

Methodology/Principal Findings

A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers those feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: Lei dataset, multi-localization dataset, SNL9 dataset and a new independent dataset. The overall accuracy of prediction for 6 localizations on Lei dataset is 75.2% and that for 9 localizations on SNL9 dataset is 72.1% in the leave-one-out cross validation, 71.7% for the multi-localization dataset and 69.8% for the new independent dataset, respectively. Comparisons with those existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis.

Conclusions

It can be concluded that our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method. It is freely available at http://bioinformatics.awowshop.com/snlpr​ed_page.php.

Impact and interest:

14 citations in Scopus
Search Google Scholar™
18 citations in Web of Science®

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

Full-text downloads:

145 since deposited on 24 Mar 2013
12 in the past twelve months

Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.

ID Code: 58623
Item Type: Journal Article
Refereed: Yes
Keywords: Bioinformatics, Subnuclear localizations, Protein structures
DOI: 10.1371/journal.pone.0057225
ISSN: 1932-6203
Subjects: Australian and New Zealand Standard Research Classification > BIOLOGICAL SCIENCES (060000) > BIOCHEMISTRY AND CELL BIOLOGY (060100) > Bioinformatics (060102)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > COMPUTATION THEORY AND MATHEMATICS (080200) > Computation Theory and Mathematics not elsewhere classified (080299)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > COMPUTER SOFTWARE (080300) > Bioinformatics Software (080301)
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > Schools > School of Mathematical Sciences
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2013 The Authors
Copyright Statement: Copyright: 2013 Han et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author and source are credited.
Deposited On: 24 Mar 2013 23:40
Last Modified: 05 Mar 2015 23:25

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page