The cross-species prediction of bacterial promoters using a support vector machine

Towsey, Michael W., Timms, Peter, Hogan, James M., & Mathews, Sarah A. (2008) The cross-species prediction of bacterial promoters using a support vector machine. Computational Biology and Chemistry, 32(5), pp. 359-366.

Owing to the degeneracy of the observed binding sites, the in silico prediction of bacterial sigma(70)-like promoters remains a challenging problem. Large numbers of sigma(70)-like promoters have been experimentally identified in only two species, Escherichia coli and Bacillus subtilis. In this paper, we investigate the issues that arise when searching for promoters in other species using an ensemble of SVM classifiers trained on E. coli promoters. DNA sequences are represented using a tagged mismatch string kernel. The major benefit of our approach is that it does not require a prior definition of the typical -35 and -10 hexamers. This gives the SVM classifiers the freedom to discover other features relevant to the prediction of promoters. We use our approach to predict sigma(A) promoters in B. subtilis and sigma(66) promoters in Chlamydia trachomatis. We extended the analysis to identify specific regulatory features of gene sets in C. trachomatis having different expression profiles. We found a strong -35 hexamer and TGN/-10 associated with a set of early expressed genes. Our analysis highlights the advantage of using TSS-PREDICT as a starting point for predicting promoters in species where few are known.
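
The abstract does not spell out how the tagged mismatch string kernel is computed. As a rough illustration only, the sketch below shows a generic (k, m)-mismatch kernel over DNA in which each k-mer is additionally tagged with a coarse position bin. The tagging scheme, the parameter values and the function names (`neighbours`, `tagged_features`, `mismatch_kernel`) are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"


def neighbours(kmer, m):
    """Return every k-mer within Hamming distance <= m of `kmer`."""
    result = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(ALPHABET, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                result.add("".join(chars))
    return result


def tagged_features(seq, k=5, m=1, bin_width=10):
    """Sparse feature map {(tag, kmer): count}.

    ASSUMPTION: the tag is the k-mer start position divided by
    `bin_width`, so a k-mer only matches another one that starts
    in the same window of the (aligned) sequence.
    """
    feats = Counter()
    for i in range(len(seq) - k + 1):
        tag = i // bin_width
        for nb in neighbours(seq[i:i + k], m):
            feats[(tag, nb)] += 1
    return feats


def mismatch_kernel(x, y, k=5, m=1, bin_width=10):
    """Inner product of the tagged mismatch feature maps of x and y."""
    fx = tagged_features(x, k, m, bin_width)
    fy = tagged_features(y, k, m, bin_width)
    if len(fx) > len(fy):
        fx, fy = fy, fx  # iterate over the smaller map
    return sum(count * fy[key] for key, count in fx.items())
```

A Gram matrix built from `mismatch_kernel` over a set of aligned upstream sequences could then be handed to any SVM implementation that accepts a precomputed kernel. Whether this matches the authors' own kernel or training setup is not stated in this record.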

Impact and interest:

10 citations in Scopus
7 citations in Web of Science®

ID Code: 17083
Item Type: Journal Article
Refereed: Yes
Keywords: bioinformatics, support vector machine, sigma factor, promoters, transcript start site
DOI: 10.1016/j.compbiolchem.2008.07.009
ISSN: 1476-9271
Subjects: Australian and New Zealand Standard Research Classification > BIOLOGICAL SCIENCES (060000) > BIOCHEMISTRY AND CELL BIOLOGY (060100) > Bioinformatics (060102)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Current > Institutes > Institute for Future Environments
Current > Institutes > Institute of Health and Biomedical Innovation
Past > Schools > School of Life Sciences
Past > Schools > School of Software Engineering & Data Communications
Copyright Owner: Copyright 2008 Elsevier
Deposited On: 05 Jan 2009 00:48
Last Modified: 29 Feb 2012 13:49
