The Prediction of Bacterial Transcription Start Sites using Support Vector Machines

Towsey, Michael W., Gordon, James J., & Hogan, James M. (2006) The Prediction of Bacterial Transcription Start Sites using Support Vector Machines. International Journal of Neural Systems, 16(5), pp. 363-370.


Identifying promoters is key to understanding gene expression in bacteria. Promoters lie in tightly constrained positions relative to the transcription start site (TSS), so knowing the TSS position allows the promoter position to be predicted to within a few base pairs, and vice versa. In this paper, we address the problem of predicting transcription start sites in Escherichia coli. The accepted method for promoter prediction uses a pair of position weight matrices (PWMs), which define conserved motifs at the sigma-factor binding site. However, this method is known to produce a large number of false-positive predictions, which limits its usefulness to the experimental biologist. We adopt an alternative approach based on the Support Vector Machine (SVM) with a modified mismatch spectrum kernel. Our modifications involve tagging the motifs with their location and selectively pruning the feature set. We quantify the performance of several SVM models and a PWM model using the area under the detection-error tradeoff (DET) curve as the performance metric. The SVM models are shown to outperform the PWM on a biologically realistic TSS prediction task. We also describe a more broadly applicable peak-scoring technique that reduces the number of false-positive predictions, greatly enhancing the utility of our results.
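
The idea of tagging motifs with their location can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses exact k-mer matching rather than the mismatch kernel, omits feature pruning, and the function names and the parameters `k` and `bin_width` are assumptions introduced here for illustration only.

```python
from collections import Counter


def tagged_kmer_features(seq, k=3, bin_width=5):
    """Count k-mers, each tagged with a coarse position bin.

    Hypothetical helper: exact matching only, no mismatches, no pruning.
    """
    feats = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # The tag is the index of the bin in which the k-mer starts.
        feats[(kmer, i // bin_width)] += 1
    return feats


def tagged_spectrum_kernel(a, b, k=3, bin_width=5):
    """Inner product of the position-tagged k-mer count vectors of a and b."""
    fa = tagged_kmer_features(a, k, bin_width)
    fb = tagged_kmer_features(b, k, bin_width)
    # Counter returns 0 for keys that are absent from fb.
    return sum(count * fb[key] for key, count in fa.items())
```

With a very large `bin_width` every k-mer receives the same tag and the function reduces to an ordinary (untagged) spectrum kernel; smaller bins make the same motif at different offsets count as different features. A kernel of this form could be supplied to an SVM as a precomputed Gram matrix.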

ID Code: 6367
Item Type: Journal Article
Refereed: Yes
Keywords: promoter prediction, support vector machines, position weight matrices
DOI: 10.1142/S0129065706000767
ISSN: 0129-0657
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2006 World Scientific Publishing
Copyright Statement: Reproduced in accordance with the copyright policy of the publisher.
Deposited On: 06 Mar 2007 00:00
Last Modified: 29 Feb 2012 13:23
