The Prediction of Bacterial Transcription Start Sites using Support Vector Machines
Towsey, Michael W., Gordon, James J., & Hogan, James M. (2006) The Prediction of Bacterial Transcription Start Sites using Support Vector Machines. International Journal of Neural Systems, 16(5), pp. 363-370.
Identifying promoters is the key to understanding gene expression in bacteria. Promoters lie in tightly constrained positions relative to the transcription start site (TSS). In this paper, we address the problem of predicting transcription start sites in Escherichia coli. Knowing the TSS position, one can then predict the promoter position to within a few base pairs, and vice versa. The accepted method for promoter prediction is to use a pair of position weight matrices (PWMs), which define conserved motifs at the sigma-factor binding site. However, this method is known to result in a large number of false positive predictions, thereby limiting its usefulness to the experimental biologist. We adopt an alternative approach based on the Support Vector Machine (SVM) using a modified mismatch spectrum kernel. Our modifications involve tagging the motifs with their location and selectively pruning the feature set. We quantify the performance of several SVM models and a PWM model, using the area under the detection-error tradeoff (DET) curve as the performance metric. SVM models are shown to outperform the PWM on a biologically realistic TSS prediction task. We also describe a more broadly applicable peak scoring technique that reduces the number of false positive predictions, greatly enhancing the utility of our results.
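To make the idea of a location-tagged mismatch spectrum kernel concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the motif length, the number of allowed mismatches, how locations are encoded, or how features are pruned, so the parameters `k`, `m`, and `bin_width` and the function names below are all illustrative assumptions. Pruning of the feature set is omitted.

```python
from collections import Counter

ALPHABET = "ACGT"


def mismatch_neighbours(kmer, m=1):
    """Return the k-mers within Hamming distance m of kmer.

    Assumption: only m = 0 or m = 1 is supported in this sketch.
    """
    if m not in (0, 1):
        raise ValueError("this sketch supports m = 0 or m = 1 only")
    neighbours = {kmer}
    if m == 1:
        for i, base in enumerate(kmer):
            for alt in ALPHABET:
                if alt != base:
                    neighbours.add(kmer[:i] + alt + kmer[i + 1:])
    return neighbours


def tagged_features(seq, k=3, m=1, bin_width=5):
    """Count (location tag, k-mer) features for one sequence.

    The location tag is the index of the position bin in which the
    k-mer starts (hypothetical encoding; bin_width is an assumption).
    """
    feats = Counter()
    for pos in range(len(seq) - k + 1):
        tag = pos // bin_width
        for neighbour in mismatch_neighbours(seq[pos:pos + k], m):
            feats[(tag, neighbour)] += 1
    return feats


def tagged_mismatch_kernel(a, b, k=3, m=1, bin_width=5):
    """Inner product of the two tagged feature vectors."""
    fa = tagged_features(a, k, m, bin_width)
    fb = tagged_features(b, k, m, bin_width)
    return sum(count * fb[key] for key, count in fa.items())
```

With a single wide bin the kernel ignores location and behaves like an ordinary mismatch spectrum kernel; narrowing `bin_width` makes two sequences similar only when shared motifs occur at similar offsets, which matches the observation that promoters sit in tightly constrained positions relative to the TSS. A kernel function of this kind could be supplied to an SVM library as a precomputed Gram matrix.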
Item Type: Journal Article
Keywords: promoter prediction, support vector machines, position weight matrices
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2006 World Scientific Publishing
Copyright Statement: Reproduced in accordance with the copyright policy of the publisher.
Deposited On: 06 Mar 2007 00:00
Last Modified: 29 Feb 2012 13:23