# Numerical sequence representation of DNA sequences and methods to distinguish coding and non-coding sequences in a complete genome

Yu, Zu-Guo, Anh, Vo V., Zhou, Yu, & Zhou, Li-Qian (2007) Numerical sequence representation of DNA sequences and methods to distinguish coding and non-coding sequences in a complete genome. In Callaos, N., Lesso, W., Zinn, C., & Zmazek, B. (Eds.) *WMSCI 2007*, The International Institute of Informatics and Systemics (IIIS), Florida, USA, pp. 171-176.

## Abstract

In this presentation we introduce two methods to distinguish coding and non-coding sequences in a complete genome. A numerical sequence representation of DNA sequences is introduced first. There is a one-to-one correspondence between a DNA sequence and its numerical sequence representation. In the first method, three exponents from a multifractal analysis are selected to construct the parameter space. In the second method, which is based on a Fourier transform approach, three parameters from the power spectrum of the numerical sequence representation are selected to construct the parameter space. Each DNA sequence may then be represented by a point in each of these three-dimensional spaces. We found that the points corresponding to coding and non-coding sequences in the complete genomes of prokaryotes fall into different regions in both parameter spaces. If the point for a DNA sequence lies in the region corresponding to coding sequences, the sequence is classified as coding; otherwise, it is classified as non-coding. The average accuracies using Fisher's discriminant algorithm for coding and non-coding sequences are satisfactory.
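The first steps of the Fourier-based method can be illustrated with a minimal sketch. The abstract does not state the paper's actual numerical representation or which three power-spectrum parameters are used, so the mapping below (A→0, C→1, G→2, T→3) and the function names `encode`, `decode`, and `power_spectrum` are assumptions chosen only for illustration. The sketch shows a one-to-one encoding and a power spectrum computed with a direct discrete Fourier transform.

```python
import cmath

# Hypothetical nucleotide-to-number mapping (an assumption for illustration,
# not the representation defined in the paper).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = {value: base for base, value in CODE.items()}


def encode(seq):
    """Map a DNA string to a list of integers, one per nucleotide."""
    return [CODE[base] for base in seq.upper()]


def decode(values):
    """Invert encode(); possible because the mapping is one-to-one."""
    return "".join(DECODE[v] for v in values)


def power_spectrum(values):
    """Return |X(k)|^2 for k = 0..N-1 of the mean-removed sequence.

    Uses a direct O(N^2) discrete Fourier transform for clarity;
    an FFT would be preferable for genome-scale input.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


if __name__ == "__main__":
    seq = "ATGGCGTACGTTAGC"
    numeric = encode(seq)
    # Round trip recovers the original sequence.
    assert decode(numeric) == seq
    spectrum = power_spectrum(numeric)
    print(len(spectrum))
```

In a full pipeline, three summary values taken from such a spectrum would form the coordinates of a point in the parameter space, and a classifier such as Fisher's discriminant would then separate coding from non-coding points. Those steps are omitted here because the abstract does not define them.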

Impact and interest:


Full-text downloads:

**316** since deposited on 18 Nov 2008

**44** in the past twelve months


| Field | Value |
|---|---|
| ID Code | 15651 |
| Item Type | Conference Paper |
| Refereed | Yes |
| ISBN | 1934272140, 9781934272145 |
| Subjects | Australian and New Zealand Standard Research Classification > MATHEMATICAL SCIENCES (010000) > APPLIED MATHEMATICS (010200) > Biological Mathematics (010202); Australian and New Zealand Standard Research Classification > BIOLOGICAL SCIENCES (060000) > GENETICS (060400) > Genome Structure and Regulation (060407) |
| Divisions | Past > QUT Faculties & Divisions > Faculty of Science and Technology; Current > Schools > School of Mathematical Sciences |
| Copyright Owner | Copyright 2007 (please consult author) |
| Deposited On | 18 Nov 2008 |
| Last Modified | 06 Mar 2015 00:48 |
