BioPatML - an XML description language for patterns in biological sequences
Background: A major challenge in computational biology is describing biological systems in a way that allows them to be evaluated computationally and exchanged between institutes and applications. Recent modeling languages that describe various aspects of biological systems, such as genomic composition, spatio-temporal quantities, or biochemical reactions, predominantly rely on XML (eXtensible Markup Language) as a standardized and well-supported format. Description languages for patterns in biological sequences are an exception: they show great diversity in format and function, which impedes the definition and exchange of biological patterns.
Results: In this paper we introduce BioPatML, an XML-based pattern description language that supports a wide variety of patterns and allows the construction of complex, hierarchically structured patterns and pattern libraries. BioPatML unifies the diversity of current pattern description languages and fills a gap in the set of XML-based description languages for biological systems. The paper discusses the structure and elements of the language and demonstrates its advantages in three applications. An XML schema, a manual, and Diana, a command-line tool for searching nucleotide and amino acid sequences for BioPatML patterns, are available at http://eresearch.fit.qut.edu.au/biopatml.
Conclusions: BioPatML increases the power of classical pattern definition languages through principled aggregation. It furthermore simplifies the compilation of pattern libraries and promotes the exchange of complex patterns. The language provides a convenient format to encapsulate pattern definitions and their annotations for integrated bioinformatic analyses.
|Keywords:||pattern description, biological sequences, XML|
|Subjects:||Australian and New Zealand Standard Research Classification > TECHNOLOGY (100000) > MEDICAL BIOTECHNOLOGY (100400) > Medical Biotechnology not elsewhere classified (100499); Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109); Australian and New Zealand Standard Research Classification > BIOLOGICAL SCIENCES (060000) > BIOCHEMISTRY AND CELL BIOLOGY (060100) > Protein Trafficking (060108)|
|Divisions:||Past > QUT Faculties & Divisions > Faculty of Science and Technology; Current > Institutes > Institute of Health and Biomedical Innovation|
|Deposited On:||16 May 2007|
|Last Modified:||09 Jun 2010 12:40|