BioPatML - an XML description language for patterns in biological sequences
Background: A major challenge in computational biology is describing biological systems in a way that allows their computational evaluation and their exchange between institutes and applications. Recent modeling languages that describe various aspects of biological systems, such as genomic composition, spatio-temporal quantities or biochemical reactions, predominantly rely on XML (eXtensible Markup Language) as a standardized and well-supported format. Description languages for patterns in biological sequences are an exception: they show great diversity in format and function, which impedes the definition and exchange of biological patterns.
Results: In this paper we introduce BioPatML, an XML-based pattern description language that supports a wide variety of patterns and allows the construction of complex, hierarchically structured patterns and pattern libraries. BioPatML unifies the diversity of current pattern description languages and fills a gap in the set of XML-based description languages for biological systems. The paper discusses the structure and elements of the language and demonstrates its advantages in three applications. An XML schema, a manual and Diana, a command-line tool for searching nucleotide and amino acid sequences with BioPatML patterns, are available at http://eresearch.fit.qut.edu.au/biopatml.
Conclusions: BioPatML increases the power of classical pattern definition languages through principled aggregation. It furthermore simplifies the compilation of pattern libraries and promotes the exchange of complex patterns. The language provides a convenient format for encapsulating pattern definitions and their annotations for integrated bioinformatic analyses.
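To make the idea of hierarchically aggregated, XML-encoded patterns more concrete, here is a minimal Python sketch under stated assumptions. The element and attribute names (`series`, `motif`, `gap`, `alternatives`, `sequence`, `min`, `max`) are hypothetical and are not taken from the BioPatML schema, and the helper functions `to_regex` and `search` are illustrative only; they are not part of Diana. The sketch recursively translates a nested pattern into a regular expression and matches motifs exactly, whereas real pattern languages typically also support mismatches, weights and other pattern types.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern definition. Element and attribute names are
# illustrative only and do NOT follow the BioPatML schema.
PATTERN_XML = """
<series name="promoter-like">
  <motif name="minus35" sequence="TTGACA"/>
  <gap name="spacer" min="15" max="19"/>
  <alternatives name="minus10">
    <motif sequence="TATAAT"/>
    <motif sequence="TAAAAT"/>
  </alternatives>
</series>
"""


def to_regex(elem):
    """Recursively translate a (hypothetical) pattern element into a regex."""
    if elem.tag == "motif":
        # Exact match of the motif string; no mismatches allowed.
        return re.escape(elem.get("sequence"))
    if elem.tag == "gap":
        # Any residues, between min and max of them.
        return ".{%s,%s}" % (elem.get("min"), elem.get("max"))
    if elem.tag == "series":
        # Children must match one after another, in document order.
        return "".join(to_regex(child) for child in elem)
    if elem.tag == "alternatives":
        # Exactly one of the children must match at this position.
        return "(?:" + "|".join(to_regex(child) for child in elem) + ")"
    raise ValueError("unsupported element: " + elem.tag)


def search(pattern_xml, sequence):
    """Return (start, matched text) for each non-overlapping match."""
    root = ET.fromstring(pattern_xml.strip())
    regex = re.compile(to_regex(root))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    seq = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
    print(search(PATTERN_XML, seq))
```

Because the composite elements (`series`, `alternatives`) simply contain other pattern elements, whole pattern definitions can be nested inside larger ones, which is the kind of aggregation the abstract describes; a real implementation would more likely use a dedicated matcher than a single regular expression.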
|Keywords:|pattern description, biological sequences, XML|
|Subjects:|Australian and New Zealand Standard Research Classification > TECHNOLOGY (100000) > MEDICAL BIOTECHNOLOGY (100400) > Medical Biotechnology not elsewhere classified (100499)|
| |Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109)|
| |Australian and New Zealand Standard Research Classification > BIOLOGICAL SCIENCES (060000) > BIOCHEMISTRY AND CELL BIOLOGY (060100) > Protein Trafficking (060108)|
|Divisions:|Past > QUT Faculties & Divisions > Faculty of Science and Technology|
| |Current > Institutes > Institute of Health and Biomedical Innovation|
|Deposited On:|16 May 2007|
|Last Modified:|09 Jun 2010 22:40|