BioPatML - an XML description language for patterns in biological sequences

Maetschke, Stefan, Towsey, Michael W., & Hogan, James M. (2007) BioPatML - an XML description language for patterns in biological sequences. QUT.


Background: A major challenge in computational biology is describing biological systems in a way that allows their computational evaluation and their exchange between institutes and applications. Recent modeling languages that describe various aspects of biological systems, such as genomic composition, spatio-temporal quantities or biochemical reactions, predominantly rely on XML (eXtensible Markup Language) as a standardized and well-supported format. Description languages for patterns in biological sequences are an exception: they show great diversity in format and function, which impedes the definition and exchange of biological patterns.

Results: In this paper we introduce BioPatML, an XML-based pattern description language that supports a wide variety of patterns and allows the construction of complex, hierarchically structured patterns and pattern libraries. BioPatML unifies the diversity of current pattern description languages and fills a gap in the set of XML-based description languages for biological systems. The paper discusses the structure and elements of the language and demonstrates its advantages in three applications. An XML schema, a manual, and Diana, a command-line tool for searching nucleotide and amino acid sequences for BioPatML patterns, are available at

Conclusions: BioPatML increases the power of classical pattern definition languages through principled aggregation. It furthermore simplifies the compilation of pattern libraries and promotes the exchange of complex patterns. The language provides a convenient format to encapsulate pattern definitions and their annotations for integrated bioinformatic analyses.
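
Illustration: The idea of building a complex pattern by aggregating simpler ones in an XML document can be sketched as follows. This is a minimal sketch only. The element and attribute names (`pattern`, `series`, `motif`, `gap`, `letters`, `min`, `max`) and the helper functions `to_regex` and `search` are hypothetical and are not taken from the BioPatML schema, and the translation to a regular expression is a stand-in technique, not the method used by Diana.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern document. Element and attribute names are
# illustrative only and are NOT taken from the BioPatML schema.
PATTERN_XML = """
<pattern name="toy-promoter">
  <series>
    <motif name="box1" letters="TTGACA"/>
    <gap min="15" max="19"/>
    <motif name="box2" letters="TATAAT"/>
  </series>
</pattern>
"""


def to_regex(node):
    """Recursively translate a (hypothetical) pattern element into a regex string."""
    if node.tag == "motif":
        # A literal motif becomes a named group so sub-matches stay addressable.
        return "(?P<%s>%s)" % (node.get("name"), re.escape(node.get("letters")))
    if node.tag == "gap":
        # A gap of variable length matches any symbols.
        return ".{%s,%s}" % (node.get("min"), node.get("max"))
    if node.tag in ("pattern", "series"):
        # Aggregation: children are matched one after another.
        return "".join(to_regex(child) for child in node)
    raise ValueError("unknown element: %s" % node.tag)


def search(xml_text, sequence):
    """Return (start, end) positions of all matches of the pattern in the sequence."""
    root = ET.fromstring(xml_text)
    regex = re.compile(to_regex(root))
    return [(m.start(), m.end()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    seq = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
    print(search(PATTERN_XML, seq))
```

Because each element is translated independently and containers simply combine their children, a nested pattern can be reused as a building block of a larger one, which is the kind of hierarchical composition the abstract describes.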

ID Code: 7730
Item Type: Report
Refereed: No
Keywords: pattern description, biological sequences, XML
Subjects: Australian and New Zealand Standard Research Classification > TECHNOLOGY (100000) > MEDICAL BIOTECHNOLOGY (100400) > Medical Biotechnology not elsewhere classified (100499)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109)
Australian and New Zealand Standard Research Classification > BIOLOGICAL SCIENCES (060000) > BIOCHEMISTRY AND CELL BIOLOGY (060100) > Protein Trafficking (060108)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Current > Institutes > Institute of Health and Biomedical Innovation
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Deposited On: 16 May 2007 00:00
Last Modified: 27 Oct 2015 23:38
