Inferring phylogenies of evolving sequences without multiple sequence alignment

Chan, Cheong Xin, Bernard, Guillaume, Poirion, Olivier, Hogan, James M., & Ragan, Mark A. (2014) Inferring phylogenies of evolving sequences without multiple sequence alignment. Scientific Reports, 4, e6504:1-e6504:9.



Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, at greater computation speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics.
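As a rough illustration of the approach the abstract describes, the sketch below counts overlapping words (k-mers) in each sequence, scores each pair with the basic D2 statistic (the sum, over shared words, of the product of their counts), and turns the scores into a distance matrix. The function names, the default word length k=3, and the cosine-style normalisation used to convert a score into a distance are assumptions made for this example; the paper evaluates several D2 variants and may transform scores differently.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(counts_a, counts_b):
    """Basic D2 statistic: sum over shared words of the product of counts."""
    # Iterate over the smaller table; only shared words contribute.
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(n * counts_b[w] for w, n in counts_a.items() if w in counts_b)


def d2_distance_matrix(seqs, k=3):
    """Pairwise distances from D2 scores.

    Assumption for this sketch: distance = 1 - D2(x, y) / sqrt(D2(x, x) * D2(y, y)),
    a cosine-style normalisation chosen for simplicity.
    """
    counts = [kmer_counts(s, k) for s in seqs]
    self_scores = [d2(c, c) for c in counts]
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            denom = sqrt(self_scores[i] * self_scores[j])
            # Sequences shorter than k have no words; treat them as maximally distant.
            similarity = d2(counts[i], counts[j]) / denom if denom else 0.0
            matrix[i][j] = matrix[j][i] = 1.0 - similarity
    return matrix
```

A matrix of this kind can then be passed to a distance-based tree-building method such as neighbour-joining. No alignment step is involved, which is why the cost depends mainly on word counting and pairwise comparison.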


ID Code: 82330
Item Type: Journal Article
Refereed: Yes
Keywords: Approximate word matches, Maximum-likelihood, Evolution, Gene, Performance, Trees
DOI: 10.1038/srep06504
ISSN: 2045-2322
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2014 The Author(s)
Copyright Statement: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://
Deposited On: 09 Mar 2015 23:06
Last Modified: 16 Mar 2015 00:03
