Inadvertent Paralog Inclusion Drives Artifactual Topologies and Timetree Estimates in Phylogenomics

Siu-Ting, Karen; Torres-Sánchez, María; San Mauro, Diego; Wilcockson, David; Wilkinson, Mark; Pisani, Davide; O’Connell, Mary J.; Creevey, Christopher J.

Fabia Ursula Battistuzzi


Increasingly, large phylogenomic datasets include transcriptomic data from non-model organisms. This has allowed controversial and unexplored evolutionary relationships in the tree of life to be addressed, but it also increases the risk of inadvertently including paralogs in the analysis. While this may be expected to result in decreased phylogenetic support, it is not clear whether it could also drive highly supported artefactual relationships. Many groups, including the hyper-diverse Lissamphibia, are especially susceptible to these issues because of ancient gene duplication events, the small number of sequenced genomes, and the increasing use of transcriptomes to resolve historically conflicting taxonomic hypotheses. We tested the potential impact of paralog inclusion on the topologies and timetree estimates of the Lissamphibia using published and de novo sequencing data for 18 amphibian species, from which 2,656 single-copy gene families were identified. A novel paralog filtering approach produced four differently curated datasets, which were used for phylogenetic reconstruction with Bayesian inference, maximum likelihood and quartet-based supertrees. We found that paralogs drive strongly supported conflicting hypotheses within the Lissamphibia (Batrachia and Procera), as well as older divergence time estimates, even within groups where no variation in topology was observed. All investigated methods, except Bayesian inference with the CAT-GTR model, were sensitive to paralogs, but after filtering they converged on the same answer (Batrachia). This is the first large-scale study to address the impact of orthology selection using transcriptomic data, and it emphasises the importance of quality over quantity, particularly for understanding the relationships of poorly sampled taxa.


Siu-Ting, K., Torres-Sánchez, M., San Mauro, D., Wilcockson, D., Wilkinson, M., Pisani, D., …Creevey, C. J. (2019). Inadvertent Paralog Inclusion Drives Artifactual Topologies and Timetree Estimates in Phylogenomics. Molecular Biology and Evolution, 36(6), 1344-1356.

Journal Article Type: Article
Acceptance Date: Mar 12, 2019
Online Publication Date: Mar 23, 2019
Publication Date: Jun 1, 2019
Deposit Date: Mar 13, 2019
Publicly Available Date: Mar 24, 2020
Journal: Molecular Biology and Evolution
Print ISSN: 0737-4038
Electronic ISSN: 1537-1719
Publisher: Oxford University Press
Peer Reviewed: Peer Reviewed
Volume: 36
Issue: 6
Pages: 1344-1356
Keywords: Genetics; Ecology, Evolution, Behavior and Systematics; Molecular Biology
Contract Date: Mar 13, 2019

