Reconstructible phylogenetic networks: do not distinguish the indistinguishable
Abstract
We consider here an elementary question for the inference of phylogenetic networks: which networks can be reconstructed? Whereas in theory it is always possible to reconstruct a phylogenetic tree given sufficient data, the same does not hold for phylogenetic networks: most notably, the relative order of consecutive reticulate events cannot be determined by standard network inference methods. This problem has been described before, but no solution to it has been put forward. Here we propose limiting the space of reconstructible phylogenetic networks to what we call “canonical networks”. We formally prove that each network has a (usually unique) canonical form, in which a number of nodes and branches are merged, representing all that can be uniquely reconstructed about the original network. Once a canonical network N is inferred, it must be kept in mind that, even with perfect and unlimited data, the true phylogenetic network is just one of the potentially many networks having N as canonical form. This is an important difference from what biologists are used to for phylogenetic trees, where in principle it is always possible to resolve uncertainties, given enough data.