Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood
Résumé
MOTIVATION: Maximum likelihood methods have become very popular for constructing phylogenetic trees from sequence data. However, despite noticeable recent progress, with large and difficult data sets (e.g. multiple genes with conflicting signals) current ML programs still require huge computing times and can become trapped in bad local optima of the likelihood function. When this occurs, the resulting trees may still show some of the defects (e.g. long branch attraction) of starting trees obtained using fast distance or parsimony programs.
METHODS: Subtree Pruning and Regrafting (SPR) topological rearrangements are usually sufficient to intensively search the tree space. Here, we propose two new methods to make SPR moves more efficient. The first method uses a fast distance-based approach to detect the least promising candidate SPR moves, which are then simply discarded. The second method locally estimates the change in likelihood for any remaining potential SPRs, as opposed to globally evaluating the entire tree for each possible move. These two methods are implemented in a new algorithm with a sophisticated filtering strategy, which efficiently selects potential SPRs and concentrates most of the likelihood computation on the promising moves.
RESULTS: Experiments with real data sets comprising 35 to 250 taxa show that, while indeed greatly reducing the amount of computation, our approach provides likelihood values at least as good as those of the best known ML methods so far, and is very robust to poor starting trees. Furthermore, combining our new SPR algorithm with local moves such as PHYML's nearest neighbor interchanges, the time needed to find good solutions can sometimes be reduced even more.
AVAILABILITY: Executables of our SPR program and the used data sets are available for download at http://atgc.lirmm.fr/spr.
METHODS: Subtree Pruning and Regrafting (SPR) topological rearrangements are usually sufficient to intensively search the tree space. Here, we propose two new methods to make SPR moves more efficient. The first method uses a fast distance-based approach to detect the least promising candidate SPR moves, which are then simply discarded. The second method locally estimates the change in likelihood for any remaining potential SPRs, as opposed to globally evaluating the entire tree for each possible move. These two methods are implemented in a new algorithm with a sophisticated filtering strategy, which efficiently selects potential SPRs and concentrates most of the likelihood computation on the promising moves.
RESULTS: Experiments with real data sets comprising 35 to 250 taxa show that, while indeed greatly reducing the amount of computation, our approach provides likelihood values at least as good as those of the best known ML methods so far, and is very robust to poor starting trees. Furthermore, combining our new SPR algorithm with local moves such as PHYML's nearest neighbor interchanges, the time needed to find good solutions can sometimes be reduced even more.
AVAILABILITY: Executables of our SPR program and the used data sets are available for download at http://atgc.lirmm.fr/spr.