Instance Guaranteed Ratio on Greedy Heuristic for Genome Scaffolding
Résumé
The Scaffolding problem in bioinformatics, aims to complete the contig assembly process by determining the relative position and orientation of these contigs. Modeled as a combinatorial optimization problem in a graph named scaffold graph, this problem is NP-hard and its exact resolution is generally impossible on large instances. Hence, heuristics like polynomial-time approximation algorithms remain the only possibility to propose a solution. In general, even in the case where we know a constant guaranteed approximation ratio, it is impossible to know if the solution proposed by the algorithm is close to the optimal, or close to the bound defined by this ratio. In this paper we present a measure, associated to a greedy algorithm, determining an upper bound on the score of the optimal solution. This measure, depending on the instance, guarantees a – non constant – ratio for the greedy algorithm on this instance. We prove that this measure is a fine upper bound on optimal score, we perform experiments on real instances and show that the greedy algorithm yields near from optimal solutions.