Feature-to-Code Traceability in Legacy Software Variants
Abstract
Existing similar software variants, developed with ad hoc reuse techniques such as "clone-and-own", are a starting point for building the core assets of a software product line (SPL). To re-engineer such legacy software variants into an SPL for systematic reuse, it is important to identify a mapping between features and the source code elements that implement them in the different variants. Information Retrieval (IR) methods have been widely used to support this mapping in a single software product. This paper proposes a new approach to improve the performance of IR methods when they are applied to a collection of software variants. The novelty of our approach is twofold. On the one hand, it exploits what software variants have in common and how they differ to improve the accuracy of IR results. On the other hand, it reduces the abstraction gap between features and source code by introducing an intermediate level called "code-topic", which increases the number of relevant links retrieved. We applied our approach to a collection of seven variants of a large-scale system, using the ArgoUML-SPL modeling tool. The experimental results show that our approach outperforms the conventional application of IR methods as well as the most recent and relevant work on the subject.