Discovering Program Topoi via Hierarchical Agglomerative Clustering
Abstract
In long-lived software systems, specification documents can be outdated or even missing. In this context, developing new software releases or checking whether user requirements are still valid becomes challenging. This challenge can be addressed by extracting the high-level observable capabilities of a system from its source code and the available source-level documentation. This paper presents feature extraction and traceability (FEAT), an approach that automatically extracts topoi, which are summaries of the main capabilities of a program, given in the form of collections of code functions along with an index. FEAT proceeds in two steps. First, clustering: by mining the available source code, possibly augmented with code-level comments, hierarchical agglomerative clustering groups similar code functions; this process also gathers an index for each function. Second, entry point selection: functions within a cluster are ranked and presented to validation engineers as topoi candidates. We implemented FEAT on top of a general-purpose test management and optimization platform and performed an experimental study over 15 open-source software projects amounting to more than 1 M lines of code, showing that automatically discovering topoi is feasible and meaningful on realistic projects.
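To make the clustering step concrete, the following is a minimal sketch of hierarchical agglomerative clustering over code functions. It is not FEAT's implementation: the token-set index, the Jaccard distance, the average-linkage criterion, the stopping threshold of 0.8, and the function names in the usage note are all assumptions chosen for illustration. The entry point selection step is not sketched, since the abstract does not say how functions are ranked.

```python
import re
from itertools import combinations


def tokens(text):
    """Split identifiers and comments into a set of lowercase word tokens."""
    # Break snake_case and camelCase apart before lowercasing.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text.replace("_", " "))
    return set(re.findall(r"[a-z]+", spaced.lower()))


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; 0.0 means identical token sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerate(functions, threshold=0.8):
    """Average-linkage agglomerative clustering (illustrative assumption).

    functions: dict mapping a function name to its comment text.
    threshold: merging stops once the closest pair of clusters is
               farther apart than this distance.
    Returns a sorted list of clusters, each a sorted list of names.
    """
    # Hypothetical per-function index: tokens from the name and its comment.
    index = {name: tokens(name + " " + text) for name, text in functions.items()}
    clusters = [[name] for name in sorted(functions)]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(index[x], index[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the linkage criterion.
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)
```

As a usage example, calling `agglomerate` on four made-up functions, `png_read_header` ("read the png header chunk"), `png_read_row` ("read one row of png pixels"), `zip_write_entry` ("write a zip archive entry") and `zip_write_footer` ("write the zip archive footer"), groups the two PNG readers into one cluster and the two ZIP writers into another. Each resulting cluster would then be a candidate for the ranking step described in the abstract.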
Origin: Files produced by the author(s)