Discovering Program Topoi via Hierarchical Agglomerative Clustering

Carlo Ieva 1 Arnaud Gotlieb 1 Souhila Kaci 2 Nadjib Lazaar 3
2 SMILE - Système Multi-agent, Interaction, Langage, Evolution
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
3 COCONUT - Agents, Apprentissage, Contraintes
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: In long-lived software systems, specification documents can be outdated or even missing. Developing new software releases or checking whether some user requirements are still valid becomes challenging in this context. This challenge can be addressed by extracting the high-level observable capabilities of a system by mining its source code and the available source-level documentation. This paper presents feature extraction and traceability (FEAT), an approach that automatically extracts topoi, which are summaries of the main capabilities of a program, given in the form of collections of code functions along with an index. FEAT works in two steps. First, clustering: by mining the available source code, possibly augmented with code-level comments, hierarchical agglomerative clustering groups similar code functions; this process also gathers an index for each function. Second, entry point selection: functions within a cluster are ranked and presented to validation engineers as topoi candidates. We implemented FEAT on top of a general-purpose test management and optimization platform and performed an experimental study over 15 open-source software projects amounting to more than 1 M lines of code, showing that automatically discovering topoi is feasible and meaningful on realistic projects.
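The two steps named in the abstract can be illustrated with a small sketch. It is not the FEAT implementation: the word-set features, the Jaccard distance, the average-linkage rule, the stopping threshold, and the ranking by number of in-cluster callers are all assumptions chosen for illustration, and the function names and call graph are invented toy data.

```python
from itertools import combinations

def tokens(name, comment):
    """Word set drawn from a function's name and its comment (assumed feature model)."""
    return set((name.replace("_", " ") + " " + comment).lower().split())

def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def average_linkage(c1, c2, feats):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(x, y) for x in c1 for y in c2]
    return sum(jaccard_distance(feats[x], feats[y]) for x, y in pairs) / len(pairs)

def agglomerate(feats, threshold):
    """Naive HAC: repeatedly merge the two closest clusters until the
    closest pair is farther apart than `threshold` (hypothetical cut-off)."""
    clusters = [frozenset([name]) for name in sorted(feats)]
    while len(clusters) > 1:
        dist, a, b = min(
            ((average_linkage(a, b, feats), a, b)
             for a, b in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        if dist > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

def rank_entry_points(cluster, calls):
    """Assumed ranking: functions with the fewest callers inside the
    cluster come first, ties broken alphabetically."""
    def callers(f):
        return sum(1 for g in cluster if g != f and f in calls.get(g, ()))
    return sorted(cluster, key=lambda f: (callers(f), f))

# Toy input: function name -> comment (hypothetical data).
functions = {
    "open_file": "open a file path",
    "read_file": "read a file path",
    "save_file": "save a file path",
    "draw_line": "draw a line on canvas",
    "draw_circle": "draw a circle on canvas",
}
# Toy call graph: caller -> callees (hypothetical data).
calls = {
    "read_file": ["open_file"],
    "save_file": ["open_file"],
    "draw_circle": ["draw_line"],
}

feats = {name: tokens(name, comment) for name, comment in functions.items()}
clusters = agglomerate(feats, threshold=0.5)
ranked = {c: rank_entry_points(c, calls) for c in clusters}
```

On this toy input the file-handling functions and the drawing functions end up in separate clusters, and within each cluster the functions that no other member calls are listed first as candidate entry points.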
Document type: Journal article

https://hal-lirmm.ccsd.cnrs.fr/lirmm-02088786
Contributor : Nadjib Lazaar
Submitted on : Thursday, April 4, 2019 - 10:25:33 AM
Last modification on : Thursday, October 3, 2019 - 3:36:02 PM

File

ieee-rel-jrnl-draft.pdf
Files produced by the author(s)

Citation

Carlo Ieva, Arnaud Gotlieb, Souhila Kaci, Nadjib Lazaar. Discovering Program Topoi via Hierarchical Agglomerative Clustering. IEEE Transactions on Reliability, Institute of Electrical and Electronics Engineers, 2018, 67 (3), pp.758-770. ⟨10.1109/TR.2018.2828135⟩. ⟨lirmm-02088786⟩
