Deploying Smart Program Understanding on a Large Code Base

Carlo Ieva 1, Arnaud Gotlieb 1, Souhila Kaci 2, Nadjib Lazaar 3
2 SMILE - Système Multi-agent, Interaction, Langage, Evolution
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
3 COCONUT - Agents, Apprentissage, Contraintes
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Program understanding aims at discovering human-readable properties of a software project by analyzing its source code. Recently, we proposed a smart approach based on hierarchical agglomerative clustering that extracts so-called program topoi from source code. These topoi are high-level observable properties of the project. Based on textual and structural representations of the source code, our multi-step approach clusters program topoi effectively and efficiently. In this paper, we describe novel exploitation tasks for this program understanding approach and report on its application to Software Heritage. Software Heritage is an ambitious project that aims at collecting and archiving the largest corpus of publicly available software source code. One of the project's goals is to provide a new scientific instrument for computer scientists to evaluate advanced machine learning and software engineering methods on a very large source code repository. Our in-depth experiments reveal that unsupervised learning is the appropriate tool to mine and understand the largest corpus of software source code ever produced.

https://hal-lirmm.ccsd.cnrs.fr/lirmm-02089733
Contributor: Nadjib Lazaar
Submitted on: Thursday, April 4, 2019 - 10:05:55 AM
Last modification on: Saturday, April 6, 2019 - 1:20:54 AM

File

AITest19-feat.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : lirmm-02089733, version 1

Citation

Carlo Ieva, Arnaud Gotlieb, Souhila Kaci, Nadjib Lazaar. Deploying Smart Program Understanding on a Large Code Base. AiTest: Artificial Intelligence Testing, Apr 2019, San Francisco, United States. ⟨lirmm-02089733⟩
