Software understanding: Automatic classification of software identifiers

Pattaraporn Warintarawej 1 Anne Laurent 1 Marianne Huchard 2 Mathieu Lafourcade 3 Pierre Pompidor 4
2 MAREL - Models And Reuse Engineering, Languages
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
3 TEXTE - Exploration et exploitation de données textuelles
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
4 ADVANSE - ADVanced Analytics for data SciencE
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Identifier names (e.g., of packages, classes, methods, and variables) are among the most important sources for software comprehension. They need to be analyzed in order to support collaborative software engineering and the reuse of source code, because they convey the domain concepts of the software. For instance, "getMinimumSupport" would be associated with the association rule concept in data mining software, whereas other identifiers are harder to recognize, for example when they mix parts of words (e.g., "initFeatSet"). We therefore propose methods for assisting automatic software understanding by classifying identifier names into domain concept categories. We propose an innovative solution based on data mining algorithms that aims to learn the character patterns of identifier names. The main challenges are (1) to automatically split identifier names into relevant constituent subnames and (2) to build a model associating such a set of subnames with predefined domain concepts. For this purpose, we propose a novel way of splitting identifiers into their constituent words and use N-gram-based text classification to predict the related domain concept. In this article, we report the theoretical method and the algorithms we propose, together with experiments run on real software source code that show the value of our approach.
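The two steps named in the abstract (splitting an identifier into subnames, then describing it by character N-grams) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify their splitting method, so a plain camelCase/underscore split stands in for it, and the function names `split_identifier`, `char_ngrams`, and `ngram_profile` as well as the choice of n = 3 are assumptions made for illustration only.

```python
import re
from collections import Counter


def split_identifier(name):
    """Naive stand-in splitter: breaks on underscores, digits and
    camelCase boundaries. The paper proposes its own splitting method,
    which is not reproduced here."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def char_ngrams(word, n=3):
    """Return the character n-grams of a word; a word of length n or
    less is returned whole as its only n-gram."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_profile(name, n=3):
    """Count the character n-grams over all subnames of an identifier.
    Such a profile could serve as the feature vector for an
    N-gram-based text classifier (classifier not shown)."""
    profile = Counter()
    for sub in split_identifier(name):
        profile.update(char_ngrams(sub, n))
    return profile


if __name__ == "__main__":
    for ident in ("getMinimumSupport", "initFeatSet"):
        print(ident, split_identifier(ident), dict(ngram_profile(ident)))
```

An abbreviated subname such as "feat" in "initFeatSet" shares the trigrams "fea" and "eat" with "feature", which hints at why character-level patterns can help where whole-word matching fails.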
Document type:
Journal article
Intelligent Data Analysis, IOS Press, 2015, 19 (4), pp.761-778. 〈10.3233/IDA-150744〉

Cited literature: 28 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00834051
Contributor: Pierre Pompidor
Submitted on: Friday, 14 June 2013 - 09:19:15
Last modified on: Thursday, 24 May 2018 - 15:59:25
Document(s) archived on: Sunday, 15 September 2013 - 04:10:11

File

Software_Understanding.pdf
Files produced by the author(s)

Citation

Pattaraporn Warintarawej, Anne Laurent, Marianne Huchard, Mathieu Lafourcade, Pierre Pompidor. Software understanding: Automatic classification of software identifiers. Intelligent Data Analysis, IOS Press, 2015, 19 (4), pp.761-778. 〈10.3233/IDA-150744〉. 〈lirmm-00834051〉
