Software understanding: Automatic classification of software identifiers

Pattaraporn Warintarawej; Anne Laurent; Marianne Huchard; Mathieu Lafourcade; Pierre Pompidor

doi:10.3233/IDA-150744

Journal article, Intelligent Data Analysis, Year: 2015

Pattaraporn Warintarawej

Role: Author
PersonId : 937892

ADVanced Analytics for data SciencE

Anne Laurent

Role: Author
PersonId : 21743
IdHAL : anne-laurent
ORCID : 0000-0003-3708-6429
IdRef : 075173735

WEB Architecture x Semantic WEB x WEB of Data

Marianne Huchard

Role: Author
PersonId : 8651
IdHAL : marianne-huchard
ORCID : 0000-0002-6309-7503
IdRef : 060595175

Models And Reuse Engineering, Languages

Mathieu Lafourcade

Role: Author
PersonId : 172381
IdHAL : mathieu-lafourcade
ORCID : 0000-0003-2832-2143

Exploration et exploitation de données textuelles

Pierre Pompidor

Role: Author
PersonId : 170558
IdHAL : pierre-pompidor
ORCID : 0000-0001-5466-5137

ADVanced Analytics for data SciencE

Abstract

Identifier names (e.g., packages, classes, methods, variables) are among the most important sources for software comprehension. They need to be analyzed to support collaborative software engineering and source code reuse, because they convey the domain concepts of the software. For instance, "getMinimumSupport" would be associated with the association rule concept in data mining software, whereas other identifiers are harder to recognize, such as those that mix parts of words (e.g., "initFeatSet"). We therefore propose methods that assist automatic software understanding by classifying identifier names into domain concept categories. Our solution is based on data mining algorithms and aims to learn the character patterns of identifier names. The main challenges are (1) to automatically split identifier names into relevant constituent subnames and (2) to build a model that associates such a set of subnames with predefined domain concepts. For this purpose, we propose a novel way of splitting identifiers into their constituent words and use N-gram-based text classification to predict the related domain concept. In this article, we report the theoretical method and the algorithms we propose, together with experiments on real software source code that demonstrate the value of our approach.
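
The two steps named in the abstract, splitting an identifier into subnames and classifying it from character N-grams, can be illustrated with a minimal sketch. The abstract does not describe the article's own splitting method or classifier, so everything here is an assumption: the splitter uses the common camelCase/underscore heuristic, the classifier scores simple trigram frequency profiles, and the concept labels and training identifiers are hypothetical.

```python
import re
from collections import Counter


def split_identifier(name):
    """Split on camelCase boundaries, underscores and digits (a common
    heuristic, used here as a stand-in for the article's method)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def char_ngrams(word, n=3):
    """Character n-grams of a word padded with '_' boundary markers."""
    padded = f"_{word}_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_counts(identifiers, n=3):
    """Count the n-grams of every subname in a collection of identifiers."""
    counts = Counter()
    for ident in identifiers:
        for sub in split_identifier(ident):
            counts.update(char_ngrams(sub, n))
    return counts


def classify(identifier, profiles, n=3):
    """Return the concept whose n-gram profile overlaps most with the
    identifier (relative-frequency score; an illustrative choice)."""
    grams = ngram_counts([identifier], n)

    def score(concept):
        prof = profiles[concept]
        total = sum(prof.values()) or 1
        return sum(cnt * prof[g] / total for g, cnt in grams.items())

    return max(profiles, key=score)


# Hypothetical training identifiers for two hypothetical domain concepts.
profiles = {
    "association_rules": ngram_counts(
        ["getMinimumSupport", "minConfidence", "frequentItemset"]),
    "clustering": ngram_counts(
        ["numClusters", "centroidDistance", "assignCluster"]),
}

print(split_identifier("initFeatSet"))
print(classify("minSupp", profiles))
print(classify("clusterCount", profiles))
```

Working at the character level lets an abbreviated subname such as "supp" still share trigrams with "support", which is the motivation the abstract gives for learning character patterns rather than matching whole words.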

Keywords

Automatic Software Understanding; Text classification; Software Engineering; Data Mining

Domains

Information Retrieval [cs.IR]

Main file

Software_Understanding.pdf (340.67 KB)

Origin: Files produced by the author(s)

Contributor: Pierre Pompidor

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00834051

Submitted on: Friday, June 14, 2013, 09:19:15

Last modified on: Sunday, March 17, 2024, 11:09:11

Long-term archiving on: Sunday, September 15, 2013, 04:10:11

Dates and versions

lirmm-00834051 , version 1 (14-06-2013)

Identifiers

HAL Id : lirmm-00834051 , version 1
DOI : 10.3233/IDA-150744

Cite

Pattaraporn Warintarawej, Anne Laurent, Marianne Huchard, Mathieu Lafourcade, Pierre Pompidor. Software understanding: Automatic classification of software identifiers. Intelligent Data Analysis, 2015, 19 (4), pp.761-778. ⟨10.3233/IDA-150744⟩. ⟨lirmm-00834051⟩

Collections

CNRS ADVANSE TEXTE MAREL LIRMM MIPS UNIV-MONTPELLIER FADO WEB-CUBE

435 views

653 downloads
