Automatic Titling of Electronic Documents with Noun Phrase Extraction

Cédric Lopez; Violaine Prince; Mathieu Roche

Conference paper. Year: 2010

Cédric Lopez

Role: Author
PersonId : 960390
ORCID : 0000-0002-4933-5720
IdRef : 164704922

Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier

Violaine Prince

Role: Author
PersonId : 942907
ORCID : 0000-0002-5997-9677

Exploration et exploitation de données textuelles

Mathieu Roche

Role: Author
PersonId : 4967
IdHAL : mathieu-roche
ORCID : 0000-0003-3272-8568
IdRef : 09042087X

Exploration et exploitation de données textuelles

Abstract

Automatic titling (i.e. providing titles automatically) is one of the key domains of Web site accessibility. This paper presents an approach to the automatic titling of texts (e.g. emails, forums), based on a morphosyntactic study of human-written titles in a corpus of varied texts. The method has four stages: corpus acquisition, determination of candidate sentences for titling, noun phrase extraction from the candidate sentences, and finally selection of one noun phrase to serve as the title of the text (the ChTITRES approach). The method was evaluated by ten users, and the satisfaction survey shows that the titles selected through this process are relevant.
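The four stages named in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the ChTITRES implementation: the abstract does not say how candidate sentences are chosen or how the final noun phrase is selected, so the "first two sentences" rule, the "longest phrase" rule, the toy lexicon `TOY_LEXICON`, the tag pattern `NP_PATTERN`, and all function names below are hypothetical. Corpus acquisition (stage 1) is reduced to a single input string.

```python
import re

# Hypothetical toy lexicon (word -> coarse tag). A real system would use a
# trained part-of-speech tagger; unknown words are tagged "X".
TOY_LEXICON = {
    "the": "D", "a": "D", "an": "D",
    "of": "P", "for": "P", "in": "P",
    "automatic": "A", "new": "A", "electronic": "A", "weekly": "A",
    "titling": "N", "documents": "N", "method": "N", "meeting": "N",
    "agenda": "N", "team": "N", "budget": "N", "review": "N",
}

# Assumed noun-phrase shape over the tag string: optional determiner,
# adjectives, nouns, then optionally a preposition plus another such group.
NP_PATTERN = re.compile(r"D?A*N+(?:PD?A*N+)?")


def split_sentences(text):
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_sentences(text, n=2):
    """Stage 2 (assumed heuristic): keep the first n sentences."""
    return split_sentences(text)[:n]


def noun_phrases(sentence):
    """Stage 3: tag each word, then match NP_PATTERN on the tag string."""
    words = re.findall(r"[A-Za-z]+", sentence)
    tags = "".join(TOY_LEXICON.get(w.lower(), "X") for w in words)
    # One tag character per word, so match offsets index into `words`.
    return [" ".join(words[m.start():m.end()]) for m in NP_PATTERN.finditer(tags)]


def choose_title(text, max_words=8):
    """Stage 4 (assumed heuristic): longest noun phrase within max_words."""
    phrases = [np for s in candidate_sentences(text) for np in noun_phrases(s)]
    phrases = [np for np in phrases if len(np.split()) <= max_words]
    return max(phrases, key=lambda np: len(np.split()), default=None)


email = "Here is the agenda for the weekly team meeting. We also discuss the budget review."
title = choose_title(email)
print(title)  # the agenda for the weekly team meeting
```

Encoding each word as a single tag character lets an ordinary regular expression stand in for a noun-phrase chunker, which keeps the sketch free of external dependencies. The paper's morphosyntactic study of human-written titles would presumably inform both the pattern and the selection rule.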

Domains

Text and document processing

Main file

4_pages.pdf (120.57 KB)

Origin: Files produced by the author(s)

Contributor: Cédric Lopez

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00563903

Submitted on: Monday, February 7, 2011, 15:31:41

Last modified on: Tuesday, January 16, 2024, 16:27:52

Long-term archiving on: Sunday, May 8, 2011, 03:37:24

Dates and versions

lirmm-00563903 , version 1 (07-02-2011)

Identifiers

HAL Id : lirmm-00563903 , version 1

Cite

Cédric Lopez, Violaine Prince, Mathieu Roche. Automatic Titling of Electronic Documents with Noun Phrase Extraction. SOCPAR'10: SOft Computing and PAttern Recognition, France. pp.168-171. ⟨lirmm-00563903⟩

Collections

CNRS TEXTE LIRMM GENCI MIPS UNIV-MONTPELLIER

159 views

316 downloads
