Text Segmentation based on Document Understanding for Information Retrieval

Abstract: Information retrieval needs to match relevant texts with a given query. Selecting appropriate parts is useful when documents are long and only portions are of interest to the user. In this paper, we describe a method that makes extensive use of natural language techniques for text segmentation based on topic change detection. The method requires an NLP parser and a semantic representation in Roget-based vectors. We ran the experiment on French documents, for which we have the appropriate tools, but the method could be transposed to any other language that meets the same requirements. The article gives an overview of the functionalities of the NL understanding environment and of the algorithms related to our text segmentation method. A text segmentation experiment is also presented, and its result in an information retrieval task is shown.
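
The idea of topic change detection over semantic vectors can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each sentence has already been mapped to a numeric vector (the paper obtains Roget-based vectors from an NLP parser, which is not reproduced here), and it assumes a simple rule in which a boundary is placed wherever the cosine similarity between consecutive sentence vectors falls below a threshold. The names `cosine` and `segment_boundaries` and the default threshold of 0.5 are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector carries no topical information; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def segment_boundaries(vectors, threshold=0.5):
    """Return the indices of sentences that start a new segment.

    Assumption: a topic change occurs between sentence i-1 and sentence i
    when their cosine similarity is below `threshold`.
    """
    return [
        i
        for i in range(1, len(vectors))
        if cosine(vectors[i - 1], vectors[i]) < threshold
    ]


# Toy vectors: two sentences on one topic, then two on another.
sentence_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
print(segment_boundaries(sentence_vectors))
```

In this toy input, the only low-similarity pair is between the second and third sentences, so the third sentence (index 2) opens a new segment. A retrieval system could then index each segment separately and match queries against segments rather than whole documents.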
Document type:
Conference paper
NLDB'07, Jun 2007, pp.295-304, 2007

Cited literature: 22 references

Contributor: Alexandre Labadié
Submitted on: Thursday, 12 July 2007 - 10:31:42
Last modified on: Thursday, 24 May 2018 - 15:59:23
Document(s) archived on: Thursday, 8 April 2010 - 23:02:54



  • HAL Id : lirmm-00161996, version 1



Violaine Prince, Alexandre Labadié. Text Segmentation based on Document Understanding for Information Retrieval. NLDB'07, Jun 2007, pp.295-304, 2007. 〈lirmm-00161996〉


