
Text Segmentation based on Document Understanding for Information Retrieval

Abstract : Information retrieval needs to match relevant texts to a given query. Selecting appropriate parts is useful when documents are long and only portions are of interest to the user. In this paper, we describe a method that makes extensive use of natural language techniques for text segmentation based on topic change detection. The method requires an NLP parser and a semantic representation in Roget-based vectors. We ran the experiment on French documents, for which we have the appropriate tools, but the method could be transposed to any other language that meets the same requirements. The article gives an overview of the functionalities of the NL understanding environment and of the algorithms related to our text segmentation method. A text segmentation experiment is also presented, and its results in an information retrieval task are shown.
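
As a rough illustration of segmentation by topic change detection, the sketch below places a boundary wherever two consecutive sentence vectors become dissimilar. It is not the authors' algorithm: the abstract does not specify the similarity measure or the decision rule, so the cosine similarity, the `threshold` value, the function names, and the three-dimensional toy vectors (standing in for Roget-based semantic vectors produced after parsing) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def segment(vectors, threshold=0.5):
    """Return the indices of sentences that start a new segment.

    A boundary is placed before sentence i when its vector is less similar
    than `threshold` (a hypothetical value) to the vector of sentence i - 1.
    """
    boundaries = []
    for i in range(1, len(vectors)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            boundaries.append(i)
    return boundaries


# Toy data: two sentences on one topic, followed by two on another.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
print(segment(toy_vectors))
```

In a retrieval setting, the resulting segments rather than whole documents would then be matched against the query, which is the use case the abstract describes.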
Document type :
Conference papers

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00161996
Contributor : Alexandre Labadié
Submitted on : Thursday, July 12, 2007 - 10:31:42 AM
Last modification on : Monday, September 16, 2019 - 11:36:51 AM
Long-term archiving on : Thursday, April 8, 2010 - 11:02:54 PM

Citation

Violaine Prince, Alexandre Labadié. Text Segmentation based on Document Understanding for Information Retrieval. NLDB: Natural Language Processing and Information Systems, Jun 2007, Paris, France. pp.295-304, ⟨10.1007/978-3-540-73351-5_26⟩. ⟨lirmm-00161996⟩
