Conference papers

How Statistical Information from the Web can Help Identify Named Entities

Mathieu Roche 1
1 TEXTE - Exploration et exploitation de données textuelles
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : This paper presents a Natural Language Processing (NLP) approach to filtering Named Entities (NE) from a list of collocation candidates. NE are defined as the names of 'People', 'Places', 'Organizations', 'Software', 'Illnesses', and so forth. The proposed method identifies NE using statistical measures computed from Web resources. It has three stages: (1) building artificial prepositional collocations from Noun-Noun candidates; (2) measuring the "relevance" of the resulting prepositional collocations using statistical methods (Web Mining); (3) selecting prepositional collocations. An evaluation on Noun-Noun collocations from French and English corpora confirmed the relevance of our system.
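The three stages named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, since the abstract does not give these details: the preposition list, the word order of the generated phrases, the Dice-style measure, the threshold, the decision rule (a Noun-Noun candidate whose prepositional variants are rare is flagged as a possible NE), and the function names. The hit counts are invented toy numbers standing in for real Web queries.

```python
from typing import Callable, Dict, List

# Hypothetical preposition list; the abstract does not name the prepositions used.
PREPOSITIONS: List[str] = ["of", "for", "in"]


def build_prepositional_collocations(
    noun1: str, noun2: str, prepositions: List[str] = PREPOSITIONS
) -> List[str]:
    """Stage 1: build artificial 'noun1 prep noun2' phrases from a Noun-Noun candidate."""
    return [f"{noun1} {prep} {noun2}" for prep in prepositions]


def dice_relevance(
    phrase: str, noun1: str, noun2: str, hit_count: Callable[[str], int]
) -> float:
    """Stage 2: score a phrase with a Dice-style ratio of hit counts (assumed measure)."""
    denominator = hit_count(noun1) + hit_count(noun2)
    if denominator == 0:
        return 0.0
    return 2.0 * hit_count(phrase) / denominator


def best_relevance(
    noun1: str, noun2: str, hit_count: Callable[[str], int]
) -> float:
    """Highest relevance among the prepositional variants of the candidate."""
    phrases = build_prepositional_collocations(noun1, noun2)
    return max(
        (dice_relevance(p, noun1, noun2, hit_count) for p in phrases),
        default=0.0,
    )


def looks_like_named_entity(
    noun1: str,
    noun2: str,
    hit_count: Callable[[str], int],
    threshold: float = 0.01,  # arbitrary illustrative threshold
) -> bool:
    """Stage 3 (assumed rule): flag the candidate when no variant is relevant enough."""
    return best_relevance(noun1, noun2, hit_count) < threshold


# Invented toy counts, NOT real Web statistics; a real system would query a search engine.
TOY_COUNTS: Dict[str, int] = {
    "cup": 1000,
    "tea": 2000,
    "cup of tea": 300,
    "John": 5000,
    "Smith": 4000,
    "John of Smith": 1,
}


def toy_hit_count(query: str) -> int:
    """Stand-in for a Web hit-count lookup; unknown queries count as zero hits."""
    return TOY_COUNTS.get(query, 0)


if __name__ == "__main__":
    for pair in [("cup", "tea"), ("John", "Smith")]:
        print(pair, looks_like_named_entity(pair[0], pair[1], toy_hit_count))
```

Passing the hit-count lookup as a function keeps the scoring logic separate from the source of the counts, so the toy dictionary could be swapped for an actual search-engine client without changing the three stages.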

Contributor : Mathieu Roche
Submitted on : Sunday, May 8, 2011 - 7:14:09 PM
Last modification on : Thursday, May 24, 2018 - 3:59:23 PM
Long-term archiving on : Friday, November 9, 2012 - 10:55:38 AM




  • HAL Id : lirmm-00588581, version 1



Mathieu Roche. How Statistical Information from the Web can Help Identify Named Entities. WEBIST'11: Web Information Systems and Technologies - Web and Text Mining Session, Netherlands. pp.685-689. ⟨lirmm-00588581⟩


