How Statistical Information from the Web can Help Identify Named Entities - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Conference paper. Year: 2011

Abstract

This paper presents a Natural Language Processing (NLP) approach to filter Named Entities (NEs) from a list of collocation candidates. NEs are defined as the names of 'People', 'Places', 'Organizations', 'Software', 'Illnesses', and so forth. The proposed method identifies NEs using statistical measures computed from Web resources. It has three stages: (1) building artificial prepositional collocations from Noun-Noun candidates; (2) measuring the "relevance" of the resulting prepositional collocations using statistical methods (Web Mining); (3) selecting prepositional collocations. An evaluation on Noun-Noun collocations from French and English corpora confirmed the relevance of our system.
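The three stages can be illustrated with a minimal Python sketch. It is not the paper's implementation, and everything specific in it is an assumption made for illustration: the preposition list, the "N1 prep N2" word order, the ratio used as a relevance score, the threshold, and the rule that a low score marks a likely NE. The `hit_count` callable is a hypothetical stand-in for a Web search hit counter, replaced here by a toy dictionary of made-up counts.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def build_variants(pair: Pair, prepositions: Sequence[str]) -> List[str]:
    """Stage 1: build artificial prepositional collocations "N1 prep N2".

    The word order and the preposition list are assumptions.
    """
    n1, n2 = pair
    return [f"{n1} {prep} {n2}" for prep in prepositions]


def relevance(pair: Pair, prepositions: Sequence[str],
              hit_count: Callable[[str], int]) -> float:
    """Stage 2: score the variants with an illustrative ratio.

    Best variant hit count divided by the hit count of the original
    Noun-Noun pair. This stands in for the paper's statistical measures.
    """
    base = hit_count(" ".join(pair))
    if base == 0:
        return 0.0
    best = max((hit_count(v) for v in build_variants(pair, prepositions)),
               default=0)
    return best / base


def likely_named_entities(pairs: Sequence[Pair], prepositions: Sequence[str],
                          hit_count: Callable[[str], int],
                          threshold: float = 0.1) -> List[Pair]:
    """Stage 3: keep pairs whose prepositional variants score below threshold.

    Treating a low score as a sign of an NE is an assumption of this sketch.
    """
    return [p for p in pairs
            if relevance(p, prepositions, hit_count) < threshold]


# Toy hit counts (made up for the example, not measured on the Web).
TOY_COUNTS = {
    "fichier clusters": 100,
    "fichier de clusters": 80,
    "saint louis": 1000,
    "saint de louis": 2,
}


def toy_hit_count(query: str) -> int:
    return TOY_COUNTS.get(query, 0)


candidates = [("fichier", "clusters"), ("saint", "louis")]
result = likely_named_entities(candidates, ["de", "à"], toy_hit_count)
print(result)
```

With these toy counts only the pair ("saint", "louis") falls below the threshold. In practice `toy_hit_count` would be replaced by a real hit-count source, and the scoring function by whichever statistical measure is being evaluated.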
Main file: MR_Webist2011.pdf (146.44 KB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-00588581, version 1 (08-05-2011)

Identifiers

  • HAL Id: lirmm-00588581, version 1

Cite

Mathieu Roche. How Statistical Information from the Web can Help Identify Named Entities. WEBIST'11: Web Information Systems and Technologies - Web and Text Mining Session, Netherlands. pp.685-689. ⟨lirmm-00588581⟩