How Statistical Information from the Web can Help Identify Named Entities

Mathieu Roche

Conference paper, Year: 2011

Mathieu Roche

Role: Author
PersonId : 4967
IdHAL : mathieu-roche
ORCID : 0000-0003-3272-8568
IdRef : 09042087X

Exploration et exploitation de données textuelles

Abstract

This paper presents a Natural Language Processing (NLP) approach to filtering Named Entities (NEs) from a list of collocation candidates. NEs are defined as the names of 'People', 'Places', 'Organizations', 'Software', 'Illnesses', and so forth. The proposed method identifies NEs using statistical measures computed from Web resources. It has three stages: (1) building artificial prepositional collocations from Noun-Noun candidates; (2) measuring the "relevance" of the resulting prepositional collocations using statistical methods (Web Mining); (3) selecting prepositional collocations. An evaluation on Noun-Noun collocations from French and English corpora confirmed the relevance of the system.
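The three stages named in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the preposition list, the English "N2 prep N1" word order, the ratio score, the threshold, the decision rule (a candidate with no well-attested prepositional variant is flagged as a likely NE), and the hit counts are all hypothetical. The `hit_count` callable stands in for whatever Web search statistic is used; the abstract does not specify one.

```python
from typing import Callable, List, Tuple

# Hypothetical preposition inventory; the abstract does not list the ones used.
PREPOSITIONS = ["of", "for", "in"]


def build_prepositional_variants(noun1: str, noun2: str) -> List[str]:
    """Stage 1: build artificial 'N2 prep N1' phrases from a Noun-Noun candidate.

    The English word order used here is an assumption for illustration.
    """
    return [f"{noun2} {prep} {noun1}" for prep in PREPOSITIONS]


def relevance(noun1: str, noun2: str,
              hit_count: Callable[[str], int]) -> Tuple[str, float]:
    """Stage 2: score each variant by its Web frequency relative to the
    original Noun-Noun form (an assumed measure) and return the best one."""
    base = hit_count(f"{noun1} {noun2}")
    best_variant, best_score = "", 0.0
    for variant in build_prepositional_variants(noun1, noun2):
        score = hit_count(variant) / base if base > 0 else 0.0
        if score > best_score:
            best_variant, best_score = variant, score
    return best_variant, best_score


def looks_like_named_entity(noun1: str, noun2: str,
                            hit_count: Callable[[str], int],
                            threshold: float = 0.1) -> bool:
    """Stage 3 (assumed rule): flag the candidate as a likely NE when no
    prepositional variant reaches the threshold."""
    _, score = relevance(noun1, noun2, hit_count)
    return score < threshold


# Invented counts for demonstration only; a real run would query a search engine.
FAKE_COUNTS = {
    "data mining": 1000,
    "mining of data": 400,
    "windows vista": 1000,
    "vista of windows": 2,
}


def fake_hit_count(query: str) -> int:
    """Stand-in for a Web hit-count lookup."""
    return FAKE_COUNTS.get(query.lower(), 0)


if __name__ == "__main__":
    for n1, n2 in [("data", "mining"), ("windows", "vista")]:
        print(n1, n2, looks_like_named_entity(n1, n2, fake_hit_count))
```

Passing the hit-count source as a callable keeps the sketch independent of any particular search API, so a different statistic or a cached lookup can be swapped in without touching the three stages.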

Keywords

Text-Mining; Natural Language Processing; Terminology; Named Entity

Domains

Other; Document and Text Processing; Information Retrieval [cs.IR]; Web

Main file

MR_Webist2011.pdf (146.44 KB)

Origin: Files produced by the author(s)

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00588581

Submitted on: Sunday, May 8, 2011, 19:14:09

Last modified on: Saturday, July 15, 2023, 04:09:50

Long-term archiving on: Friday, November 9, 2012, 10:55:38

Dates and versions

lirmm-00588581 , version 1 (08-05-2011)

Identifiers

HAL Id : lirmm-00588581 , version 1

Cite

Mathieu Roche. How Statistical Information from the Web can Help Identify Named Entities. WEBIST'11: Web Information Systems and Technologies - Web and Text Mining Session, Netherlands. pp.685-689. ⟨lirmm-00588581⟩

Collections

CNRS TEXTE LIRMM MIPS UNIV-MONTPELLIER

143 views

904 downloads
