FIN3E Approach: Identification of Named Entities from Extracted Terms
Abstract
Named Entities (NE) are classically defined as the names of People, Places, and Organizations. Other NE classes also exist, such as Documents (e.g. software, hardware) and Sciences (e.g. illnesses, medications). To identify NE, many systems rely on the presence of uppercase letters. This technique can be ineffective on non-standard documents (e.g. emails, blogs, forums, and texts or text fragments written entirely in uppercase or lowercase). In this work, we do not use this kind of information to identify NE. Formally, two important criteria characterize NE: (1) referential uniqueness (i.e. a proper noun refers to a single referential entity), and (2) denominative stability (i.e. few possible variations). Our work relies on the latter criterion to identify NE among Noun-Noun terms obtained by terminology extraction methods. Our method follows a cognitive process that simulates human reasoning: (1) expressing a term differently through a reformulation technique, and (2) judging the relevance of this reformulation to identify NE.
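The two steps above (reformulate, then judge) can be illustrated with a minimal sketch. It is not the FIN3E implementation: the abstract does not specify the reformulation rules or the relevance measure, so the preposition patterns, the toy corpus, the frequency-based ratio, the 0.2 threshold, and all function names below are assumptions made purely for illustration. The idea shown is denominative stability: a term that is rarely attested in a reformulated form is treated as a NE candidate. All matching is case-insensitive, so no capitalization cue is used.

```python
import re

# Hypothetical reformulation patterns (an assumption, not the paper's rules):
# a Noun-Noun term "N1 N2" is re-expressed as "N2 <preposition> N1".
PREPOSITIONS = ("of", "for", "in")

# Toy lowercase corpus, invented for this example only.
CORPUS = (
    "the world bank published a report. the world bank funds projects. "
    "data analysis is hard. the analysis of data takes time. "
    "we need analysis of data before data analysis tools."
)

def reformulate(term):
    """Return candidate reformulations of a two-word Noun-Noun term."""
    n1, n2 = term.lower().split()
    return [f"{n2} {prep} {n1}" for prep in PREPOSITIONS]

def count_phrase(phrase, corpus):
    """Count case-insensitive whole-word occurrences of phrase in corpus."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, corpus.lower()))

def variation_ratio(term, corpus):
    """Share of all occurrences (original + variants) that are variants."""
    original = count_phrase(term, corpus)
    variants = sum(count_phrase(v, corpus) for v in reformulate(term))
    total = original + variants
    return variants / total if total else 0.0

def looks_like_named_entity(term, corpus, threshold=0.2):
    """Flag a term as a NE candidate if it is attested and rarely reformulated."""
    if count_phrase(term, corpus) == 0:
        return False
    return variation_ratio(term, corpus) < threshold

if __name__ == "__main__":
    for term in ("world bank", "data analysis"):
        print(term, looks_like_named_entity(term, CORPUS))
```

On this toy corpus, "world bank" never appears in a reformulated form and is flagged as a NE candidate, while "data analysis" alternates with "analysis of data" and is not. A real system would replace the toy corpus and the threshold with a large text collection and an empirically tuned decision rule.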