Exploiting Textual Source Information for Epidemiosurveillance - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Conference paper · Year: 2014


In recent years, as a complement to traditional surveillance reporting systems, there has been great interest in developing methodologies for the early detection of potential health threats from unstructured text on the Internet. In this context, we examined the relevance of combining expert knowledge with automatic term extraction to create appropriate Internet search queries for acquiring disease outbreak news. We propose a measure: the number of relevant disease outbreak news items detected as a function of the terms automatically extracted from sample Google and PubMed corpora. Because of its recent emergence, we used African swine fever as an example disease.
New and exotic infectious diseases are an increasing threat to countries because of globalization, passenger movement, and international trade. Traditional reporting schemes often miss, delay, or underreport disease outbreaks, leaving countries unaware of potential disease threats. Because the Internet is a large and dynamic source of information, surveillance services need tools that refine searches and detect the information of interest. Two important state-of-the-art systems, MediSys (Mantero et al. 2011) and HealthMap (Collier 2012), are based on a series of automatic steps to detect and acquire disease-related news. Their algorithms rely on predefined templates, such as keywords or patterns. Internet search queries have been proposed as an inexpensive method for detecting disease signals (e.g., avian influenza) (Polgreen et al. 2008). Faced with many diseases and even more symptoms, analysts meet another challenge: how can appropriate queries for Internet disease surveillance be identified? One option is to use terms from an existing thesaurus (e.g., MeSH).
In this paper, we present a new combined approach to selecting terms automatically extracted from relevant scientific and non-scientific corpora in order to identify the most appropriate search queries for detecting disease outbreak news on the Internet. Because it is a recently emerging disease, we use African swine fever (ASF) as the example disease.
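The evaluation measure described above (relevant outbreak news detected as a function of the extracted terms) can be illustrated with a minimal Python sketch. It is not the authors' implementation. For illustration only, it assumes that each query pairs an expert-chosen disease name with one automatically extracted term, that a news item matches a query when its text contains every query term, and that each item carries an expert relevance label. The names `NewsItem`, `build_query`, `matches`, `relevant_hits` and the toy news items are hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    text: str
    relevant: bool  # hypothetical expert label: True if the item reports a real outbreak


def build_query(expert_term: str, extracted_term: str) -> list[str]:
    """Hypothetical query: the expert's disease name plus one extracted term, lowercased."""
    return [expert_term.lower(), extracted_term.lower()]


def matches(query: list[str], item: NewsItem) -> bool:
    """Assumed AND semantics: the item's text must contain every query term."""
    text = item.text.lower()
    return all(term in text for term in query)


def relevant_hits(expert_term: str, extracted_terms: list[str],
                  collection: list[NewsItem]) -> dict[str, int]:
    """For each extracted term, count the relevant items retrieved by its query."""
    scores = {}
    for term in extracted_terms:
        query = build_query(expert_term, term)
        scores[term] = sum(1 for item in collection
                           if item.relevant and matches(query, item))
    return scores


# Toy collection invented for illustration; it is not data from the paper.
toy_news = [
    NewsItem("Outbreak of African swine fever confirmed on a pig farm", True),
    NewsItem("African swine fever: new cases reported in wild boar", True),
    NewsItem("Review of the African swine fever virus genome", False),
]

if __name__ == "__main__":
    print(relevant_hits("African swine fever", ["outbreak", "wild boar", "genome"], toy_news))
```

Running the script prints `{'outbreak': 1, 'wild boar': 1, 'genome': 0}`: under these assumptions, terms such as "genome" that mainly co-occur with scientific rather than outbreak-reporting text score low, which is the kind of signal such a measure could use to rank candidate query terms.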
Origin: Files produced by the author(s)

Dates and versions

lirmm-01184556, version 1 (16-08-2015)


  • HAL Id: lirmm-01184556, version 1


Elena Arsevska, Mathieu Roche, Renaud Lancelot, Pascal Hendrikx, Barbara Dufour. Exploiting Textual Source Information for Epidemiosurveillance. MTSR: Metadata and Semantics Research, Nov 2014, Karlsruhe, Germany. pp.359-361. ⟨lirmm-01184556⟩

