Toward Geographic Information Harvesting: Extraction of Spatial Relational Facts from Web Documents

Corrado Loglisci 1, 2; Dino Ienco 2, 3; Mathieu Roche 4; Maguelonne Teisseire 3, 2; Donato Malerba 1
3 TATOO - Fouille de données environnementales
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
4 TEXTE - Exploration et exploitation de données textuelles
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: This paper addresses the problem of harvesting geographic information from Web documents, specifically the extraction of facts on spatial relations among geographic places. The motivation is twofold. First, researchers in Spatial Data Mining often assume that spatial data are already available, thanks to current GIS and positioning technologies. This assumption does not hold, however, for spatial information embedded in data that lack explicit spatial modeling, such as documents. Second, despite the huge number of Web documents conveying useful geographic information, little work exists on how to harvest spatial data from them. The problem is particularly challenging because annotated documents are lacking, which prevents the application of supervised learning techniques. In this paper, we propose to harvest facts on geographic places through an unsupervised approach that recognizes spatial relations among geographic places without assuming the availability of annotated documents. The proposed approach combines a spatial ontology with a prototype-based classifier. A case study on topological and directional relations is reported and discussed.
Document type:
Conference paper
SSTDM: Spatial and Spatio-Temporal Data Mining, Dec 2012, Brussels, Belgium. International Workshop on Spatial and Spatiotemporal Data Mining (SSTDM-12), In Cooperation with IEEE ICDM 2012, 10 December 2012, Brussels, Belgium, pp.789-796, 2012

https://hal-lirmm.ccsd.cnrs.fr/lirmm-00816292
Contributor: Mathieu Roche
Submitted on: Sunday, 21 April 2013 - 13:56:11
Last modified on: Thursday, 24 May 2018 - 15:59:23

Identifiers

  • HAL Id: lirmm-00816292, version 1

Citation

Corrado Loglisci, Dino Ienco, Mathieu Roche, Maguelonne Teisseire, Donato Malerba. Toward Geographic Information Harvesting: Extraction of Spatial Relational Facts from Web Documents. SSTDM: Spatial and Spatio-Temporal Data Mining, Dec 2012, Brussels, Belgium. International Workshop on Spatial and Spatiotemporal Data Mining (SSTDM-12), In Cooperation with IEEE ICDM 2012, 10 December 2012, Brussels, Belgium, pp.789-796, 2012. 〈lirmm-00816292〉
