Handling Missing Values for Mining Gradual Patterns from NoSQL Graph Databases - LIRMM - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier
Journal article, Future Generation Computer Systems, 2020

Handling Missing Values for Mining Gradual Patterns from NoSQL Graph Databases

Abstract

Graph databases (NoSQL graph-oriented databases) make it possible to manage highly connected data and complex database queries, together with native graph storage and processing. A property graph in a NoSQL graph engine is a labeled directed graph composed of nodes connected through edges, each carrying a set of attributes or properties in the form of (key : value) pairs. It makes it easy to represent data and knowledge that take the form of graphs. Practical applications of graph database systems are found in social networks, recommendation systems, fraud detection, and data journalism, as in the case of the Panama Papers. Such systems often suffer from missing data. In particular, these semi-structured NoSQL databases lead to situations where some attributes (properties) are filled in while others are not available, either because they exist but are missing (for instance, the age of a person that is unknown) or because they are not applicable in a particular case (for instance, the year of military service for a girl in countries where it is mandatory only for boys). Therefore, some keys can be provided for some nodes and not for others. In such a scenario, when we want to extract knowledge from these new-generation database systems, we face the problem of missing data, which needs to be analyzed. Some approaches have been proposed to replace missing values so that data mining techniques can be applied. However, we argue that such approaches are not relevant here because they may introduce biases or errors. In our work, we focus on the extraction of gradual patterns from property graphs, which provides end-users with tools for mining correlations in the data when values are missing.
Our approach first requires defining gradual patterns in the context of NoSQL property graphs, and then extending existing algorithms to handle missing values, because the anti-monotonicity of the support can no longer be relied on in a simple manner. We thus introduce a novel approach for mining gradual patterns in the presence of missing values, and we test it on real and synthetic data.
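
To illustrate why missing values complicate the anti-monotonicity of the support, here is a minimal Python sketch. It is not the algorithm from the article. It assumes a pair-based support (the fraction of node pairs that can be ordered consistently with the pattern) and assumes that nodes lacking any key of the pattern are simply left out of the count. The node data, the property names, and the `support` function are all hypothetical.

```python
from itertools import combinations

# Hypothetical property-graph nodes: each node is a dict of (key: value) properties.
# Some keys are absent on some nodes, which models missing values.
nodes = [
    {"name": "a", "age": 25, "salary": 30},
    {"name": "b", "age": 30, "salary": 20},
    {"name": "c", "age": 35, "salary": 40, "seniority": 3},
    {"name": "d", "age": 40, "salary": 50, "seniority": 5},
    {"name": "e", "salary": 60},  # "age" is missing on this node
]


def support(nodes, pattern):
    """Pair-based support of a gradual pattern (illustrative assumption).

    pattern is a list of (key, direction) items, with direction "+" (increasing)
    or "-" (decreasing). Only nodes that carry every key of the pattern are used.
    """
    complete = [n for n in nodes if all(key in n for key, _ in pattern)]
    pairs = list(combinations(complete, 2))
    if not pairs:
        return 0.0

    def ordered(x, y):
        # True when moving from x to y respects every item of the pattern.
        return all(
            (x[key] < y[key]) if direction == "+" else (x[key] > y[key])
            for key, direction in pattern
        )

    concordant = sum(1 for x, y in pairs if ordered(x, y) or ordered(y, x))
    return concordant / len(pairs)


age_salary = [("age", "+"), ("salary", "+")]
longer = age_salary + [("seniority", "+")]

print(support(nodes, age_salary))  # 5 of 6 pairs among nodes a, b, c, d
print(support(nodes, longer))      # 1 of 1 pair among nodes c, d
```

Under these assumptions, the longer pattern has a higher support than its sub-pattern, because the set of usable nodes shrinks as keys are added. A pruning rule that discards every extension of an infrequent pattern would therefore not be safe without further care.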
Main file: FGCS_PostPrint.pdf (1.14 MB)
Origin: Files produced by the author(s)

Dates and versions

lirmm-02372243 , version 1 (20-11-2019)

Cite

Faaiz Hussain Shah, Arnaud Castelltort, Anne Laurent. Handling Missing Values for Mining Gradual Patterns from NoSQL Graph Databases. Future Generation Computer Systems, 2020, 111, pp.523-538. ⟨10.1016/j.future.2019.10.004⟩. ⟨lirmm-02372243⟩