
Handling Missing Values for Mining Gradual Patterns from NoSQL Graph Databases

Faaiz Shah 1 Arnaud Castelltort 1 Anne Laurent 1
1 FADO - Fuzziness, Alignments, Data & Ontologies
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Graph databases (graph-oriented NoSQL databases) make it possible to manage highly connected data and complex queries with native graph storage and processing. A property graph in a NoSQL graph engine is a labeled directed graph composed of nodes connected by edges, with a set of attributes (properties) in the form of (key : value) pairs. It makes it easy to represent data and knowledge that take the form of graphs. Graph database systems have found practical applications in social networks, recommendation systems, fraud detection, and data journalism, as in the case of the Panama Papers. Such systems often suffer from missing data. In particular, these semi-structured NoSQL databases lead to situations where some attributes (properties) are filled in while others are not available, either because they exist but are unknown (for instance, the age of a person) or because they are not applicable to a particular case (for instance, the year of military service for a girl in countries where it is mandatory only for boys). As a result, some keys are provided for some nodes and not for others. When we want to extract knowledge from these new-generation database systems, this missing data has to be taken into account in the analysis. Some approaches have been proposed to replace missing values so that data mining techniques can be applied. However, we argue that such approaches are not appropriate because they may introduce biases or errors. In our work, we focus on extracting gradual patterns from property graphs, which provides end users with tools for mining correlations in the data when values are missing.
Our approach first requires defining gradual patterns in the context of NoSQL property graphs and then extending existing algorithms to handle missing values, because the anti-monotonicity of the support can no longer be exploited in a simple manner. We thus introduce a novel approach for mining gradual patterns in the presence of missing values and test it on real and synthetic data.
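
To make the idea concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a pair-based support measure (the fraction of node pairs that can be ordered so that every property in the pattern varies in the stated direction) and assumes that nodes lacking any property of the pattern are simply left out of the computation rather than imputed. The names `respects`, `gradual_support`, and the sample data are hypothetical.

```python
from itertools import combinations

def respects(a, b, pattern):
    """True if going from node a to node b follows every (key, direction) item."""
    return all(
        (b[key] > a[key]) if direction == "+" else (b[key] < a[key])
        for key, direction in pattern
    )

def gradual_support(nodes, pattern):
    """Assumed support: share of concordant pairs among nodes that carry
    every property named in the pattern (missing values are not replaced)."""
    keys = [key for key, _ in pattern]
    complete = [n for n in nodes if all(n.get(k) is not None for k in keys)]
    pairs = list(combinations(complete, 2))
    if not pairs:
        return 0.0
    concordant = sum(
        1 for a, b in pairs
        if respects(a, b, pattern) or respects(b, a, pattern)
    )
    return concordant / len(pairs)

# Hypothetical node properties, as (key : value) pairs; some keys are absent.
people = [
    {"name": "A", "age": 25, "salary": 1800},
    {"name": "B", "age": 32, "salary": 2400},
    {"name": "C", "salary": 3000},             # age missing
    {"name": "D", "age": 41, "salary": 2100},
    {"name": "E", "age": 50},                  # salary missing
]

# "The higher the age, the higher the salary"
pattern = [("age", "+"), ("salary", "+")]
support = gradual_support(people, pattern)
```

In this sketch the set of usable nodes depends on the pattern: the single-item pattern on `age` is evaluated over four nodes, while the two-item pattern is evaluated over only three. The support of a pattern and that of its sub-patterns are therefore computed over different sets of nodes, which hints at why anti-monotonicity cannot be relied on in a simple manner.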
Document type :
Journal articles

Cited literature: 48 references

https://hal-lirmm.ccsd.cnrs.fr/lirmm-02372243
Contributor : Anne Laurent
Submitted on : Wednesday, November 20, 2019 - 12:09:08 PM
Last modification on : Monday, May 4, 2020 - 10:06:04 AM

File

FGCS_PostPrint.pdf
Files produced by the author(s)

Citation

Faaiz Shah, Arnaud Castelltort, Anne Laurent. Handling Missing Values for Mining Gradual Patterns from NoSQL Graph Databases. Future Generation Computer Systems, Elsevier, In press, ⟨10.1016/j.future.2019.10.004⟩. ⟨lirmm-02372243⟩
