Recommandation Diversifiée et Distribuée pour les Données Scientifiques

Maximilien Servajean 1, 2
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : Recommendation is becoming a popular mechanism to help users find relevant information in large-scale data (scientific data, web). Different diversification techniques have been proposed to avoid redundancy in the recommendation process. Intuitively, the goal of recommendation diversification is to identify a list of items that are dissimilar, but nonetheless relevant to the user's interests. In the first part of this thesis, the main goal was to define a new diversified search and recommendation solution suited for scientific data (i.e. plant phenotyping, botanical data). We first proposed an original profile diversification scoring function that addresses the problem of returning redundant items and improves the quality of diversification compared to state-of-the-art solutions. We believe our work is the first to investigate profile diversity to address the problem of returning highly popular but too-focused items. Through experimental evaluation using two benchmarks, we showed that our scoring function offers the best compromise between diversity and relevance. Next, to implement our new scoring function, we proposed a top-k threshold-based algorithm that exploits a candidate list to achieve diversification. However, this algorithm is greedy and does not scale well. To overcome this limitation, we proposed several techniques to improve performance. First, we simplified the scoring model to reduce its computational complexity. Second, we proposed two techniques to reduce the number of items in the candidate list, and therefore the number of diversified scores to compute. Third, we proposed different indexing scores (i.e. the score used to sort the items in the inverted lists) that take the diversification of items into account and, using them, we developed an adaptive indexing approach that dynamically reduces the number of index accesses based on the query workload.
We evaluated the performance of our techniques experimentally. The results show that they reduce response time by up to a factor of 12 compared to a baseline greedy diversification algorithm. In the second part of the thesis, we addressed the problem of distributed and diversified recommendation (P2P and multi-site), which fits well in different application scenarios. We proposed a new scoring function (usefulness) to cluster relevant users over a distributed overlay. We analyzed the new clustering algorithm in detail and studied its behavior through an experimental evaluation on different datasets. Compared with state-of-the-art solutions, we obtain major gains in recall (on the order of 3 times).
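To make the notion of a baseline greedy diversification algorithm concrete, here is a minimal Python sketch of a generic greedy selection that trades relevance against redundancy. It is not the thesis's profile diversification scoring function or its top-k threshold algorithm: the function `greedy_diversify`, the `trade_off` parameter, and the toy data are all assumptions introduced for illustration.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_diversify(
    candidates: Sequence[Hashable],
    relevance: Callable[[Hashable], float],
    similarity: Callable[[Hashable, Hashable], float],
    k: int,
    trade_off: float = 0.5,
) -> List[Hashable]:
    """Pick up to k items, each time taking the best marginal score.

    Marginal score (an assumed, generic formulation):
        trade_off * relevance(item)
        - (1 - trade_off) * max similarity to the items already picked
    """
    selected: List[Hashable] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def marginal(item: Hashable) -> float:
            # Redundancy is 0 while nothing has been selected yet.
            redundancy = max(
                (similarity(item, s) for s in selected), default=0.0
            )
            return trade_off * relevance(item) - (1.0 - trade_off) * redundancy

        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: item -> (relevance score, set of topics).
ITEMS = {
    "a": (0.9, {"wheat"}),
    "b": (0.8, {"wheat"}),
    "c": (0.7, {"maize"}),
}


def toy_relevance(item: str) -> float:
    return ITEMS[item][0]


def toy_similarity(x: str, y: str) -> float:
    # Jaccard similarity between the topic sets of two items.
    tx, ty = ITEMS[x][1], ITEMS[y][1]
    return len(tx & ty) / len(tx | ty)


top2 = greedy_diversify(list(ITEMS), toy_relevance, toy_similarity, k=2)
print(top2)
```

In this toy example the second pick skips the more relevant but redundant item "b" in favor of "c". Every round rescores all remaining candidates against all selected items, which illustrates why a greedy approach of this kind becomes costly as the candidate list grows.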
Document type : Thesis

Cited literature [94 references]

https://hal-lirmm.ccsd.cnrs.fr/tel-01098191
Contributor : Maximilien Servajean
Submitted on : Tuesday, December 23, 2014 - 11:32:30 AM
Last modification on : Friday, March 15, 2019 - 1:15:00 AM
Long-term archiving on: Tuesday, March 24, 2015 - 10:16:07 AM

Identifiers

  • HAL Id : tel-01098191, version 1

Citation

Maximilien Servajean. Recommandation Diversifiée et Distribuée pour les Données Scientifiques. Information Retrieval [cs.IR]. Université Montpellier 2, 2014. French. ⟨tel-01098191⟩

Metrics

Record views

1077

File downloads

1184