Amazon.

Grid5000.

The ClueWeb09 dataset, 2009.

English, 2014.

D. Achlioptas, Database-friendly random projections: Johnson-Lindenstrauss with binary coins, Journal of Computer and System Sciences, vol.66, issue.4, pp.671-687, 2003.

R. Agrawal, C. Faloutsos, and A. N. Swami, Efficient similarity search in sequence databases, Proceedings of the International Conference on Foundations of Data Organization and Algorithms (FODO), pp.69-84, 1993.

R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, Proceedings of the International Conference on Very Large Data Bases (VLDB), pp.487-499, 1994.

A. Rajaraman and J. D. Ullman, Mining of Massive Datasets, 2012.

I. Assent, R. Krieger, F. Afschari, and T. Seidl, The TS-tree: efficient time series search and retrieval, Proceedings of the International Conference on Extending Database Technology (EDBT), pp.252-263, 2008.

K. Berberich and S. Bedathur, Computing n-gram statistics in MapReduce, Proceedings of the International Conference on Extending Database Technology (EDBT), p.71, 2013.

M. Berry, Survey of Text Mining II: Clustering, Classification, and Retrieval, 2008.

C. Bizer, P. A. Boncz, M. L. Brodie, and O. Erling, The meaningful use of big data: four perspectives - four challenges, SIGMOD Record, vol.40, issue.4, pp.56-60, 2011.

S. Brin, R. Motwani, and C. Silverstein, Beyond market baskets: Generalizing association rules to correlations, SIGMOD Record, vol.26, issue.2, pp.265-276, 1997.

Y. Cai and R. Ng, Indexing spatio-temporal trajectories with Chebyshev polynomials, Proceedings of the International Conference on Management of Data (SIGMOD), pp.599-610, 2004.

A. Camerra, T. Palpanas, J. Shieh, and E. Keogh, iSAX 2.0: Indexing and mining one billion time series, Proceedings of the International Conference on Data Mining (ICDM), pp.58-67, 2010.

A. Camerra, J. Shieh, T. Palpanas, T. Rakthanmanon, and E. J. Keogh, Beyond one billion time series: indexing and mining very large time series collections with iSAX2+, Knowledge and Information Systems (KAIS), vol.39, issue.1, pp.123-151, 2014.

K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani, Locally adaptive dimensionality reduction for indexing large time series databases, ACM Transactions on Database Systems (TODS), vol.27, issue.2, pp.188-228, 2002.

K.-P. Chan and A. W.-C. Fu, Efficient time series matching by wavelets, Proceedings of the International Conference on Data Engineering (ICDE), pp.126-133, 1999.

G. Chandrashekar and F. Sahin, A survey on feature selection methods, Computers and Electrical Engineering, vol.40, issue.1, pp.16-28, 2014.

M. S. Charikar, Similarity estimation techniques from rounding algorithms, Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing (STOC), pp.380-388, 2002.

R. Cole, D. Shasha, and X. Zhao, Fast window correlations over uncooperative time series, Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp.743-749, 2005.

C. Coles and J. Yeoh, Cloud adoption practices and priorities survey report, 2015.

T. M. Cover and J. A. Thomas, Elements of Information Theory, 2006.

J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Commun. ACM, vol.51, issue.1, pp.107-113, 2008.

C. Dwork, Differential privacy, International Colloquium on Automata, Languages and Programming (ICALP), pp.1-12, 2006.

P. Esling and C. Agon, Time-series data mining, ACM Comput. Surv, vol.45, issue.1, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01577883

C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, Fast subsequence matching in time-series databases, Proceedings of the International Conference on Management of Data (SIGMOD), pp.419-429, 1994.

F. Geerts, B. Goethals, and T. Mielikäinen, Tiling databases, International Conference on Discovery Science, pp.278-289, 2004.

Z. Ghahramani, Unsupervised learning, Advanced Lectures on Machine Learning, pp.72-112, 2004.

C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, Mining frequent patterns in data streams at multiple time granularities, 2002.

A. Gionis, H. Mannila, and J. K. Seppänen, Geometric and combinatorial tiles in 0-1 data, Knowledge Discovery in Databases (PKDD), pp.173-184, 2004.

A. Gionis, P. Indyk, and R. Motwani, Similarity search in high dimensions via hashing, Proceedings of the International Conference on Very Large Data Bases (VLDB), pp.518-529, 1999.

R. Gray, Entropy and information theory, 2011.

E. Greengrass, Information retrieval: A survey, 2000.

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res, vol.3, pp.1157-1182, 2003.

J. Han, J. Pei, and Y. Yin, Mining frequent patterns without candidate generation, SIGMOD Record, vol.29, 2000.

J. Han, Data Mining: Concepts and Techniques, 2012.

H. Heikinheimo, E. Hinkkanen, H. Mannila, T. Mielikäinen, and J. K. Seppänen, Finding low-entropy sets and trees from binary data, Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp.350-359, 2007.

A. Henelius, I. Karlsson, P. Papapetrou, A. Ukkonen, and K. Puolamäki, Semigeometric tiling of event sequences, Machine Learning and Knowledge Discovery in Databases. ECML PKDD, pp.329-344, 2016.

, Bibliography -Part, vol.1

F. Herrera, C. J. Carmona, P. González, and M. J. del Jesus, An overview on subgroup discovery: foundations and applications, Knowledge and Information Systems, vol.29, issue.3, pp.495-525, 2011.

P. Indyk, Stable distributions, pseudorandom generators, embeddings and data stream computation, 41st Annual Symposium on Foundations of Computer Science (FOCS), pp.189-197, 2000.

H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg, Searching in one billion vectors: re-rank with source coding, ICASSP, 2011.

W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Conference in Modern Analysis and Probability, vol.26, pp.189-206, 1984.

E. Keogh, Exact indexing of dynamic time warping, Proceedings of the International Conference on Very Large Data Bases (VLDB), 2002.

E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, Dimensionality reduction for fast similarity search in large time series databases, Knowledge and Information Systems (KAIS), vol.3, issue.3, pp.263-286, 2001.

A. J. Knobbe and E. K. Y. Ho, Maximally informative k-itemsets and their efficient discovery, Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp.237-244, 2006.

S. B. Kotsiantis, Supervised machine learning: A review of classification techniques, Proceedings of International Conference on Emerging Artificial Intelligence Applications in Computer Engineering, pp.3-24, 2007.

E. Kushilevitz, R. Ostrovsky, and Y. Rabani, Efficient search for approximate nearest neighbor in high dimensional spaces, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing (STOC), pp.614-623, 1998.

H. Li, Y. Wang, D. Zhang, M. Zhang, and E. Y. Chang, PFP: parallel FP-growth for query recommendation, Proceedings of the ACM Conf. on Recommender Systems (RecSys), pp.107-114, 2008.

J. Lin, E. Keogh, S. Lonardi, and B. Chiu, A symbolic representation of time series, with implications for streaming algorithms, Proceedings of the International Conference on Management of Data (SIGMOD), 2003.

J. Lin, E. Keogh, L. Wei, and S. Lonardi, Experiencing SAX: A novel symbolic representation of time series, Data Min. Knowl. Discov, vol.15, issue.2, pp.107-144, 2007.

Y. Matsubara and Y. Sakurai, Regime shifts in streams: Real-time forecasting of co-evolving time sequences, Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp.1045-1054, 2016.

I. Miliaraki, K. Berberich, R. Gemulla, and S. Zoupanos, Mind the gap: Large-scale frequent sequence mining, Proceedings of International Conference on Management of Data (SIGMOD), pp.797-808, 2013.

S. Moens, E. Aksehirli, and B. Goethals, Frequent itemset mining for big data, IEEE International Conference on Big Data, pp.111-118, 2013.

A. Mueen, S. Nath, and J. Liu, Fast approximate correlation for massive time-series data, Proceedings of the International Conference on Management of Data (SIGMOD), pp.171-182, 2010.

T. Palpanas, Data series management: The road to big sequence analytics, SIGMOD Record, vol.44, issue.2, pp.47-52, 2015.

T. Palpanas, Big sequence management: A glimpse of the past, the present, and the future, SOFSEM, 2016.

S. Papadimitriou, J. Sun, and C. Faloutsos, Streaming pattern discovery in multiple time-series, Proceedings of the International Conference on Very Large Data Bases (VLDB), pp.697-708, 2005.

S. Papadimitriou and P. S. Yu, Optimal multi-scale patterns in time series streams, Proceedings of the International Conference on Management of Data (SIGMOD), pp.647-658, 2006.

C. Perng, H. Wang, and S. Ma, Fast relevance discovery in time series, Proceedings of the International Conference on Data Mining (ICDM), pp.1016-1020, 2006.

T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover et al., Searching and mining trillions of time series subsequences under dynamic time warping, Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2012.

M. Riondato, J. A. Debrabant, R. Fonseca, and E. Upfal, PARMA: a parallel randomized algorithm for approximate association rules mining in MapReduce, Proceedings of the International Conference on Information and Knowledge Management (CIKM), pp.85-94, 2012.

Y. Sakurai, C. Faloutsos, and M. Yamamuro, Stream monitoring under the time warping distance, Proceedings of the International Conference on Data Engineering (ICDE), pp.1046-1055, 2007.

A. Savasere, E. Omiecinski, and S. B. Navathe, An efficient algorithm for mining association rules in large databases, Proceedings of the International Conference on Very Large Data Bases (VLDB), pp.432-444, 1995.

D. Shasha and Y. Zhu, High Performance Discovery in Time Series: Techniques and Case Studies, 2004.

J. Shieh and E. Keogh, iSAX: Indexing and mining terabyte sized time series, Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp.623-631, 2008.

J. Shieh and E. Keogh, iSAX: Disk-aware mining and indexing of massive time series datasets, Data Min. Knowl. Discov, vol.19, issue.1, pp.24-57, 2009.

S. K. Tanbeer, C. F. Ahmed, and B. Jeong, Parallel and distributed frequent pattern mining in large databases, Proceedings of the IEEE International Conference on High Performance Computing and Communications (HPCC), pp.407-414, 2009.

N. Tatti, Probably the best itemsets, Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp.293-302, 2010.

W. Teng, M. Chen, and P. S. Yu, A regression-based temporal pattern mining scheme for data streams, Proceedings of the International Conference on Very Large Data Bases (VLDB), pp.93-104, 2003.

Y. Wang, P. Wang, J. Pei, W. Wang, and S. Huang, A data-adaptive and dynamic segmentation index for whole matching on time series, Proceedings of the VLDB Endowment (PVLDB), vol.6, pp.793-804, 2013.

T. White, Hadoop: The Definitive Guide, 2012.

Q. Xie, S. Shang, B. Yuan, C. Pang, and X. Zhang, Local correlation detection with linearity enhancement in streaming data, Proceedings of the International Conference on Information and Knowledge Management (CIKM), pp.309-318, 2013.

C. Yeh, Y. Zhu, L. Ulanova, N. Begum, Y. Ding et al., Matrix profile I: all pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets, Proceedings of the International Conference on Data Mining (ICDM), pp.1317-1322, 2016.

M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, Spark: Cluster computing with working sets, Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, pp.10-10, 2010.

C. Zhang and F. Masseglia, Discovering highly informative feature sets from data streams, Database and Expert Systems Applications, pp.91-104, 2010.

K. Zoumpatianos, S. Idreos, and T. Palpanas, Indexing for interactive exploration of big data series, Proceedings of the International Conference on Management of Data (SIGMOD), pp.1555-1566, 2014.

K. Zoumpatianos, S. Idreos, and T. Palpanas, ADS: the adaptive data series index, The VLDB Journal, vol.25, pp.843-866, 2016.

D. E. Yagoubi, R. Akbarinia, F. Masseglia, and T. Palpanas, Massively distributed time series indexing and querying, IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019.

O. Levchenko, D. E. Yagoubi, R. Akbarinia, F. Masseglia, B. Kolev et al., Spark-parSketch: A massively distributed indexing of time series datasets, Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), pp.1951-1954, 2018.
URL : https://hal.archives-ouvertes.fr/lirmm-01886760

D. E. Yagoubi, R. Akbarinia, B. Kolev, O. Levchenko, F. Masseglia et al., ParCorr: efficient parallel methods to identify similar time series pairs across sliding windows, Data Mining and Knowledge Discovery (DMKD), vol.32, issue.5, pp.1481-1507, 2018.

D. E. Yagoubi, R. Akbarinia, F. Masseglia, and T. Palpanas, DPiSAX: Massively distributed partitioned iSAX, Proceedings of the International Conference on Data Mining (ICDM), pp.1135-1140, 2017.

S. Salah, R. Akbarinia, and F. Masseglia, A highly scalable parallel algorithm for maximally informative k-itemset mining, Knowledge and Information Systems (KAIS), vol.50, issue.1, pp.1-26, 2017.
URL : https://hal.archives-ouvertes.fr/lirmm-01288571

S. Salah, R. Akbarinia, and F. Masseglia, Fast parallel mining of maximally informative k-itemsets in big data, Proceedings of the International Conference on Data Mining (ICDM), pp.359-368, 2015.
URL : https://hal.archives-ouvertes.fr/lirmm-01187275

S. Salah, R. Akbarinia, and F. Masseglia, Data placement in massively distributed environments for fast parallel mining of frequent itemsets, Knowledge and Information Systems (KAIS), vol.53, pp.207-237, 2017.
URL : https://hal.archives-ouvertes.fr/lirmm-01620383

C. Sahin, T. Allard, R. Akbarinia, A. E. Abbadi, and E. Pacitti, A differentially private index for range query processing in clouds, Proceedings of IEEE International Conference on Data Engineering (ICDE), pp.208-216, 2018.
URL : https://hal.archives-ouvertes.fr/lirmm-01886725

S. Mahboubi, R. Akbarinia, and P. Valduriez, Privacy-preserving top-k query processing in distributed systems, Proceedings of the International European Conference on Parallel and Distributed Computing, pp.281-292, 2018.
URL : https://hal.archives-ouvertes.fr/lirmm-01886160

S. Mahboubi, R. Akbarinia, and P. Valduriez, Answering top-k queries over outsourced sensitive data in the cloud, Proceedings of the International Conference on Database and Expert Systems Applications (DEXA), pp.218-231, 2018.
URL : https://hal.archives-ouvertes.fr/lirmm-01886164

M. Liroz-Gistau, R. Akbarinia, D. Agrawal, and P. Valduriez, FP-Hadoop: Efficient processing of skewed MapReduce jobs, Information Systems, vol.60, pp.69-84, 2016.
URL : https://hal.archives-ouvertes.fr/lirmm-01377715

M. Liroz-Gistau, R. Akbarinia, and P. Valduriez, FP-Hadoop: Efficient execution of parallel jobs over skewed data, Proceedings of the VLDB Endowment (PVLDB), vol.8, pp.1856-1859, 2015.
URL : https://hal.archives-ouvertes.fr/lirmm-01162362

M. Liroz-Gistau, R. Akbarinia, E. Pacitti, F. Porto, and P. Valduriez, Dynamic workload-based partitioning algorithms for continuously growing databases, Trans. Large-Scale Data- and Knowledge-Centered Systems, vol.12, pp.105-128, 2013.
URL : https://hal.archives-ouvertes.fr/lirmm-00906966

M. Servajean, R. Akbarinia, E. Pacitti, and S. Amer-Yahia, Profile diversity for query processing using user recommendations, Information Systems, vol.48, pp.44-63, 2015.
URL : https://hal.archives-ouvertes.fr/lirmm-01079523

R. Akbarinia, E. Pacitti, and P. Valduriez, Best position algorithms for efficient top-k query processing, Information Systems, vol.36, issue.6, pp.973-989, 2011.
URL : https://hal.archives-ouvertes.fr/lirmm-00607882

W. K. Dedzoe, P. Lamarre, R. Akbarinia, and P. Valduriez, ASAP top-k query processing in unstructured P2P systems, Proceedings of the IEEE International Conference on Peer-to-Peer Computing (P2P), pp.1-10, 2010.

M. Tlili, W. K. Dedzoe, E. Pacitti, P. Valduriez, R. Akbarinia et al., P2P logging and timestamping for reconciliation, Proceedings of the VLDB Endowment (PVLDB), vol.1, pp.1420-1423, 2008.

R. Akbarinia, M. Tlili, E. Pacitti, P. Valduriez, and A. A. Lima, Replication in DHTs using dynamic groups, Trans. Large-Scale Data- and Knowledge-Centered Systems, vol.3, pp.1-19, 2011.
URL : https://hal.archives-ouvertes.fr/lirmm-00607915

N. Ayat, R. Akbarinia, H. Afsarmanesh, and P. Valduriez, Entity resolution for probabilistic data, Information Sciences, vol.277, pp.492-511, 2014.
URL : https://hal.archives-ouvertes.fr/lirmm-00879631

N. Ayat, R. Akbarinia, H. Afsarmanesh, and P. Valduriez, Entity resolution for distributed probabilistic data, Distributed and Parallel Databases (DAPD), vol.31, pp.509-542, 2013.

R. Akbarinia and F. Masseglia, Fast and exact mining of probabilistic data streams, Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (PKDD), pp.493-508, 2013.
URL : https://hal.archives-ouvertes.fr/lirmm-00838618

R. Akbarinia, P. Valduriez, and G. Verger, Efficient evaluation of SUM queries over probabilistic data, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol.25, issue.4, pp.764-775, 2013.
URL : https://hal.archives-ouvertes.fr/lirmm-00652293

R. Akbarinia, E. Pacitti, and P. Valduriez, Best position algorithms for top-k queries, Proceedings of the International Conference on Very Large Data Bases (VLDB), pp.495-506, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00378836

R. Akbarinia, E. Pacitti, and P. Valduriez, Processing top-k queries in distributed hash tables, Proceedings of the International European Conference on Parallel and Distributed Computing, pp.489-502, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00378864

R. Akbarinia, E. Pacitti, and P. Valduriez, Reducing network traffic in unstructured P2P systems using top-k queries, Distributed and Parallel Databases (DAPD), vol.19, pp.67-86, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00416447

W. K. Dedzoe, P. Lamarre, R. Akbarinia, and P. Valduriez, As-soon-as-possible top-k query processing in P2P systems, Trans. Large-Scale Data- and Knowledge-Centered Systems, vol.9, pp.1-27, 2013.

W. Palma, R. Akbarinia, E. Pacitti, and P. Valduriez, DHTJoin: processing continuous join queries using DHT networks, Distributed and Parallel Databases (DAPD), vol.26, pp.291-317, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00410473

W. Palma, R. Akbarinia, E. Pacitti, and P. Valduriez, Distributed processing of continuous join queries using DHT networks, Proceedings of the EDBT/ICDT Workshops, pp.34-41, 2009.

W. Palma, R. Akbarinia, E. Pacitti, and P. Valduriez, Efficient processing of continuous join queries using distributed hash tables, Proceedings of the International European Conference on Parallel and Distributed Computing, pp.632-641, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00368874

R. Akbarinia, E. Pacitti, and P. Valduriez, Data currency in replicated DHTs, Proceedings of International Conference on Management of Data (SIGMOD), pp.211-222, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00378860