A. Collaboration,

Apache Hadoop

Apache HBase

Azure Speed Test

Chameleon

FutureGrid

Giraffa

G. ,

Lustre, OpenSFS

USGS ANSS, Advanced National Seismic System

S. R. Alam, H. N. El-Harake, K. Howard, N. Stringfellow, F. Verzelloni et al., Parallel I/O and the metadata wall, Workshop on Parallel Data Storage (PDSW), pp.13-18.

Characterization of scientific workflows, Workshop on Workflows in Support of Large-Scale Science, 2008.

S. A. Brandt, E. L. Miller, D. D. Long, and L. Xue, Efficient metadata management in large distributed storage systems, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), pp.290-298, 2003.
DOI : 10.1109/MASS.2003.1194865

P. F. Corbett and D. G. Feitelson, The Vesta parallel file system, ACM Transactions on Computer Systems, vol.14, issue.3, pp.225-264, 1996.
DOI : 10.1145/233557.233558

E. Deelman, S. Callaghan, E. Field, H. Francoeur, R. Graves et al., Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example, 2006 Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), pp.14-14, 2006.
DOI : 10.1109/E-SCIENCE.2006.261098

E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, The cost of doing science on the cloud: The Montage example, SC 2008, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-12, 2008.
DOI : 10.1109/SC.2008.5217932

E. Deelman, G. Singh, M. Su, J. Blythe, Y. Gil et al., Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Scientific Programming, pp.219-237, 2005.
DOI : 10.1155/2005/128026

J. Dias, E. Ogasawara, D. de Oliveira, F. Porto, P. Valduriez et al., Algebraic dataflows for big data analysis, 2013 IEEE International Conference on Big Data, pp.150-155, 2013.
DOI : 10.1109/BigData.2013.6691567

A. Gehani, M. Kim, and T. Malik, Efficient querying of distributed provenance stores, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, pp.613-621, 2010.
DOI : 10.1145/1851476.1851567

S. Ghemawat, H. Gobioff, and S. Leung, The Google file system, ACM SIGOPS Operating Systems Review, vol.37, issue.5, pp.29-43, 2003.
DOI : 10.1145/1165389.945450

J. Hsieh, T. Kuo, and L. Chang, Efficient identification of hot data for flash memory storage systems, ACM Transactions on Storage, vol.2, issue.1, pp.22-40, 2006.
DOI : 10.1145/1138041.1138043

S. Jin and A. Bestavros, GreedyDual* Web caching algorithm: exploiting the two sources of temporal locality in Web request streams, Computer Communications, vol.24, issue.2, pp.174-183, 2001.
DOI : 10.1016/S0140-3664(00)00312-1

G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta et al., Characterizing and profiling scientific workflows, Future Generation Computer Systems, vol.29, issue.3, pp.682-692, 2013.
DOI : 10.1016/j.future.2012.08.015

A. W. Leung, M. Shao, T. Bisson, S. Pasupathy, and E. L. Miller, Spyglass: Fast, scalable metadata search for large-scale storage systems, USENIX Conf. on File and Storage Technologies (FAST), pp.153-166, 2009.

J. J. Levandoski, P. Larson, and R. Stoica, Identifying hot and cold data in main-memory databases, 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp.26-37, 2013.
DOI : 10.1109/ICDE.2013.6544811

J. Liu, E. Pacitti, P. Valduriez, and M. Mattoso, A Survey of Data-Intensive Scientific Workflow Management, Journal of Grid Computing, pp.457-493, 2015.
DOI : 10.1109/SERVICES-1.2008.79

URL : https://hal.archives-ouvertes.fr/lirmm-01144760

J. Liu, E. Pacitti, P. Valduriez, and M. Mattoso, Scientific workflow scheduling with provenance data in a multisite cloud, Transactions on Large-Scale Data- and Knowledge-Centered Systems (TLDKS), pp.80-112, 2016.

J. Liu, E. Pacitti, P. Valduriez, D. de Oliveira, and M. Mattoso, Multi-objective scheduling of Scientific Workflows in multisite clouds, Future Generation Computer Systems, vol.63, pp.76-95, 2016.
DOI : 10.1016/j.future.2016.04.014

URL : https://hal.archives-ouvertes.fr/lirmm-01342203

T. Malik, L. Nistor, and A. Gehani, Tracking and sketching distributed data provenance, Int. Conf. on e-Science, pp.190-197.

ARC: A self-tuning, low overhead replacement cache, USENIX Conf. on File and Storage Technologies (FAST), 2003.

E. L. Miller and R. H. Katz, RAMA: An easy-to-use, high-performance parallel file system, Parallel Computing, vol.23, issue.4, pp.419-446

E. Ogasawara, J. Dias, F. Porto, P. Valduriez, and M. Mattoso, An algebraic approach for data-centric scientific workflows, Proceedings of the VLDB Endowment (PVLDB), pp.1328-1339, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00640431

E. S. Ogasawara, J. Dias, V. Silva, F. S. Chirigati, D. de Oliveira et al., Chiron: a parallel engine for algebraic scientific workflows, Concurrency and Computation: Practice and Experience, vol.25, pp.2327-2341, 2013.
DOI : 10.1109/eScience.2008.62

URL : https://hal.archives-ouvertes.fr/lirmm-00806557

M. T. Ozsu and P. Valduriez, Principles of Distributed Database Systems, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00483354

L. Pineda-Morales, A. Costan, and G. Antoniu, Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows, 2015 IEEE International Conference on Cluster Computing, pp.294-303, 2015.
DOI : 10.1109/CLUSTER.2015.49

URL : https://hal.archives-ouvertes.fr/hal-01239150

L. Pineda-Morales, J. Liu, A. Costan, E. Pacitti, G. Antoniu et al., Managing hot metadata for scientific workflows on multisite clouds, 2016 IEEE International Conference on Big Data (Big Data), pp.390-397, 2016.
DOI : 10.1109/BigData.2016.7840628

URL : https://hal.archives-ouvertes.fr/hal-01395715

D. Saha, A. Samanta, and S. R. Sarangi, Theoretical Framework for Eliminating Redundancy in Workflows, 2009 IEEE International Conference on Services Computing, pp.41-48, 2009.
DOI : 10.1109/SCC.2009.19

F. Schmuck and R. Haskin, GPFS: A shared-disk file system for large computing clusters, USENIX Conf. on File and Storage Technologies (FAST), pp.231-244, 2002.

M. Stonebraker and U. Cetintemel, "One Size Fits All": An Idea Whose Time Has Come and Gone, 21st International Conference on Data Engineering (ICDE'05), pp.2-11, 2005.
DOI : 10.1109/ICDE.2005.1

M. Stonebraker, S. Madden, D. J. Abadi, and N. Hachem, The end of an architectural era: Time for a complete rewrite, Int. Conf. on Very Large Data Bases (VLDB), pp.1150-1160

A. Thomson and D. J. Abadi, CalvinFS: consistent WAN replication and scalable metadata management for distributed file systems, USENIX Conf. on File and Storage Technologies (FAST), pp.1-14, 2015.

J. Wang, S. Wu, H. Gao, J. Li, and B. C. Ooi, Indexing multi-dimensional data in a cloud system, Proceedings of the 2010 international conference on Management of data, SIGMOD '10, pp.591-602, 2010.
DOI : 10.1145/1807167.1807232

J. M. Wozniak, T. G. Armstrong, M. Wilde, D. S. Katz, E. L. Lusk et al., Swift/T: scalable data flow programming for many-task applications, ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pp.309-310, 2013.

S. Wu, D. Jiang, B. C. Ooi, and K. Wu, Efficient B-tree based indexing for cloud data processing, Proceedings of the VLDB Endowment (PVLDB), pp.1207-1218, 2010.
DOI : 10.14778/1920841.1920991

M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, Spark: Cluster computing with working sets, USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), pp.10-10, 2010.

M. J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Machine Learning, pp.31-60

D. Zhao, C. Shou, T. Malik, and I. Raicu, Distributed data provenance for large-scale data-intensive computing, 2013 IEEE International Conference on Cluster Computing (CLUSTER), pp.1-8, 2013.
DOI : 10.1109/CLUSTER.2013.6702685

D. Zhao, Z. Zhang, X. Zhou, T. Li, K. Wang et al., FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems, 2014 IEEE International Conference on Big Data (Big Data), pp.61-70, 2014.
DOI : 10.1109/BigData.2014.7004214