S. R. Alam, H. N. El-Harake, K. Howard, N. Stringfellow, and F. Verzelloni, Parallel I/O and the metadata wall, Proc. of the 6th Workshop on Parallel Data Storage, pp.13-18, 2011.

S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M. Su et al., Characterization of scientific workflows, 2008 Third Workshop on Workflows in Support of Large-Scale Science, 2008.
DOI : 10.1109/WORKS.2008.4723958

S. A. Brandt, E. L. Miller, D. D. Long, and L. Xue, Efficient metadata management in large distributed storage systems, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), 2003.
DOI : 10.1109/MASS.2003.1194865

P. F. Corbett and D. G. Feitelson, The Vesta parallel file system, ACM Trans. Comput. Syst., vol.14, issue.3, pp.225-264, 1996.

E. Deelman, D. Gannon, M. Shields, and I. Taylor, Workflows and e-Science: An overview of workflow system features and capabilities, Future Generation Computer Systems, vol.25, issue.5, pp.528-540, 2009.
DOI : 10.1016/j.future.2008.06.012

E. Deelman, G. Singh, M. Su, J. Blythe, Y. Gil et al., Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Scientific Programming, pp.219-237, 2005.
DOI : 10.1155/2005/128026

URL : https://doi.org/10.1155/2005/128026

E. Deelman, S. Callaghan, E. Field, H. Francoeur, R. Graves et al., Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example, 2006 Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), 2006.
DOI : 10.1109/E-SCIENCE.2006.261098

URL : http://www.isi.edu/~deelman/deelman_Ecybershake.pdf

E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, The cost of doing science on the cloud: The Montage example, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-12, 2008.
DOI : 10.1109/SC.2008.5217932

J. Dias, E. Ogasawara, D. De-oliveira, F. Porto, P. Valduriez et al., Algebraic dataflows for big data analysis, 2013 IEEE International Conference on Big Data, pp.150-155, 2013.
DOI : 10.1109/BigData.2013.6691567

A. Gehani, M. Kim, and T. Malik, Efficient querying of distributed provenance stores, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, pp.613-621, 2010.
DOI : 10.1145/1851476.1851567

URL : http://www.csl.sri.com/users/gehani/papers/CLADE-2010.Querying.pdf

S. Ghemawat, H. Gobioff, and S. Leung, The Google file system, ACM SIGOPS Operating Systems Review, vol.37, issue.5, pp.29-43, 2003.
DOI : 10.1145/1165389.945450

A. W. Leung, M. Shao, T. Bisson, S. Pasupathy, and E. L. Miller, Spyglass: Fast, scalable metadata search for large-scale storage systems, In FAST, vol.9, pp.153-166, 2009.

J. J. Levandoski, P.-Å. Larson, and R. Stoica, Identifying hot and cold data in main-memory databases, 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp.26-37, 2013.

J. Liu, E. Pacitti, P. Valduriez, and M. Mattoso, A Survey of Data-Intensive Scientific Workflow Management, Journal of Grid Computing, vol.13, issue.4, 2015.
DOI : 10.1109/SERVICES-1.2008.79

URL : https://hal.archives-ouvertes.fr/lirmm-01144760

J. Liu, E. Pacitti, P. Valduriez, and M. Mattoso, Scientific workflow scheduling with provenance data in a multisite cloud, Transactions on Large-Scale Data- and Knowledge-Centered Systems, 2016.
URL : https://hal.archives-ouvertes.fr/lirmm-01620224

J. Liu, E. Pacitti, P. Valduriez, and M. Mattoso, Scientific Workflow Scheduling with Provenance Support in Multisite Cloud, High Performance Computing for Computational Science VECPAR, 2016.
DOI : 10.1145/1084805.1084816

URL : https://hal.archives-ouvertes.fr/lirmm-01342190

J. Liu, E. Pacitti, P. Valduriez, D. D. Oliveira, and M. Mattoso, Multi-objective scheduling of Scientific Workflows in multisite clouds, Future Generation Computer Systems, vol.63, pp.76-95, 2016.
DOI : 10.1016/j.future.2016.04.014

URL : https://hal.archives-ouvertes.fr/lirmm-01342203

J. Liu, V. S. Sousa, E. Pacitti, P. Valduriez, and M. Mattoso, Scientific Workflow Partitioning in Multisite Cloud, Euro-Par 2014: Parallel Processing Workshops - Euro-Par 2014 Int. Workshops, pp.105-116, 2014.
DOI : 10.1007/978-3-319-14325-5_10

T. Malik, L. Nistor, and A. Gehani, Tracking and Sketching Distributed Data Provenance, 2010 IEEE Sixth International Conference on e-Science, pp.190-197, 2010.
DOI : 10.1109/eScience.2010.51

N. Megiddo and D. S. Modha, ARC: A self-tuning, low overhead replacement cache, FAST - USENIX Conference on File and Storage Technologies, pp.115-130, 2003.

E. L. Miller and R. H. Katz, RAMA: An easy-to-use, high-performance parallel file system, Parallel Computing, vol.23, issue.4, pp.419-446.

E. S. Ogasawara, J. Dias, V. Silva, F. S. Chirigati, D. De-oliveira et al., Chiron: a parallel engine for algebraic scientific workflows, Concurrency and Computation: Practice and Experience, vol.25, pp.2327-2341, 2013.
DOI : 10.1109/eScience.2008.62

URL : https://hal.archives-ouvertes.fr/lirmm-00806557

E. Ogasawara, J. Dias, F. Porto, P. Valduriez, and M. Mattoso, An algebraic approach for data-centric scientific workflows, Proc. of VLDB Endowment, pp.1328-1339, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00640431

L. Pineda-Morales, A. Costan, and G. Antoniu, Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows, 2015 IEEE International Conference on Cluster Computing, pp.294-303, 2015.
DOI : 10.1109/CLUSTER.2015.49

URL : https://hal.archives-ouvertes.fr/hal-01239150

L. Pineda-Morales, J. Liu, A. Costan, E. Pacitti, G. Antoniu et al., Managing hot metadata for scientific workflows on multisite clouds, 2016 IEEE International Conference on Big Data (Big Data), pp.390-397, 2016.
DOI : 10.1109/BigData.2016.7840628

URL : https://hal.archives-ouvertes.fr/hal-01395715

F. Schmuck and R. Haskin, GPFS: A shared-disk file system for large computing clusters, Proc. of the 1st USENIX Conference on File and Storage Technologies, FAST '02, 2002.

R. Souza, V. Silva, D. Oliveira, P. Valduriez, A. A. Lima et al., Parallel execution of workflows driven by a distributed database management system, ACM/IEEE Conference on Supercomputing, Poster, 2015.

M. Stonebraker and U. Cetintemel, "One Size Fits All": An Idea Whose Time Has Come and Gone, 21st International Conference on Data Engineering (ICDE'05), pp.2-11, 2005.
DOI : 10.1109/ICDE.2005.1

URL : http://www.cs.brown.edu/~ugur/fits_all.pdf

M. Stonebraker, S. Madden, D. J. Abadi, and N. Hachem, The end of an architectural era: Time for a complete rewrite, Proc. of the 33rd Intl. Conf. on Very Large Data Bases, VLDB '07, pp.1150-1160, 2007.

A. Thomson and D. J. Abadi, CalvinFS: consistent WAN replication and scalable metadata management for distributed file systems, Proc. of the 13th USENIX Conf. on File and Storage Technologies, 2015.

J. Wang, S. Wu, H. Gao, J. Li, and B. C. Ooi, Indexing multi-dimensional data in a cloud system, Proceedings of the 2010 international conference on Management of data, SIGMOD '10, pp.591-602, 2010.
DOI : 10.1145/1807167.1807232

URL : http://db.cs.hit.edu.cn/p/jinbaowang/papers/sigmod376-wangPS.pdf

J. M. Wozniak, T. G. Armstrong, M. Wilde, D. S. Katz, E. L. Lusk et al., Swift/T: scalable data flow programming for many-task applications, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.309-310, 2013.

S. Wu, D. Jiang, B. C. Ooi, and K. Wu, Efficient B-tree based indexing for cloud data processing, Proc. VLDB Endow, pp.1207-1218, 2010.
DOI : 10.14778/1920841.1920991

URL : http://www.comp.nus.edu.sg/%7Evldb2010/proceedings/files/papers/R107.pdf

M. J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Machine Learning, pp.31-60.

D. Zhao, C. Shou, T. Malik, and I. Raicu, Distributed data provenance for large-scale data-intensive computing, CLUSTER, pp.1-8, 2013.
DOI : 10.1109/cluster.2013.6702685

URL : http://datasys.cs.iit.edu/publications/2013_Cluster13_Provenance.pdf

D. Zhao, Z. Zhang, X. Zhou, T. Li, K. Wang et al., FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems, 2014 IEEE International Conference on Big Data (Big Data), 2014.
DOI : 10.1109/BigData.2014.7004214

URL : http://datasys.cs.iit.edu/publications/2014_BigData14_FusionFS.pdf