R. Kalakota and M. Robinson, e-Business 2.0: Roadmap for Success, 2000.

C. K. Lam and B. C. Tan, The Internet is changing the music industry, Communications of the ACM, vol.44, issue.8, pp.62-68, 2001.
DOI : 10.1145/381641.381658

P. Comon and C. Jutten, Handbook of Blind Source Separation, 2010.

G. R. Naik and W. Wang, Blind Source Separation, 2014.
DOI : 10.1007/978-3-642-55016-4

A. Hyvärinen, Fast and robust fixed-point algorithms for independent component analysis, IEEE Transactions on Neural Networks, vol.10, issue.3, pp.626-634, 1999.
DOI : 10.1109/72.761722

A. Hyvärinen and E. Oja, Independent component analysis: algorithms and applications, Neural Networks, vol.13, issue.4-5, pp.411-430, 2000.
DOI : 10.1016/S0893-6080(00)00026-5

S. Makino, T. Lee, and H. Sawada, Blind Speech Separation, 2007.
DOI : 10.1007/978-1-4020-6479-1

E. Vincent, T. Virtanen, and S. Gannot, Audio Source Separation and Speech Enhancement, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01120685

P. C. Loizou, Speech enhancement: theory and practice, 2007.

A. Liutkus, J. Durrieu, L. Daudet, and G. Richard, An overview of informed audio source separation, 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), 2013.
DOI : 10.1109/WIAMIS.2013.6616139

URL : https://hal.archives-ouvertes.fr/hal-00958661

E. Vincent, N. Bertin, R. Gribonval, and F. Bimbot, From Blind to Guided Audio Source Separation: How models and side information can improve the separation of sound, IEEE Signal Processing Magazine, vol.31, issue.3, pp.107-115, 2014.
DOI : 10.1109/MSP.2013.2297440

URL : https://hal.archives-ouvertes.fr/hal-00922378

U. Zölzer, DAFX - Digital Audio Effects, 2011.

M. Müller, Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications, 2015.
DOI : 10.1007/978-3-319-21945-5

E. T. Jaynes, Probability theory: The logic of science, 2003.
DOI : 10.1017/CBO9780511790423

O. Cappé, E. Moulines, and T. Ryden, Inference in Hidden Markov Models, 2005.

R. J. Mcaulay and T. F. Quatieri, Speech analysis/Synthesis based on a sinusoidal representation, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.34, issue.4, pp.744-754, 1986.
DOI : 10.1109/TASSP.1986.1164910

S. Rickard and O. Yilmaz, On the approximate w-disjoint orthogonality of speech, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.

S. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.27, issue.2, pp.113-120, 1979.
DOI : 10.1109/TASSP.1979.1163209

N. Wiener, Extrapolation, interpolation, and smoothing of stationary time series, 1975.

A. Liutkus and R. Badeau, Generalized Wiener filtering with fractional power spectrograms, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7177973

URL : https://hal.archives-ouvertes.fr/hal-01110028

G. Fant, Acoustic Theory of Speech Production, 1970.
DOI : 10.1515/9783110873429

B. P. Bogert, M. J. Healy, and J. W. Tukey, The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking, Proceedings of a Symposium on Time Series Analysis, pp.209-243, 1963.

A. M. Noll, Short-Time Spectrum and "Cepstrum" Techniques for Vocal-Pitch Detection, The Journal of the Acoustical Society of America, vol.36, issue.2, pp.296-302, 1964.
DOI : 10.1121/1.1918949

S. B. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.28, issue.4, pp.357-366, 1980.

A. V. Oppenheim, Speech Analysis-Synthesis System Based on Homomorphic Filtering, The Journal of the Acoustical Society of America, vol.45, issue.2, pp.458-465, 1969.
DOI : 10.1121/1.1911395

R. Durrett, Probability: theory and examples, 2010.
DOI : 10.1017/CBO9780511779398

G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.
DOI : 10.1214/aos/1176344136

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, pp.257-286, 1989.

A. J. Viterbi, A personal history of the Viterbi algorithm, IEEE Signal Processing Magazine, vol.23, issue.4, pp.120-142, 2006.
DOI : 10.1109/MSP.2006.1657823

C. Bishop, Neural networks for pattern recognition, 1996.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, vol.39, issue.1, pp.1-38, 1977.

J. Salamon, E. Gómez, D. Ellis, and G. Richard, Melody Extraction from Polyphonic Music Signals: Approaches, applications, and challenges, IEEE Signal Processing Magazine, vol.31, issue.2, 2014.
DOI : 10.1109/MSP.2013.2271648

N. J. Miller, Removal of noise from a voice signal by synthesis, 1973.
DOI : 10.21236/ADA008785

A. V. Oppenheim and R. W. Schafer, Homomorphic analysis of speech, IEEE Transactions on Audio and Electroacoustics, vol.16, issue.2, pp.221-226, 1968.
DOI : 10.1109/TAU.1968.1161965

R. C. Maher, An approach for the separation of voices in composite musical signals, 1989.

A. L. Wang, Instantaneous and frequency-warped techniques for auditory source separation, 1994.

Y. Meron and K. Hirose, Separation of singing and piano sounds, 5th International Conference on Spoken Language Processing, 1998.

T. F. Quatieri, Shape invariant time-scale and pitch modification of speech, IEEE Transactions on Signal Processing, vol.40, issue.3, pp.497-510, 1992.
DOI : 10.1109/78.120793

A. Ben-shalom and S. Dubnov, Optimal filtering of an instrument sound in a mixed recording given approximate pitch prior, International Computer Music Conference, 2004.

S. Shalev-Shwartz, S. Dubnov, N. Friedman, and Y. Singer, Robust temporal and spectral modeling for query by melody, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '02, 2002.
DOI : 10.1145/564376.564435

X. Serra, Musical sound modeling with sinusoids plus noise, Musical Signal Processing. Swets & Zeitlinger, pp.91-122, 1997.

B. Van Veen and K. M. Buckley, Beamforming Techniques for Spatial Filtering, The Digital Signal Processing Handbook, pp.1-22, 1997.
DOI : 10.1201/9781420046052-c2

Y. Zhang and C. Zhang, Separation of voice and music by harmonic structure stability analysis, IEEE International Conference on Multimedia and Expo, 2005.

E. Terhardt, Calculating virtual pitch, Hearing Research, vol.1, issue.2, pp.155-182, 1979.
DOI : 10.1016/0378-5955(79)90025-X

Y. Zhang, C. Zhang, and S. Wang, Clustering in Knowledge Embedded Space, Machine Learning: ECML 2003, pp.480-491, 2003.
DOI : 10.1007/978-3-540-39857-8_43

H. Fujihara, T. Kitahara, M. Goto, K. Komatani, T. Ogata et al., Singer identification based on accompaniment sound reduction and reliable frame selection, 6th International Conference on Music Information Retrieval, 2005.

H. Fujihara, M. Goto, T. Kitahara, and H. G. Okuno, A Modeling of Singing Voice Robust to Accompaniment Sounds and Its Application to Singer Identification and Vocal-Timbre-Similarity-Based Music Information Retrieval, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.3, pp.638-648, 2010.
DOI : 10.1109/TASL.2010.2041386

M. Goto, A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals, Speech Communication, vol.43, issue.4, pp.311-329, 2004.
DOI : 10.1016/j.specom.2004.07.001

J. A. Moorer, Signal processing aspects of computer music: A survey, Proceedings of the IEEE, vol.65, issue.8, pp.1108-1137, 1977.
DOI : 10.1109/PROC.1977.10660

A. Mesaros, T. Virtanen, and A. Klapuri, Singer identification in polyphonic music using vocal separation and pattern recognition methods, 7th International Conference on Music Information Retrieval, 2007.

M. Ryynänen and A. Klapuri, Transcription of the singing melody in polyphonic music, 7th International Conference on Music Information Retrieval, 2006.

Z. Duan, Y. Zhang, C. Zhang, and Z. Shi, Unsupervised Single-Channel Music Source Separation by Average Harmonic Structure Modeling, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.4, pp.766-778, 2008.
DOI : 10.1109/TASL.2008.919073

X. Rodet, Musical sound signal analysis/synthesis: Sinusoidal+residual and elementary waveform models, IEEE Time-Frequency and Time-Scale Workshop, 1997.
URL : https://hal.archives-ouvertes.fr/hal-01160995

J. O. Smith and X. Serra, PARSHL: An analysis/synthesis program for non-harmonic sounds based on a sinusoidal representation, International Computer Music Conference, 1987.

M. Slaney, D. Naar, and R. F. Lyon, Auditory model inversion for sound separation, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing, 1994.
DOI : 10.1109/ICASSP.1994.389714

M. Lagrange and G. Tzanetakis, Sound Source Tracking and Formation using Normalized Cuts, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, 2007.
DOI : 10.1109/ICASSP.2007.366616

M. Lagrange, L. G. Martins, J. Murdoch, and G. Tzanetakis, Normalized Cuts for Predominant Melodic Source Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.2, pp.278-290, 2008.
DOI : 10.1109/TASL.2007.909260

URL : https://hal.archives-ouvertes.fr/hal-01697331

J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22, issue.8, pp.888-905, 2000.

M. Ryynänen, T. Virtanen, J. Paulus, and A. Klapuri, Accompaniment separation and karaoke application based on automatic melody transcription, 2008 IEEE International Conference on Multimedia and Expo, 2008.
DOI : 10.1109/ICME.2008.4607710

M. Ryynänen and A. Klapuri, Automatic Transcription of Melody, Bass Line, and Chords in Polyphonic Music, Computer Music Journal, vol.32, issue.3, pp.72-86, 2008.

Y. Ding and X. Qian, Processing of musical tones using a combined quadratic polynomial-phase sinusoid and residual (QUASAR) signal model, Journal of the Audio Engineering Society, vol.45, issue.78, pp.571-584, 1997.

Y. Li and D. Wang, Singing voice separation from monaural recordings, 7th International Conference on Music Information Retrieval, 2006.

C. Duxbury, J. P. Bello, M. Davies, and M. Sandler, Complex domain onset detection for musical signals, 6th International Conference on Digital Audio Effects, 2003.

Y. Li and D. Wang, Detecting pitch of singing voice in polyphonic audio, IEEE International Conference on Acoustics, Speech and Signal Processing, 2005.

M. Wu, D. Wang, and G. J. Brown, A multipitch tracking algorithm for noisy speech, IEEE Transactions on Speech and Audio Processing, vol.11, issue.3, pp.229-241, 2003.

G. Hu and D. Wang, Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation, IEEE Transactions on Neural Networks, vol.15, issue.5, pp.1135-1150, 2004.
DOI : 10.1109/TNN.2004.832812

Y. Han and C. Raphael, Desoloing monaural audio using mixture models, 7th International Conference on Music Information Retrieval, 2007.

S. T. Roweis, One microphone source separation, Advances in Neural Information Processing Systems 13, pp.793-799, 2001.

C. Hsu, J. R. Jang, and T. Tsai, Separation of singing voice from music accompaniment with unvoiced sounds reconstruction for monaural recordings, AES 125th Convention, 2008.

C. Hsu and J. R. Jang, On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.2, pp.310-319, 2010.

K. Dressler, Sinusoidal extraction using an efficient implementation of a multi-resolution FFT, 9th International Conference on Digital Audio Effects, 2006.

P. Scalart and J. V. Filho, Speech enhancement based on a priori signal to noise estimation, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996.
DOI : 10.1109/ICASSP.1996.543199

C. Raphael and Y. Han, A Classifier-Based Approach to Score-Guided Source Separation of Musical Audio, Computer Music Journal, vol.32, issue.1, pp.51-59, 2008.

L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees, 1984.

E. Cano and C. Cheng, Melody line detection and source separation in classical saxophone recordings, 12th International Conference on Digital Audio Effects, 2009.

S. Grollmisch, E. Cano, and C. Dittmar, Songs2See: Learn to play by playing, AES 41st Conference: Audio for Games, pp.2-3, 2011.

C. Dittmar, E. Cano, J. Abeßer, and S. Grollmisch, Music information retrieval meets music education, Multimodal Music Processing, pp.95-120, 2012.

E. Cano, C. Dittmar, and G. Schuller, Efficient implementation of a system for solo and accompaniment separation in polyphonic music, 20th European Signal Processing Conference, 2012.

K. Dressler, Pitch estimation by the pair-wise evaluation of spectral peaks, 42nd AES Conference on Semantic Audio, 2011.

E. Cano, C. Dittmar, and G. Schuller, Re-thinking sound separation: Prior information and additivity constraints in separation algorithms, 16th International Conference on Digital Audio Effects, 2013.

E. Cano, G. Schuller, and C. Dittmar, Pitch-informed solo and accompaniment separation towards its use in music education applications, EURASIP Journal on Advances in Signal Processing, vol.19, issue.7, 2014.

J. J. Bosch, K. Kondo, R. Marxer, and J. Janer, Score-informed and timbre independent lead instrument separation in real-world scenarios, 20th European Signal Processing Conference, 2012.

R. Marxer, J. Janer, and J. Bonada, Low-Latency Instrument Separation in Polyphonic Audio Using Timbre Models, 10th International Conference on Latent Variable Analysis and Signal Separation, 2012.

A. Vaneph, E. Mcneil, and F. Rigaud, An automated source separation technology and its practical applications, 140th AES Convention, 2016.

S. Leglaive, R. Hennequin, and R. Badeau, Singing voice detection with deep recurrent neural networks, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7177944

URL : https://hal.archives-ouvertes.fr/hal-01110035

D. D. Lee and H. S. Seung, Learning the parts of objects by nonnegative matrix factorization, Nature, vol.401, pp.788-791, 1999.

P. Smaragdis and J. C. Brown, Non-negative matrix factorization for polyphonic music transcription, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684), 2003.
DOI : 10.1109/ASPAA.2003.1285860

T. Virtanen, Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.3, pp.1066-1074, 2007.
DOI : 10.1109/TASL.2006.885253

C. Févotte, N. Bertin, and J. Durrieu, Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis, Neural Computation, vol.21, issue.3, pp.793-830, 2009.

P. Comon, Independent component analysis, A new concept?, Signal Processing, vol.36, issue.3, pp.287-314, 1994.
DOI : 10.1016/0165-1684(94)90029-9

S. Vembu and S. Baumann, Separation of vocals from polyphonic audio recordings, 6th International Conference on Music Information Retrieval, 2005.

H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, The Journal of the Acoustical Society of America, vol.87, issue.4, pp.1738-1752, 1990.
DOI : 10.1121/1.399423

T. L. Nwe and Y. Wang, Automatic detection of vocal segments in popular songs, 5th International Conference for Music Information Retrieval, 2004.

M. A. Casey and A. Westner, Separation of mixed audio sources by independent subspace analysis, International Computer Music Conference, 2000.

A. Chanrungutai and C. A. Ratanamahatana, Singing voice separation for mono-channel music using Non-negative Matrix Factorization, 2008 International Conference on Advanced Technologies for Communications, 2008.
DOI : 10.1109/ATC.2008.4760565

A. N. Tikhonov, Solution of incorrectly formulated problems and the regularization method, Soviet Mathematics, vol.4, pp.1035-1038, 1963.

R. Marxer and J. Janer, A Tikhonov regularization method for spectrum decomposition in low latency audio source separation, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
DOI : 10.1109/ICASSP.2012.6287871

P. Yang, C. Hsu, and J. Chien, Bayesian singing-voice separation, 15th International Society for Music Information Retrieval Conference, 2014.

J. Chien and P. Yang, Bayesian Factorization and Learning for Monaural Source Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.1, pp.185-195, 2015.
DOI : 10.1109/TASLP.2015.2502141

M. N. Schmidt, O. Winther, and L. K. Hansen, Bayesian Non-negative Matrix Factorization, 8th International Conference on Independent Component Analysis and Signal Separation, 2009.

URL : http://www.ime.usp.br/%7Ejstern/miscellanea/seminario/Schmidt09.pdf

M. Spiertz and V. Gnann, Source-filter based clustering for monaural blind source separation, 12th International Conference on Digital Audio Effects, 2009.

P. Smaragdis and G. J. Mysore, Separation by "humming": User-guided sound extraction from monophonic mixtures, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009.
DOI : 10.1109/aspaa.2009.5346542

P. Smaragdis, B. Raj, and M. Shashanka, Supervised and semi-supervised separation of sounds from single-channel mixtures, 7th International Conference on Independent Component Analysis and Signal Separation, 2007.

T. Nakamura and H. Kameoka, Lp-norm non-negative matrix factorization and its application to singing voice enhancement, IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.

J. M. Ortega and W. C. Rheinboldt, Iterative solution of nonlinear equations in several variables, 1970.
DOI : 10.1137/1.9780898719468

H. Kameoka, M. Goto, and S. Sagayama, Selective amplifier of periodic and non-periodic components in concurrent audio signals with spectral control envelopes, 2006.

E. J. Candès, X. Li, Y. Ma, and J. Wright, Robust principal component analysis?, Journal of the ACM, vol.58, issue.3, pp.1-37, 2011.
DOI : 10.1145/1970392.1970395

P. Sprechmann, A. Bronstein, and G. Sapiro, Real-time online singing voice separation from monaural recordings using robust low-rank modeling, 13th International Society for Music Information Retrieval Conference, 2012.

B. Recht, M. Fazel, and P. A. Parrilo, Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization, SIAM Review, vol.52, issue.3, pp.471-501, 2010.
DOI : 10.1137/070697835

B. Recht and C. Ré, Parallel stochastic gradient algorithms for large-scale matrix completion, Mathematical Programming Computation, vol.8, issue.2, pp.201-226, 2013.

K. Gregor and Y. Lecun, Learning fast approximations of sparse coding, 27th International Conference on Machine Learning, 2010.

L. Zhang, Z. Chen, M. Zheng, and X. He, Robust non-negative matrix factorization, Frontiers of Electrical and Electronic Engineering in China, vol.23, issue.2, pp.192-200, 2011.

I. Jeong and K. Lee, Vocal separation using extended robust principal component analysis with Schatten P /Lp-norm and scale compression, International Workshop on Machine Learning for Signal Processing, 2014.

F. Nie, H. Wang, and H. Huang, Joint Schatten p-norm and lp-norm robust matrix completion for missing value recovery, Knowledge and Information Systems, vol.64, issue.2, pp.525-544, 2015.

Y. Yang, Low-rank representation of both singing voice and music accompaniment via learned dictionaries, 14th International Society for Music Information Retrieval conference, 2013.

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online dictionary learning for sparse coding, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553463

URL : http://www.ima.umn.edu/preprints/apr2009/2249.pdf

T. T. Chan and Y. Yang, Complex and Quaternionic Principal Component Pursuit and Its Application to Audio Separation, IEEE Signal Processing Letters, vol.23, issue.2, pp.287-291, 2016.
DOI : 10.1109/LSP.2016.2514845

G. Peeters, Deriving Musical Structures from Signal Analysis for Music Audio Summary Generation: "Sequence" and "State" Approach, International Symposium on Computer Music Multidisciplinary Research, 2003.
DOI : 10.1007/978-3-540-39900-1_14

R. B. Dannenberg and M. Goto, Music Structure Analysis from Acoustic Signals, Handbook of Signal Processing in Acoustics, pp.305-331, 2008.
DOI : 10.1007/978-0-387-30441-0_21

J. Paulus, M. Müller, and A. Klapuri, Audio-based music structure analysis, 11th International Society for Music Information Retrieval Conference, 2010.

Z. Rafii and B. Pardo, A simple music/voice separation system based on the extraction of the repeating musical structure, IEEE International Conference on Acoustics, Speech and Signal Processing, 2011.

Z. Rafii, A. Liutkus, and B. Pardo, REPET for Background/Foreground Separation in Audio, Blind Source Separation, pp.395-411, 2014.
DOI : 10.1007/978-3-642-55016-4_14

URL : https://hal.archives-ouvertes.fr/hal-01025563

J. Foote and S. Uchihashi, The beat spectrum: a new approach to rhythm analysis, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001., 2001.
DOI : 10.1109/ICME.2001.1237863

P. Seetharaman, F. Pishdadian, and B. Pardo, Music/Voice separation using the 2D fourier transform, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017.
DOI : 10.1109/WASPAA.2017.8169990

A. Liutkus, Z. Rafii, R. Badeau, B. Pardo, and G. Richard, Adaptive filtering for music/voice separation exploiting the repeating musical structure, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
DOI : 10.1109/ICASSP.2012.6287815

URL : https://hal.archives-ouvertes.fr/hal-00945300

Z. Rafii and B. Pardo, Music/voice separation using the similarity matrix, 13th International Society for Music Information Retrieval Conference, 2012.

J. Foote, Visualizing music and audio using self-similarity, Proceedings of the seventh ACM international conference on Multimedia (Part 1) , MULTIMEDIA '99, 1999.
DOI : 10.1145/319463.319472

URL : http://www.cs.fiu.edu/%7Elli003/Music/mv/1.pdf

Z. Rafii and B. Pardo, Online REPET-SIM for real-time speech enhancement, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
DOI : 10.1109/ICASSP.2013.6637768

Z. Rafii, A. Liutkus, and B. Pardo, A simple user interface system for recovering patterns repeating in time and frequency in mixtures of sounds, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7177974

URL : https://hal.archives-ouvertes.fr/hal-01116689

D. Fitzgerald, Vocal Separation using Nearest Neighbours and Median Filtering, IET Irish Signals and Systems Conference (ISSC 2012), 2012.
DOI : 10.1049/ic.2012.0225

A. Liutkus, Z. Rafii, B. Pardo, D. Fitzgerald, and L. Daudet, Kernel spectrogram models for source separation, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), 2014.
DOI : 10.1109/HSCMA.2014.6843240

URL : https://hal.archives-ouvertes.fr/hal-00959384

A. Liutkus, D. Fitzgerald, Z. Rafii, B. Pardo, and L. Daudet, Kernel Additive Models for Source Separation, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4298-4310, 2014.
DOI : 10.1109/TSP.2014.2332434

URL : https://hal.archives-ouvertes.fr/hal-01011044

A. Liutkus, D. Fitzgerald, and Z. Rafii, Scalable audio separation with light Kernel Additive Modelling, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7177935

URL : https://hal.archives-ouvertes.fr/hal-01114890

T. Prätzlich, R. Bittner, A. Liutkus, and M. Müller, Kernel Additive Modeling for interference reduction in multi-channel music recordings, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7178036

D. F. Yela, S. Ewert, D. Fitzgerald, and M. Sandler, Interference reduction in music recordings combining kernel additive modelling and non-negative matrix factorization, IEEE International Conference on Acoustics, Speech and Signal Processing, 2017.

M. Moussallam, G. Richard, and L. Daudet, Audio source separation informed by redundancy with greedy multiscale decompositions, 20th European Signal Processing Conference, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00735234

S. G. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, vol.41, issue.12, pp.3397-3415, 1993.
DOI : 10.1109/78.258082

URL : http://home.ustc.edu.cn/~zhanghan/cs/Mallat_Zhang93.pdf

H. Deif, D. Fitzgerald, W. Wang, and L. Gan, Separation of vocals from monaural music recordings using diagonal median filters and practical time-frequency parameters, 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2015.
DOI : 10.1109/ISSPIT.2015.7394320

D. Fitzgerald and M. Gainza, Single channel vocal separation using median filtering and factorisation techniques, ISAST Transactions on Electronic and Signal Processing, vol.4, issue.1, pp.62-73, 2010.

J. Lee and H. Kim, Music and Voice Separation Using Log-Spectral Amplitude Estimator Based on Kernel Spectrogram Models Backfitting, The Journal of the Acoustical Society of Korea, vol.34, issue.3, pp.227-233, 2015.
DOI : 10.7776/ASK.2015.34.3.227

J. Lee, H. Cho, and H. Kim, Vocal separation from monaural music using adaptive auditory filtering based on kernel back-fitting, Interspeech, 2015.

H. Cho, J. Lee, and H. Kim, Singing voice separation from monaural music based on kernel back-fitting using beta-order spectral amplitude estimation, 16th International Society for Music Information Retrieval Conference, 2015.

H. Kim and J. Y. Kim, Music/Voice Separation Based on Kernel Back-fitting Using Weighted β-order MMSE Estimation, ETRI Journal, vol.38, issue.3, pp.510-517, 2016.
DOI : 10.4218/etrij.16.0115.0256

E. Plourde and B. Champagne, Auditory-Based Spectral Amplitude Estimators for Speech Enhancement, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.8, pp.1614-1623, 2008.
DOI : 10.1109/TASL.2008.2004304

B. Raj, P. Smaragdis, M. Shashanka, and R. Singh, Separating a foreground singer from background music, International Symposium on Frontiers of Research on Speech and Music, 2007.

P. Smaragdis and B. Raj, Shift-invariant probabilistic latent component analysis, MERL, Tech. Rep, 2006.

B. Raj and P. Smaragdis, Latent variable decomposition of spectrograms for single channel speaker separation, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005., 2005.
DOI : 10.1109/ASPAA.2005.1540157

J. Han and C. Chen, Improving melody extraction using Probabilistic Latent Component Analysis, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.
DOI : 10.1109/ICASSP.2011.5946321

P. Boersma, Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound, IFA Proceedings 17, 1993.

E. Gómez, F. J. Cañadas-Quesada, J. Salamon, J. Bonada, P. Vera-Candeas et al., Predominant fundamental frequency estimation vs singing voice separation for the automatic transcription of accompanied flamenco singing, 13th International Society for Music Information Retrieval Conference, 2012.

N. Ono, K. Miyamoto, J. L. Roux, H. Kameoka, and S. Sagayama, Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram, 16th European Signal Processing Conference, 2008.

H. Papadopoulos and D. P. Ellis, Music-content-adaptive robust principal component analysis for a semantically consistent separation of foreground and background in music audio signals, 17th International Conference on Digital Audio Effects, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01104896

T. Chan, T. Yeh, Z. Fan, H. Chen, L. Su et al., Vocal activity informed singing voice separation with the iKala dataset, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7178063

I. Jeong and K. Lee, Singing Voice Separation Using RPCA with Weighted l1-norm, 13th International Conference on Latent Variable Analysis and Signal Separation, 2017.

T. Virtanen, A. Mesaros, and M. Ryynänen, Combining pitch-based inference and non-negative spectrogram factorization in separating vocals from polyphonic music, ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition, 2008.

Y. Wang and Z. Ou, Combining HMM-based melody extraction and NMF-based soft masking for separating voice and accompaniment from monaural audio, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.
DOI : 10.1109/ICASSP.2011.5946313

A. Klapuri, Multiple fundamental frequency estimation by summing harmonic amplitudes, 7th International Conference on Music Information Retrieval, 2006.

C. Hsu, L. Chen, J. R. Jang, and H. Li, Singing pitch extraction from monaural polyphonic songs by contextual audio modeling and singing harmonic enhancement, 10th International Society for Music Information Retrieval Conference, 2009.

Z. Rafii, Z. Duan, and B. Pardo, Combining Rhythm-Based and Pitch-Based Methods for Background and Melody Separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.12, pp.1884-1893, 2014.
DOI : 10.1109/TASLP.2014.2354242

Z. Duan and B. Pardo, Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-Peak Regions, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.8, pp.2121-2133, 2010.
DOI : 10.1109/TASL.2010.2042119

S. Venkataramani, N. Nayak, P. Rao, and R. Velmurugan, Vocal separation using singer-vowel priors obtained from polyphonic audio, 15th International Society for Music Information Retrieval Conference, 2014.

V. Rao and P. Rao, Vocal Melody Extraction in the Presence of Pitched Accompaniment in Polyphonic Music, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.8, pp.2145-2154, 2010.
DOI : 10.1109/TASL.2010.2042124

V. Rao, C. Gupta, and P. Rao, Context-Aware Features for Singing Voice Detection in Polyphonic Music, International Workshop on Adaptive Multimedia Retrieval, 2011.
DOI : 10.1007/978-3-642-37425-8_4

M. Kim, J. Yoo, K. Kang, and S. Choi, Nonnegative Matrix Partial Co-Factorization for Spectral and Temporal Drum Source Separation, IEEE Journal of Selected Topics in Signal Processing, vol.5, issue.6, pp.1192-1204, 2011.
DOI : 10.1109/JSTSP.2011.2158803

L. Zhang, Z. Chen, M. Zheng, and X. He, Nonnegative matrix and tensor factorizations: An algorithmic perspective, IEEE Signal Processing Magazine, vol.31, issue.3, pp.54-65, 2014.

Y. Ikemiya, K. Yoshii, and K. Itoyama, Singing voice analysis and editing based on mutually dependent F0 estimation and source separation, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7178034

Y. Ikemiya, K. Itoyama, and K. Yoshii, Singing Voice Separation and Vocal F0 Estimation Based on Mutual Combination of Robust Principal Component Analysis and Subharmonic Summation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.11, pp.2084-2095, 2016.
DOI : 10.1109/TASLP.2016.2577879

D. J. Hermes, Measurement of pitch by subharmonic summation, The Journal of the Acoustical Society of America, vol.83, issue.1, pp.257-264, 1988.
DOI : 10.1121/1.396427

A. Dobashi, Y. Ikemiya, K. Itoyama, and K. Yoshii, A music performance assistance system based on vocal, harmonic, and percussive source separation and content visualization for music audio signals, 12th Sound and Music Computing Conference, 2015.

Y. Hu and G. Liu, Separation of Singing Voice Using Nonnegative Matrix Partial Co-Factorization for Singer Identification, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.23, issue.4, pp.643-653, 2015.
DOI : 10.1109/TASLP.2015.2396681

J. Yoo, M. Kim, K. Kang, and S. Choi, Nonnegative matrix partial co-factorization for drum source separation, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010.
DOI : 10.1109/ICASSP.2010.5495305

P. Boersma, PRAAT, a system for doing phonetics by computer, Glot International, vol.5, issue.9-10, pp.341-347, 2001.

Y. Li, J. Woodruff, and D. Wang, Monaural musical sound separation based on pitch and common amplitude modulation, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.7, pp.1361-1371, 2009.

B. Raj, M. L. Seltzer, and R. M. Stern, Reconstruction of missing features for robust speech recognition, Speech Communication, vol.43, issue.4, pp.275-296, 2004.
DOI : 10.1016/j.specom.2004.03.007

Y. Hu and G. Liu, Monaural Singing Voice Separation by Non-negative Matrix Partial Co-Factorization with Temporal Continuity and Sparsity Criteria, 12th International Conference on Intelligent Computing, 2016.

X. Zhang, W. Li, and B. Zhu, Latent time-frequency component analysis: A novel pitch-based approach for singing voice separation, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7177946

A. de Cheveigné and H. Kawahara, YIN, a fundamental frequency estimator for speech and music, The Journal of the Acoustical Society of America, vol.111, issue.4, pp.1917-1930, 2002.
DOI : 10.1121/1.1458024

B. Zhu, W. Li, and L. Li, Towards solving the bottleneck of pitch-based singing voice separation, 23rd ACM International Conference on Multimedia, 2015.

J. Durrieu, G. Richard, and B. David, Singer melody extraction in polyphonic signals using source separation methods, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.
DOI : 10.1109/ICASSP.2008.4517573

J. Durrieu, G. Richard, B. David, and C. Févotte, Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.3, pp.564-575, 2010.
DOI : 10.1109/TASL.2010.2041114

A. Ozerov, P. Philippe, F. Bimbot, and R. Gribonval, Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.5, pp.1564-1578, 2007.
DOI : 10.1109/TASL.2007.899291

URL : https://hal.archives-ouvertes.fr/inria-00544774

D. H. Klatt and L. C. Klatt, Analysis, synthesis, and perception of voice quality variations among female and male talkers, The Journal of the Acoustical Society of America, vol.87, issue.2, pp.820-857, 1990.
DOI : 10.1121/1.398894

L. Benaroya, L. Mcdonagh, F. Bimbot, and R. Gribonval, Non negative sparse representation for Wiener based source separation with a single sensor, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)., 2003.
DOI : 10.1109/ICASSP.2003.1201756

URL : https://hal.archives-ouvertes.fr/inria-00574784

I. S. Dhillon and S. Sra, Generalized nonnegative matrix approximations with Bregman divergences, Advances in Neural Information Processing Systems 18, pp.283-290, 2005.

L. Benaroya, F. Bimbot, and R. Gribonval, Audio source separation with a single sensor, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.1, pp.191-199, 2006.
DOI : 10.1109/TSA.2005.854110

URL : https://hal.archives-ouvertes.fr/inria-00544949

J. Durrieu and J. Thiran, Musical Audio Source Separation Based on User-Selected F0 Track, 10th International Conference on Latent Variable Analysis and Signal Separation, 2012.

URL : https://infoscience.epfl.ch/record/174056/files/README.pdf

B. Fuentes, R. Badeau, and G. Richard, Blind harmonic adaptive decomposition applied to supervised source separation, Signal Processing Conference (EUSIPCO), 2012 Proceedings of the 20th European, pp.2654-2658, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00945288

J. C. Brown, Calculation of a constant Q spectral transform, The Journal of the Acoustical Society of America, vol.89, issue.1, pp.425-434, 1991.
DOI : 10.1121/1.400476

J. C. Brown and M. S. Puckette, An efficient algorithm for the calculation of a constant Q transform, The Journal of the Acoustical Society of America, vol.92, issue.5, pp.2698-2701, 1992.
DOI : 10.1121/1.404385

C. Schörkhuber and A. Klapuri, Constant-Q transform toolbox, 7th Sound and Music Computing Conference, 2010.

J. Durrieu, B. David, and G. Richard, A Musically Motivated Mid-Level Representation for Pitch Estimation and Musical Audio Source Separation, IEEE Journal of Selected Topics in Signal Processing, vol.5, issue.6, pp.1180-1191, 2011.
DOI : 10.1109/JSTSP.2011.2158801

C. Joder and B. Schuller, Score-informed leading voice separation from monaural audio, 13th International Society for Music Information Retrieval Conference, 2012.

C. Joder, S. Essid, and G. Richard, A Conditional Random Field Framework for Robust and Scalable Audio-to-Score Matching, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.8, pp.2385-2397, 2011.
DOI : 10.1109/TASL.2011.2134092

R. Zhao, S. Lee, D. Huang, and M. Dong, Soft constrained leading voice separation with music score guidance, The 9th International Symposium on Chinese Spoken Language Processing, 2014.
DOI : 10.1109/ISCSLP.2014.6936723

J. Durrieu, A. Ozerov, C. Févotte, G. Richard, and B. David, Main instrument separation from stereophonic audio signals using a source/filter model, 17th European Signal Processing Conference, 2009.

J. Janer and R. Marxer, Separation of unvoiced fricatives in singing voice mixtures with semi-supervised NMF, 16th International Conference on Digital Audio Effects, 2013.

J. Janer, R. Marxer, and K. Arimoto, Combining a harmonic-based NMF decomposition with transient analysis for instantaneous percussion separation, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
DOI : 10.1109/ICASSP.2012.6287872

R. Marxer and J. Janer, Modelling and separation of singing voice breathiness in polyphonic mixtures, 16th International Conference on Digital Audio Effects, 2013.

G. Degottex, A. Roebel, and X. Rodet, Pitch transposition and breathiness modification using a glottal source model and its adapted vocaltract filter, IEEE International Conference on Acoustics, Speech and Signal Processing, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01106641

A. Ozerov, E. Vincent, and F. Bimbot, A General Modular Framework for Audio Source Separation, 9th International Conference on Latent Variable Analysis and Signal Separation, 2010.
DOI : 10.1007/978-3-642-15995-4_5

URL : https://hal.archives-ouvertes.fr/inria-00553504

Y. Salaün, E. Vincent, N. Bertin, N. Souviraà-Labastie, X. Jaureguiberry et al., The flexible audio source separation toolbox version 2.0, IEEE International Conference on Acoustics, Speech and Signal Processing, 2014.

R. Hennequin and F. Rigaud, Long-term reverberation modeling for under-determined audio source separation with application to vocal melody extraction, 17th International Society for Music Information Retrieval Conference, 2016.

R. Singh, B. Raj, and P. Smaragdis, Latent-variable decomposition based dereverberation of monaural and multi-channel signals, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010.
DOI : 10.1109/ICASSP.2010.5495326

N. Ono, K. Miyamoto, H. Kameoka, and S. Sagayama, A real-time equalizer of harmonic and percussive components in music signals, 9th International Conference on Music Information Retrieval, 2008.

D. Fitzgerald, Harmonic/percussive separation using median filtering, 13th International Conference on Digital Audio Effects, 2010.

Y. Yang, On sparse and low-rank matrix decomposition for singing voice separation, Proceedings of the 20th ACM international conference on Multimedia, MM '12, 2012.
DOI : 10.1145/2393347.2396305

I. Jeong and K. Lee, Vocal Separation from Monaural Music Using Temporal/Spectral Continuity and Sparsity Constraints, IEEE Signal Processing Letters, vol.21, issue.10, pp.1197-1200, 2014.
DOI : 10.1109/LSP.2014.2329946

E. Ochiai, T. Fujisawa, and M. Ikehara, Vocal separation by constrained non-negative matrix factorization, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015.
DOI : 10.1109/APSIPA.2015.7415317

T. Watanabe, T. Fujisawa, and M. Ikehara, Vocal separation using improved robust principal component analysis and post-processing, 2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS), 2016.
DOI : 10.1109/MWSCAS.2016.7870055

H. Raguet, J. Fadili, and G. Peyré, A Generalized Forward-Backward Splitting, SIAM Journal on Imaging Sciences, vol.6, issue.3, pp.1199-1226, 2013.
DOI : 10.1137/120872802

URL : https://hal.archives-ouvertes.fr/hal-00613637

A. Hayashi, H. Kameoka, T. Matsubayashi, and H. Sawada, Nonnegative periodic component analysis for music source separation, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016.

D. Fitzgerald, M. Cranitch, and E. Coyle, Using tensor factorisation models to separate drums from polyphonic music, 12th International Conference on Digital Audio Effects, 2009.

H. Tachibana, N. Ono, and S. Sagayama, Singing Voice Enhancement in Monaural Music Signals Based on Two-stage Harmonic/Percussive Sound Separation on Multiple Resolution Spectrograms, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.22, issue.1, pp.228-237, 2014.
DOI : 10.1109/TASLP.2013.2287052

H. Tachibana, T. Ono, N. Ono, and S. Sagayama, Melody line estimation in homophonic music audio signals based on temporal variability of melodic source, IEEE International Conference on Acoustics, Speech and Signal Processing, 2010.

H. Tachibana, N. Ono, and S. Sagayama, A Real-time Audio-to-audio Karaoke Generation System for Monaural Recordings Based on Singing Voice Suppression and Key Conversion Techniques, Journal of Information Processing, vol.24, issue.3, pp.470-482, 2016.
DOI : 10.2197/ipsjjip.24.470

N. Ono, K. Miyamoto, H. Kameoka, J. L. Roux, Y. Uchiyama et al., Harmonic and Percussive Sound Separation and Its Application to MIR-Related Tasks, Advances in Music Information Retrieval, pp.213-236, 2010.
DOI : 10.1007/978-3-642-11674-2_10

H. Tachibana, H. Kameoka, N. Ono, and S. Sagayama, Comparative evaluations of multiple harmonic/percussive sound separation techniques based on anisotropic smoothness of spectrogram, IEEE International Conference on Acoustics, Speech and Signal Processing, 2012.

H. Deif, W. Wang, L. Gan, and S. Alhashmi, A local discontinuity based approach for monaural singing voice separation from accompanying music with multi-stage non-negative matrix factorization, 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2015.
DOI : 10.1109/GlobalSIP.2015.7418163

B. Zhu, W. Li, R. Li, and X. Xue, Multi-Stage Non-Negative Matrix Factorization for Monaural Singing Voice Separation, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.10, pp.2096-2107, 2013.
DOI : 10.1109/TASL.2013.2266773

J. Driedger and M. Müller, Extracting singing voice from music recordings by cascading audio decomposition techniques, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7177945

J. Driedger, M. Müller, and S. Disch, Extending harmonic-percussive separation of audio signals, 15th International Society for Music Information Retrieval Conference, 2014.

R. Talmon, I. Cohen, and S. Gannot, Transient Noise Reduction Using Nonlocal Diffusion Filters, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.6, pp.1584-1599, 2011.
DOI : 10.1109/TASL.2010.2093651

C. Hsu, D. Wang, J. R. Jang, and K. Hu, A Tandem Algorithm for Singing Pitch Extraction and Voice Separation From Music Accompaniment, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.5, pp.1482-1491, 2012.
DOI : 10.1109/TASL.2011.2182510

G. Hu and D. Wang, A tandem algorithm for pitch estimation and voiced speech segregation, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.8, pp.2067-2079, 2010.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning internal representations by error propagation, Parallel distributed processing: explorations in the microstructure of cognition, pp.318-362, 1986.

N. J. Bryan and G. J. Mysore, Interactive user-feedback for sound source separation, International Conference on Intelligent User Interfaces, Workshop on Interactive Machine Learning, 2013.

K. Ganchev, J. Graça, J. Gillenwater, and B. Taskar, Posterior regularization for structured latent variable models, Journal of Machine Learning Research, vol.11, pp.2001-2049, 2010.

Weighted nonnegative tensor factorization: on monotonicity of multiplicative update rules and application to user-guided audio source separation.

X. Jaureguiberry, G. Richard, P. Leveau, R. Hennequin, and E. Vincent, Introducing a simple fusion framework for audio source separation, 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2013.
DOI : 10.1109/MLSP.2013.6661930

URL : https://hal.archives-ouvertes.fr/hal-00846834

X. Jaureguiberry, E. Vincent, and G. Richard, Variational Bayesian model averaging for audio source separation, 2014 IEEE Workshop on Statistical Signal Processing (SSP), 2014.
DOI : 10.1109/SSP.2014.6884568

URL : https://hal.archives-ouvertes.fr/hal-00986909

J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky, Bayesian model averaging: A tutorial, Statistical Science, vol.14, issue.4, pp.382-417, 1999.

M. Mcvicar, R. Santos-rodriguez, and T. D. Bie, Learning to separate vocals from polyphonic mixtures via ensemble methods and structured output prediction, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
DOI : 10.1109/ICASSP.2016.7471715

A. K. Jain and F. Farrokhnia, Unsupervised texture segmentation using Gabor filters, IEEE International Conference on Systems, Man and Cybernetics, 1990.
DOI : 10.1109/icsmc.1990.142050

URL : http://www.ee.columbia.edu/~sfchang/course/dip/handout/jain-texture.pdf

P. Huang, M. Kim, M. Hasegawa-johnson, and P. Smaragdis, Singing-voice separation from monaural recordings using deep recurrent neural networks, 15th International Society for Music Information Retrieval Conference, 2014.

S. Lacoste-julien, M. Jaggi, M. Schmidt, and P. Pletscher, Block-coordinate Frank-Wolfe optimization for structural SVMs, 30th International Conference on Machine Learning, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00720158

E. Manilow, P. Seetharaman, F. Pishdadian, and B. Pardo, Predicting algorithm efficacy for adaptive multi-cue source separation, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017.
DOI : 10.1109/WASPAA.2017.8170038

G. Wolf, S. Mallat, and S. Shamma, Audio source separation with time-frequency velocities, 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2014.
DOI : 10.1109/MLSP.2014.6958893

J. Andén and S. Mallat, Deep Scattering Spectrum, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4114-4128, 2014.
DOI : 10.1109/TSP.2014.2326991

C. P. Bernard, Discrete Wavelet Analysis for Fast Optic Flow Computation, Applied and Computational Harmonic Analysis, vol.11, issue.1, pp.32-63, 2001.
DOI : 10.1006/acha.2000.0341

F. Yen, Y. Luo, and T. Chi, Singing voice separation using spectro-temporal modulation features, 15th International Society for Music Information Retrieval Conference, 2014.

F. Yen, M. Huang, and T. Chi, A two-stage singing voice separation algorithm using spectro-temporal modulation features, Interspeech, 2015.

T. Chi, P. Ru, and S. A. Shamma, Multiresolution spectrotemporal analysis of complex sounds, The Journal of the Acoustical Society of America, vol.118, issue.2, pp.887-906, 2005.
DOI : 10.1121/1.1945807

T. Chi, Y. Gao, M. C. Guyton, P. Ru, and S. Shamma, Spectro-temporal modulation transfer functions and speech intelligibility, The Journal of the Acoustical Society of America, vol.106, issue.5, pp.2719-2732, 1999.
DOI : 10.1121/1.428100

T. T. Chan and Y. Yang, Informed Group-Sparse Representation for Singing Voice Separation, IEEE Signal Processing Letters, vol.24, issue.2, pp.156-160, 2017.
DOI : 10.1109/LSP.2017.2647810

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.

S. Ma, Alternating Proximal Gradient Method for Convex Minimization, Journal of Scientific Computing, vol.68, issue.2, pp.546-572, 2016.

G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu et al., Robust Recovery of Subspace Structures by Low-Rank Representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.1, pp.171-184, 2013.
DOI : 10.1109/TPAMI.2012.88

A. Varga and H. J. Steeneken, Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems, Speech Communication, vol.12, issue.3, pp.247-251, 1993.
DOI : 10.1016/0167-6393(93)90095-3

J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM, 1993.

N. Sturmel, A. Liutkus, J. Pinel, L. Girin, S. Marchand et al., Linear mixing models for active listening of music productions in realistic studio conditions, 132nd AES Convention, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00790783

M. Vinyes, MTG MASS database, 2008.

E. Vincent, S. Araki, and P. Bofill, The 2008 Signal Separation Evaluation Campaign: A Community-Based Approach to Large-Scale Evaluation, 8th International Conference on Independent Component Analysis and Signal Separation, 2009.
DOI : 10.1109/ICASSP.2009.4959531

URL : https://hal.archives-ouvertes.fr/inria-00544168

S. Araki, A. Ozerov, B. V. Gowreesunker, H. Sawada, F. J. Theis et al., The 2010 Signal Separation Evaluation Campaign (SiSEC2010): Audio Source Separation, 9th International Conference on Latent Variable Analysis and Signal Separation, 2010.
DOI : 10.1007/978-3-642-15995-4_15

URL : https://hal.archives-ouvertes.fr/inria-00553385

S. Araki, F. Nesta, E. Vincent, Z. Koldovsky, G. Nolte et al., The 2011 Signal Separation Evaluation Campaign (SiSEC2011): - Audio Source Separation -, 10th International Conference on Latent Variable Analysis and Signal Separation, 2012.
DOI : 10.1016/j.sigpro.2011.09.032

URL : https://hal.archives-ouvertes.fr/hal-00655394

E. Vincent, S. Araki, F. J. Theis, G. Nolte, P. Bofill et al., The signal separation evaluation campaign (2007-2010): Achievements and remaining challenges, Signal Processing, vol.92, issue.8, pp.1928-1936, 2012.
DOI : 10.1016/j.sigpro.2011.10.007

URL : https://hal.archives-ouvertes.fr/inria-00579398

N. Ono, Z. Rafii, D. Kitamura, N. Ito, and A. Liutkus, The 2015 Signal Separation Evaluation Campaign, 12th International Conference on Latent Variable Analysis and Signal Separation, 2015.
DOI : 10.1007/978-3-319-22482-4_45

URL : https://hal.archives-ouvertes.fr/hal-01188725

A. Liutkus, F. Stöter, Z. Rafii, D. Kitamura, B. Rivet et al., The 2016 Signal Separation Evaluation Campaign, 13th International Conference on Latent Variable Analysis and Signal Separation, 2017.

URL : https://hal.archives-ouvertes.fr/hal-01472932

A. Liutkus, R. Badeau, and G. Richard, Gaussian Processes for Underdetermined Source Separation, IEEE Transactions on Signal Processing, vol.59, issue.7, pp.3155-3167, 2011.
DOI : 10.1109/TSP.2011.2119315

URL : https://hal.archives-ouvertes.fr/hal-00643951

R. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam et al., MedleyDB: A multitrack dataset for annotation-intensive mir research, 15th International Society for Music Information Retrieval Conference, 2014.

Z. Rafii, A. Liutkus, F. Stöter, S. I. Mimilakis, and R. Bittner, MUSDB18, a dataset for audio source separation, 2017. Available: https

A. Ozerov, P. Philippe, R. Gribonval, and F. Bimbot, One microphone singing voice separation using source-adapted models, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005., 2005.
DOI : 10.1109/ASPAA.2005.1540176

URL : https://hal.archives-ouvertes.fr/inria-00564491

W. Tsai, D. Rogers, and H. Wang, Blind Clustering of Popular Music Recordings Based on Singer Voice Characteristics, Computer Music Journal, vol.28, issue.3, pp.68-78, 2004.

J. Gauvain and C. Lee, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Transactions on Speech and Audio Processing, vol.2, issue.2, pp.291-298, 1994.
DOI : 10.1109/89.279278

E. Vincent, M. Jafari, S. Abdallah, M. Plumbley, and M. Davies, Probabilistic Modeling Paradigms for Audio Source Separation, Machine Audition: Principles, Algorithms and Systems. IGI Global, pp.162-185, 2010.
DOI : 10.4018/978-1-61520-919-4.ch007

URL : https://hal.archives-ouvertes.fr/inria-00544016

Z. Rafii, D. L. Sun, F. G. Germain, and G. J. Mysore, Combining modeling of singing voice and background music for automatic separation of musical mixtures, 14th International Society for Music Information Retrieval Conference, 2013.

N. Boulanger-lewandowski, G. J. Mysore, and M. Hoffman, Exploiting long-term temporal dependencies in NMF using recurrent neural networks with application to source separation, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
DOI : 10.1109/ICASSP.2014.6854951

G. J. Mysore, P. Smaragdis, and B. Raj, Non-negative Hidden Markov Modeling of Audio with Application to Source Separation, 9th International Conference on Latent Variable Analysis and Signal Separation, 2010.
DOI : 10.1007/978-3-642-15995-4_18

K. Qian, Y. Zhang, S. Chang, X. Yang, D. Florêncio et al., Speech enhancement using Bayesian WaveNet, Proc. Interspeech 2017.
DOI : 10.21437/interspeech.2017-1672

L. Deng and D. Yu, Deep learning: Methods and applications, Foundations and Trends in Signal Processing, vol.7, issue.3-4, 2014.

Y. Lecun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, vol.22, issue.3, pp.400-407, 1951.
DOI : 10.1214/aoms/1177729586

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.85, issue.6088, pp.533-536, 1986.
DOI : 10.1038/323533a0

M. Hermans and B. Schrauwen, Training and analysing deep recurrent neural networks, 26th International Conference on Neural Information Processing Systems, 2013.

R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, How to construct deep recurrent neural networks, International Conference on Learning Representations, 2014.

P. Huang, M. Kim, M. Hasegawa-johnson, and P. Smaragdis, Joint optimization of masks and deep recurrent neural networks for monaural source separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.

S. Uhlich, F. Giron, and Y. Mitsufuji, Deep neural network based instrument extraction from music, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7178348

S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp et al., Improving music source separation based on deep neural networks through data augmentation and network blending, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7952158

A. J. Simpson, G. Roma, and M. D. Plumbley, Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network, 12th International Conference on Latent Variable Analysis and Signal Separation, 2015.
DOI : 10.1007/978-3-319-22482-4_50

J. Schlüter, Learning to pinpoint singing voice from weakly labeled examples, 17th International Society for Music Information Retrieval Conference, 2016.

P. Chandna, M. Miron, J. Janer, and E. Gómez, Monoaural Audio Source Separation Using Deep Convolutional Neural Networks, 13th International Conference on Latent Variable Analysis and Signal Separation, 2017.

S. I. Mimilakis, E. Cano, J. Abeßer, and G. Schuller, New sonorities for jazz recordings: Separation and mixing using deep neural networks, 2nd AES Workshop on Intelligent Music Production, 2016.

S. I. Mimilakis, K. Drossos, T. Virtanen, and G. Schuller, A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation, 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), 2017.
DOI : 10.1109/MLSP.2017.8168117

S. I. Mimilakis, K. Drossos, J. F. Santos, G. Schuller et al., Monaural singing voice separation with skip-filtering connections and recurrent inference of time-frequency mask, IEEE International Conference on Acoustics, Speech and Signal Processing, 2018.

A. Jansson, E. Humphrey, N. Montecchio, R. Bittner, A. Kumar et al., Singing voice separation with deep U-Net convolutional networks, 18th International Society for Music Information Retrieval Conference, 2017.

N. Takahashi and Y. Mitsufuji, Multi-Scale multi-band densenets for audio source separation, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017.
DOI : 10.1109/WASPAA.2017.8169987

URL : http://arxiv.org/pdf/1706.09588

J. R. Hershey, Z. Chen, J. L. Roux, and S. Watanabe, Deep clustering: Discriminative embeddings for segmentation and separation, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
DOI : 10.1109/ICASSP.2016.7471631

URL : http://arxiv.org/pdf/1508.04306

Y. Isik, J. L. Roux, Z. Chen, S. Watanabe, and J. R. Hershey, Single-channel multi-speaker separation using deep clustering, Interspeech, 2016.

Y. Luo, Z. Chen, J. R. Hershey, J. L. Roux, and N. Mesgarani, Deep clustering and conventional networks for music separation: Stronger together, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7952118

M. Kim and P. Smaragdis, Adaptive Denoising Autoencoders: A Fine-Tuning Scheme to Learn from Test Mixtures, 12th International Conference on Latent Variable Analysis and Signal Separation, 2015.
DOI : 10.1007/978-3-319-22482-4_12

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, vol.11, pp.3371-3408, 2010.

E. M. Grais, G. Roma, A. J. Simpson, and M. D. Plumbley, Single channel audio source separation using deep neural network ensembles, 140th AES Convention, 2016.

S. Nie, W. Xue, S. Liang, X. Zhang, W. Liu et al., Joint optimization of recurrent networks exploiting source auto-regression for source separation, Interspeech, 2015.

J. Sebastian and H. A. Murthy, Group delay based music source separation using deep recurrent neural networks, 2016 International Conference on Signal Processing and Communications (SPCOM), 2016.
DOI : 10.1109/SPCOM.2016.7746672

B. Yegnanarayana, H. A. Murthy, and V. R. Ramachandran, Processing of noisy speech using modified group delay functions, 1991 International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1991.
DOI : 10.1109/ICASSP.1991.150496

Z. Fan, J. R. Jang, and C. Lu, Singing Voice Separation and Pitch Extraction from Monaural Polyphonic Audio Music via DNN and Adaptive Pitch Tracking, 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), 2016.
DOI : 10.1109/BigMM.2016.56

C. Avendano, Frequency-domain source identification and manipulation in stereo mixes for enhancement, suppression and re-panning applications, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684), 2003.
DOI : 10.1109/ASPAA.2003.1285818

C. Avendano and J. Jot, Frequency domain techniques for stereo to multichannel upmix, AES 22nd International Conference, 2002.

D. Barry, B. Lawlor, and E. Coyle, Sound source separation: Azimuth discrimination and resynthesis, 7th International Conference on Digital Audio Effects, 2004.

M. Vinyes, J. Bonada, and A. Loscos, Demixing commercial music productions via human-assisted time-frequency masking, 120th AES Convention, 2006.

M. Cobos, J. J. López, and S. Rickard, Stereo audio source separation based on time-frequency masking and multilevel thresholding, Digital Signal Processing, vol.18, issue.6, pp.960-976, 2008.
DOI : 10.1016/j.dsp.2008.06.004

N. Otsu, A Threshold Selection Method from Gray-Level Histograms, IEEE Transactions on Systems, Man, and Cybernetics, vol.9, issue.1, pp.62-66, 1979.
DOI : 10.1109/TSMC.1979.4310076

S. Sofianos, A. Ariyaeeinia, and R. Polfreman, Towards effective singing voice extraction from stereophonic recordings, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010.
DOI : 10.1109/ICASSP.2010.5496004

URL : http://uhra.herts.ac.uk/bitstream/2299/10268/1/904400.pdf

S. Sofianos, A. Ariyaeeinia, R. Polfreman, and R. Sotudeh, H-Semantics: A hybrid approach to singing voice separation, Journal of the Audio Engineering Society, vol.60, issue.10, pp.831-841, 2012.

M. Kim, S. Beack, K. Choi, and K. Kang, Gaussian mixture model for singing voice separation from stereophonic music, AES 43rd Conference, 2011.

M. Cobos and J. J. López, Singing voice separation combining panning information and pitch tracking, AES 124th Convention, 2008.

D. Fitzgerald, Stereo vocal extraction using ADRess and nearest neighbours median filtering, 16th International Conference on Digital Audio Effects (DAFx), 2013.

D. Fitzgerald and R. Jaiswal, Improved Stereo Instrumental Track Recovery using Median Nearest-Neighbour Inpainting, 24th IET Irish Signals and Systems Conference (ISSC 2013), 2013.
DOI : 10.1049/ic.2013.0026

A. Adler, V. Emiya, M. G. Jafari, M. Elad, R. Gribonval et al., Audio Inpainting, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.3, pp.922-932, 2012.
DOI : 10.1109/TASL.2011.2168211

URL : https://hal.archives-ouvertes.fr/inria-00577079

A. Ozerov and C. Févotte, Multichannel nonnegative matrix factorization in convolutive mixtures. With application to blind audio source separation, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009.
DOI : 10.1109/ICASSP.2009.4960289

A. Ozerov, C. Févotte, R. Blouet, and J. Durrieu, Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation, IEEE International Conference on Acoustics, Speech and Signal Processing, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00564851

A. Liutkus, R. Badeau, and G. Richard, Informed Source Separation Using Latent Components, 9th International Conference on Latent Variable Analysis and Signal Separation, 2010.
DOI : 10.1007/978-3-642-15995-4_62

URL : https://hal.archives-ouvertes.fr/hal-00945298

C. Févotte and A. Ozerov, Notes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues, 7th International Symposium on Computer Music Modeling and Retrieval, 2010.

A. Ozerov, N. Duong, and L. Chevallier, On monotonicity of multiplicative update rules for weighted nonnegative tensor factorization, International Symposium on Nonlinear Theory and its Applications, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016315

H. Sawada, H. Kameoka, S. Araki, and N. Ueda, New formulations and efficient algorithms for multichannel NMF, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011.
DOI : 10.1109/ASPAA.2011.6082275

S. Sivasankaran, A. A. Nugraha, E. Vincent, J. A. Morales-Cordovilla, S. Dalmia et al., Robust ASR using neural network based speech enhancement and feature simulation, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015.
DOI : 10.1109/ASRU.2015.7404834

URL : https://hal.archives-ouvertes.fr/hal-01204553

A. A. Nugraha, A. Liutkus, and E. Vincent, Multichannel Audio Source Separation With Deep Neural Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.24, issue.9, pp.1652-1664, 2016.
DOI : 10.1109/TASLP.2016.2580946

URL : https://hal.archives-ouvertes.fr/hal-01163369

N. Q. Duong, E. Vincent, and R. Gribonval, Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.7, pp.1830-1840, 2010.
DOI : 10.1109/TASL.2010.2050716

URL : https://hal.archives-ouvertes.fr/inria-00435807

A. Ozerov, A. Liutkus, R. Badeau, and G. Richard, Informed source separation: Source coding meets source separation, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011.
DOI : 10.1109/ASPAA.2011.6082285

URL : https://hal.archives-ouvertes.fr/inria-00610526

E. Zwicker and H. Fastl, Psychoacoustics: Facts and models, 2013.
DOI : 10.1007/978-3-662-09562-1

A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001.
DOI : 10.1109/ICASSP.2001.941023

Z. Wang and A. C. Bovik, Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures, IEEE Signal Processing Magazine, vol.26, issue.1, pp.98-117, 2009.
DOI : 10.1109/MSP.2008.930649

J. Barker, R. Marxer, E. Vincent, and S. Watanabe, The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015.
DOI : 10.1109/ASRU.2015.7404837

URL : https://hal.archives-ouvertes.fr/hal-01211376

ITU-R, Recommendation BS.1534-1: Method for the subjective assessment of intermediate sound quality (MUSHRA), 2001.

E. Vincent, M. Jafari, and M. Plumbley, Preliminary guidelines for subjective evaluation of audio source separation algorithms, ICA Research Network International Workshop, 2006.

E. Cano, C. Dittmar, and G. Schuller, Influence of phase, magnitude and location of harmonic components in the perceived quality of extracted solo signals, AES 42nd Conference on Semantic Audio, 2011.

C. Févotte, R. Gribonval, and E. Vincent, BSS EVAL toolbox user guide - revision 2.0, IRISA, Tech. Rep., 2005.

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.4, pp.1462-1469, 2006.
DOI : 10.1109/TSA.2005.858005

URL : https://hal.archives-ouvertes.fr/inria-00544230

B. Fox, A. Sabin, B. Pardo, and A. Zopf, Modeling Perceptual Similarity of Audio Signals for Blind Source Separation Evaluation, 7th International Conference on Latent Variable Analysis and Signal Separation, 2007.
DOI : 10.1007/978-3-540-74494-8_57

B. Fox and B. Pardo, Towards a Model of Perceived Quality of Blind Audio Source Separation, 2007 IEEE International Conference on Multimedia and Expo, 2007.
DOI : 10.1109/ICME.2007.4285046

J. Kornycky, B. Gunel, and A. Kondoz, Comparison of subjective and objective evaluation methods for audio source separation, Journal of the Acoustical Society of America, vol.4, issue.1, 2008.

V. Emiya, E. Vincent, N. Harlander, and V. Hohmann, Multi-criteria subjective and objective evaluation of audio source separation, 38th International AES Conference, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00545031

E. Vincent, Improved Perceptual Metrics for the Evaluation of Audio Source Separation, 10th International Conference on Latent Variable Analysis and Signal Separation, 2012.
DOI : 10.1016/j.sigpro.2011.10.007

URL : https://hal.archives-ouvertes.fr/hal-00653196

M. Cartwright, B. Pardo, G. J. Mysore, and M. Hoffman, Fast and easy crowdsourced perceptual audio evaluation, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
DOI : 10.1109/ICASSP.2016.7471749

U. Gupta, E. Moore, and A. Lerch, On the perceptual relevance of objective source separation measures for singing voice separation, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015.
DOI : 10.1109/WASPAA.2015.7336923

F. Stöter, A. Liutkus, R. Badeau, B. Edler, and P. Magron, Common fate model for unison source separation, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
DOI : 10.1109/ICASSP.2016.7471650

G. Roma, E. M. Grais, A. J. Simpson, I. Sobieraj, and M. D. Plumbley, Untwist: A new toolbox for audio source separation, 17th International Society for Music Information Retrieval Conference, 2016.