Image method for efficiently simulating small-room acoustics, The Journal of the Acoustical Society of America, vol.65, issue.4, pp.943-950, 1979.
Deep Speech 2: End-to-end speech recognition in English and Mandarin, Proc. Intl. Conference on Machine Learning, pp.173-182, 2016.
Detecting overlapped speech on short timeframes using deep learning, Proc. Interspeech Conf., 2017.
Counting competing speakers in a time frame - human versus computer, Proc. Interspeech Conf., 2015.
Speaker diarization: A review of recent research, IEEE Trans. Audio, Speech, Lang. Process., vol.20, issue.2, pp.356-370, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00733397
Estimating number of speakers by the modulation characteristics of speech, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), vol.2, p.197, 2003.
Stereo source separation and source counting with MAP estimation with Dirichlet prior considering spatial aliasing problem, Proc. Intl. Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), pp.742-750, 2009.
A robust method to count and locate audio sources in a multichannel underdetermined mixture, IEEE Trans. Signal Process., vol.58, issue.1, pp.121-133, 2010.
URL: https://hal.archives-ouvertes.fr/inria-00305435
Counting in the wild, European Conference on Computer Vision, pp.483-498, 2016.
Statistical Decision Theory and Bayesian Analysis, 1985.
Algorithms for hyper-parameter optimization, Advances in Neural Information Processing Systems, pp.2546-2554, 2011.
URL: https://hal.archives-ouvertes.fr/hal-00642998
Two's a crowd: Improving speaker diarization by automatically identifying and excluding overlapped speech, Proc. Interspeech Conf., 2008.
CrowdNet: A deep convolutional network for dense crowd counting, Proc. ACM Intl. Conference on Multimedia (ACMMM), pp.640-644, 2016.
Auditory scene analysis: The perceptual organization of sound, 1994.
Bayesian Poisson regression for crowd counting, Proc. IEEE Intl. Conference on Computer Vision (ICCV), pp.545-551, 2009.
Counting everyday objects in everyday scenes, Proc. Intl. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017.
Convolutional recurrent neural networks for music classification, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.2392-2396, 2017.
On the medians of gamma distributions and an equation of Ramanujan, Proceedings of the, vol.121, pp.245-251, 1994.
End-to-end learning for music audio, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.6964-6968, 2014.
Source counting in speech mixtures using a variational EM approach for complex Watson mixture models, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.6834-6838, 2014.
Nonlinear Poisson regression using neural networks: a simulation study, Neural Computing and Applications, vol.18, issue.8, p.939, 2009.
Speaker diarization using deep neural network embeddings, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.4930-4934, 2017.
DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM, 1993.
Detecting overlapping speech with long short-term memory recurrent neural networks, Proc. Interspeech Conf., pp.1668-1672, 2013.
Single channel audio source separation using convolutional denoising autoencoders, Proc. GlobalSIP, pp.1265-1269, 2017.
The ETAPE Corpus for the Evaluation of Speech-based TV Content Processing in the French Language, LREC - Eighth International Conference on Language Resources and Evaluation, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00712591
Room impulse response (RIR) generator, 2016.
Enhancing LSTM RNN-based speech overlap detection by artificially mixed data, Proc. Audio Eng. Soc. Conference on Semantic Audio, 2017.
Deep clustering: Discriminative embeddings for segmentation and separation, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.31-35, 2016.
Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.
The vanishing gradient problem during learning recurrent neural nets and problem solutions, Int. J. Uncertain. Fuzziness Knowl.-Based Syst., vol.6, issue.2, pp.107-116, 1998.
Long short-term memory, Neural Comput., vol.9, issue.8, pp.1735-1780, 1997.
Convolutional neural network in the task of speaker change detection, International Conference on Speech and Computer, pp.191-198, 2016.
Speech overlap detection in a two-pass speaker diarization system, Proc. Interspeech Conf., 2009.
Voice denumerability in polyphonic music of homogeneous timbres, An Interdisciplinary Journal, vol.6, issue.4, pp.361-382, 1989.
Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models, Journal of Memory and Language, vol.59, issue.4, pp.434-446, 2008.
The power of numerical discrimination, Nature, vol.3, issue.67, pp.281-282, 1871.
Online speaking rate estimation using recurrent neural networks, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.5245-5249, 2016.
One, two, many - judging the number of concurrent talkers, J. Acoust. Soc. Am., vol.99, issue.4, pp.2596-2603, 1996.
Perceptual limits in a simulated cocktail party, Attention, Perception, & Psychophysics, vol.77, pp.2108-2120, 2015.
Deep convolutional neural networks for human embryonic cell counting, European Conference on Computer Vision, pp.339-348, 2016.
Adam: A method for stochastic optimization, Proc. ICLR, 2014.
Itakura-Saito nonnegative matrix factorization with group sparsity, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.21-24, 2011.
URL: https://hal.archives-ouvertes.fr/hal-00567344
Singing voice detection with deep recurrent neural networks, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.121-125, 2015.
URL: https://hal.archives-ouvertes.fr/hal-01110035
Source number estimation and clustering for underdetermined blind source separation, Proc. Intl. Workshop Acoust. Echo Noise Control (IWAENC), 2008.
Fully convolutional crowd counting on highly congested scenes, 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2017.
Generalized Linear Mixed Models, 2006.
DCASE 2017 challenge setup: Tasks, datasets and baseline system, DCASE 2017 Workshop on Detection and Classification of Acoustic Scenes and Events, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01627981
TUT database for acoustic scene classification and sound event detection, Proc. European Signal Processing Conf. (EUSIPCO), 2016.
Blind audio source counting and separation of anechoic mixtures using the multichannel complex NMF framework, Signal Processing, vol.115, pp.27-37, 2015.
A cross cultural study of speech rate, Language and Speech, vol.7, issue.2, pp.120-125, 1964.
Librispeech: An ASR corpus based on public domain audio books, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.5206-5210, 2015.
Blind speaker counting in highly reverberant environments by clustering coherence features, Asia-Pacific Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017.
Source counting in real-time sound source localization using a circular microphone array, IEEE Signal Processing Workshop on Sensor Array and Multichannel (SAM), pp.521-524, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00772688
Handbook of Blind Source Separation, 2010.
Experimenting with musically motivated convolutional neural networks, Intl. Workshop on Content-Based Multimedia Indexing (CBMI), pp.1-6, 2016.
Timbre analysis of music audio signals with convolutional neural networks, Proc. European Signal Processing Conf. (EUSIPCO), 2017.
Speaker diarization system using HXLPS and deep neural network, Alexandria Engineering Journal, 2017.
DeepSetNet: Predicting sets with deep neural networks, Proc. IEEE Intl. Conference on Computer Vision (ICCV), 2017.
Speaker diarization through speaker embeddings, Proc. European Signal Processing Conf. (EUSIPCO), pp.2082-2086, 2015.
URL: https://hal.archives-ouvertes.fr/hal-01194233
An open-source state-of-the-art toolbox for broadcast news diarization, Proc. Interspeech Conf., 2013.
URL: https://hal.archives-ouvertes.fr/hal-01433449
Convolutional, long short-term memory, fully connected deep neural networks, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.4580-4584, 2015.
Proposal of a new confidence parameter estimating the number of speakers - an experimental investigation, Journal of Information Hiding and Multimedia Signal Processing, vol.1, issue.2, pp.101-109, 2010.
Learning to pinpoint singing voice from weakly labeled examples, Proc. Intl. Society for Music Information Retrieval Conference (ISMIR), pp.44-50, 2016.
Exploring data augmentation for improved singing voice detection with neural networks, Proc. Intl. Society for Music Information Retrieval Conference (ISMIR), pp.121-126, 2015.
An experiment about estimating the number of instruments in polyphonic music: a comparison between internet and laboratory results, Proc. Intl. Society for Music Information Retrieval Conference (ISMIR), pp.389-394, 2013.
Learning to count with deep object features, Proc. Intl. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp.90-96, 2015.
An investigation of deep neural networks for noise robust speech recognition, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.7398-7402, 2013.
Teager-Kaiser energy operators for overlapped speech detection, IEEE/ACM Trans. Audio, Speech, Lang. Process., vol.25, issue.5, pp.1035-1047, 2017.
Deep inside convolutional networks: Visualising image classification models and saliency maps, CoRR, 2013.
Very deep convolutional networks for large-scale image recognition, Proc. ICLR, 2015.
Striving for simplicity: The all convolutional net, CoRR, abs/1412.6806, 2014.
Human ability of counting the number of instruments in polyphonic music, Proceedings of Meetings on Acoustics, vol.19, 2013.
Classification vs. regression in supervised learning for single channel speaker count estimation, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2018.
Effect on numerosity judgment of grouping of tones by auditory channels, Attention, Perception, & Psychophysics, vol.26, pp.374-380, 1979.
Deep neural network based instrument extraction from music, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.2135-2139, 2015.
Source counting in speech mixtures by nonparametric Bayesian estimation of an infinite Gaussian mixture model, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp.459-463, 2015.
Deep people counting in extremely dense crowds, Proc. ACM Intl. Conference on Multimedia (ACMMM), pp.1299-1302, 2015.
THCHS-30: A free Chinese speech corpus, 2015.
Crowd++: Unsupervised speaker count with smartphones, Proc. 2013 ACM Intl. Joint Conference on Pervasive and Ubiquitous Computing, pp.43-52, 2013.
Artificial neural network features for speaker diarization, IEEE Workshop on Spoken Language Technology (SLT), pp.402-406, 2014.
Speaker change detection in broadcast TV using bidirectional long short-term memory networks, Proc. Interspeech Conf. (ISCA), 2017.
URL: https://hal.archives-ouvertes.fr/hal-01690244
Permutation invariant training of deep models for speaker-independent multi-talker speech separation, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.
Cross-scene crowd counting via deep convolutional neural networks, Proc. Intl. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp.833-841, 2015.
Salient object subitizing, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp.4045-4054, 2015.
Convolutional recurrent neural networks for polyphonic sound event detection, IEEE/ACM Trans. Audio, Speech, Lang. Process., vol.25, issue.6, pp.1291-1303, 2017.