context-aware sensor; biomimetic acoustic system; deep learning networks; low-power cochlea sensor; event-driven computing
Chen Xi, Gao Chang, Delbruck Tobi, Liu Shih-Chii (2021), EILE: Efficient Incremental Learning on the Edge, in
2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), Washington DC, USA, IEEE, USA.
Gao Chang, Rios-Navarro Antonio, Chen Xi, Liu Shih-Chii, Delbruck Tobi (2020), EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference, in
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 10(4), 419-432.
Gao Chang, Rios-Navarro Antonio, Chen Xi, Delbruck Tobi, Liu Shih-Chii (2020), EdgeDRNN: Enabling low-latency recurrent neural network edge inference, in
Proceedings of IEEE Artificial Intelligence Circuits and Systems, Virtual, IEEE, USA.
Ceolini Enea, Kiselev Ilya, Liu Shih-Chii (2020), Evaluating multi-channel multi-device speech separation algorithms in the wild: a hardware-software solution, in
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 1-1.
Li Xiaoya, Neil Daniel, Delbruck Tobi, Liu Shih-Chii (2019), Lip Reading Deep Network Exploiting Multi-Modal Spiking Visual and Auditory Sensors, in
2019 IEEE International Symposium on Circuits and Systems, IEEE, USA.
Ceolini Enea, Kiselev Ilya, Liu Shih-Chii (2019), Audio classification systems using deep neural networks and an event-driven auditory sensor, in
Proceedings of IEEE Sensors, Montreal, Canada, IEEE, USA.
Ceolini Enea, Liu Shih-Chii (2019), Combining deep neural networks and beamforming for real-time multi-channel speech enhancement using a wireless acoustic sensor network, in
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2019), Pittsburgh, USA, IEEE, USA.
Ceolini Enea, Anumula Jithendar, Braun Stefan, Liu Shih-Chii (2019), Event-driven pipeline for low latency low compute keyword spotting and speaker verification system, in
Proceedings of the 2019 International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, UK, IEEE, USA.
Luo Yi, Ceolini Enea, Han C, Liu Shih-Chii, Mesgarani Nima (2019), FaSNet: Low-latency adaptive beamforming for multi-microphone audio processing, in
2019 IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop, Sentosa, Singapore, IEEE, USA.
Gao Chang, Braun Stefan, Kiselev Ilya, Anumula Jithendar, Delbruck Tobi, Liu Shih-Chii (2019), Real-time speech recognition for IoT purpose using a delta recurrent neural network accelerator, in
Proceedings of the 2019 International Symposium on Circuits and Systems (ISCAS), Japan, IEEE, USA.
Anumula Jithendar, Ceolini Enea, He Zhe, Huber Adrian, Liu Shih-Chii (2018), An event-driven probabilistic model of sound source localization using cochlea spikes, in
2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, IEEE, USA.
Author: Ceolini, Enea; Kiselev, Ilya; Liu, Shih-Chii
Publication date: 01.05.2020
Persistent Identifier (PID): https://doi.org/10.5281/zenodo.3688540
Repository: WHISPER SET 1
Abstract:
This dataset is WHISPER SET 1, a dataset for speech enhancement and source separation recorded with a Wireless Acoustic Sensor Network (WASN) called WHISPER [Kiselev2018]. The dataset contains samples for up to 4 concurrent speakers as well as speech in noise. It was recorded with 16 microphones in a room with low reverberation (T60 = 0.2 s). In general, each track begins with a calibration phase in which each speaker is sequentially active alone for 15 seconds, followed by 15 seconds in which all the speakers are active together (plus noise in some cases).
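A minimal sketch of how a track with this layout could be split into its per-speaker calibration segments and the final mixture segment is shown below. The file name, sample layout, the assumption that segments are back-to-back with no gaps, and the use of the soundfile library are illustrative choices, not part of the dataset documentation.

```python
import soundfile as sf  # assumed I/O library; any multi-channel WAV reader works

# Assumptions (not from the dataset documentation): file name, 4 speakers,
# and back-to-back 15 s segments with no gaps between them.
TRACK = "whisper_set1_track01.wav"   # hypothetical file name
N_SPEAKERS = 4
SEGMENT_SEC = 15.0                   # 15 s per calibration segment and mixture

audio, fs = sf.read(TRACK)           # audio shape: (num_samples, 16) for 16 microphones
seg_len = int(SEGMENT_SEC * fs)

# One 15 s calibration segment per speaker, in recording order.
calibration = {
    f"speaker_{i}": audio[i * seg_len:(i + 1) * seg_len]
    for i in range(N_SPEAKERS)
}

# The 15 s segment in which all speakers (plus optional noise) are active together.
mixture = audio[N_SPEAKERS * seg_len:(N_SPEAKERS + 1) * seg_len]

print({name: seg.shape for name, seg in calibration.items()}, mixture.shape)
```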
Monitoring spaces on a restricted power budget is critical for fields such as ambient intelligence, wireless sensor networks, and the Internet of Things (IoT). Smarter always-on acoustic, vision, or multi-modal sensors that move more intelligence to the edge will alleviate the expensive wireless energy requirements of this goal. Portable platforms that can perform cheaper computation, such as identifying the voice of a particular speaker, can then stream the extracted features or preprocessed audio to the cloud for the more computationally expensive processing, e.g. understanding the speaker's speech.

Current approaches use deep neural networks, which have achieved state-of-the-art results on many benchmark tasks. Because these networks are computationally expensive, they are typically computed in the cloud. In general, they only perform well when they are pre-trained on large databases that cover the input statistics under which the device is deployed. Retraining a network to incorporate knowledge of the acoustic conditions at the deployment site is not feasible because of the lengthy training time, even with powerful GPUs, and few methods currently exist for online learning on the platform itself.

This proposal (HEAR-EAR) will investigate novel methods for reducing the amount of computation needed by the network during inference and for adaptive learning, so that the network can run on the portable platform. We investigate in particular event-driven deep learning computing architectures that are driven by event-driven sensors. These sensors naturally capture the timing information of events in the world, allowing easy binding of temporal stimulus-related information. HEAR-EAR will also investigate new hardware architectures for event-driven deep networks that capitalize on this information. These networks, together with the latest low-power event-driven cochlea, will be used to develop a novel adaptive, event-driven, low-power acoustic processing system.
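As a rough illustration of the computation-skipping idea behind the event-driven (delta) recurrent networks in the publications above (e.g. EdgeDRNN and the delta recurrent neural network accelerator), the sketch below only performs matrix-vector work for inputs and hidden units whose values have changed by more than a threshold since they last fired. The plain tanh cell, layer sizes, and threshold are illustrative assumptions, not the accelerator's actual architecture.

```python
import numpy as np

# Minimal sketch of a delta (event-driven) recurrent layer: an input or hidden
# unit only triggers its matrix-vector column update when its value has changed
# by more than a threshold since it last fired.
class DeltaRNNLayer:
    def __init__(self, n_in, n_hidden, theta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.standard_normal((n_hidden, n_in)) * 0.1
        self.Wh = rng.standard_normal((n_hidden, n_hidden)) * 0.1
        self.b = np.zeros(n_hidden)
        self.theta = theta                # delta threshold
        self.h = np.zeros(n_hidden)       # current hidden state
        self.x_prev = np.zeros(n_in)      # last fired input values
        self.h_prev = np.zeros(n_hidden)  # last fired hidden values
        self.z = np.zeros(n_hidden)       # accumulated pre-activation (memory term)

    def step(self, x):
        # Input deltas: changes below the threshold are dropped and cost nothing.
        dx = x - self.x_prev
        mask_x = np.abs(dx) > self.theta
        self.x_prev = np.where(mask_x, x, self.x_prev)

        # Hidden-state deltas, treated the same way.
        dh = self.h - self.h_prev
        mask_h = np.abs(dh) > self.theta
        self.h_prev = np.where(mask_h, self.h, self.h_prev)

        # Only columns with fired deltas contribute; a hardware accelerator would
        # skip the corresponding weight fetches and multiplications entirely.
        self.z += self.Wx[:, mask_x] @ dx[mask_x] + self.Wh[:, mask_h] @ dh[mask_h]
        self.h = np.tanh(self.z + self.b)
        return self.h, int(mask_x.sum() + mask_h.sum())  # activity = work performed


layer = DeltaRNNLayer(n_in=8, n_hidden=16)
for t in range(5):
    x_t = np.sin(0.3 * t) * np.ones(8)   # slowly varying input -> few deltas fire
    h_t, n_updates = layer.step(x_t)
    print(f"t={t}: {n_updates} columns updated")
```

With the threshold set to zero the layer reduces to a standard tanh recurrent layer; raising the threshold trades a small accuracy loss for proportionally fewer column updates, which is what makes such networks attractive for low-power edge inference.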