Project


HEAR-EAR

Applicant Liu Shih-Chii
Number 172553
Funding scheme Project funding
Research institution Institut für Neuroinformatik Universität Zürich Irchel und ETH Zürich
Institution of higher education University of Zurich - ZH
Main discipline Information Technology
Start/End 01.04.2018 - 30.09.2021
Approved amount 474'372.00

All Disciplines (2)

Discipline
Information Technology
Electrical Engineering

Keywords (5)

context-aware sensor; biomimetic acoustic system; deep learning networks; low-power cochlea sensor; event-driven computing

Lay Summary (German)

Lead
Understanding and interpreting acoustic information is an important engineering task. However, modern machine recognition systems often fail in realistic environments with added noise, natural reverberation and multiple speakers, even when the recognition runs in powerful data centres. Because of their enormous demand for computing power, modern algorithms can hardly be used on low-power mobile platforms when real-time operation is required. Developing context-aware, adaptive sensor systems that operate at low power in realistic environments is of crucial importance for application areas such as ambient intelligence, wireless sensor networks and the Internet of Things.
Lay summary

The goal of this project is to develop a context-aware, event-based acoustic sensor technology that enables both intelligent perception and always-on operation in natural environments such as living rooms or scientific experimental setups. To this end, low-power silicon sensors, which acquire data asynchronously and in an event-based manner much like biological sensors, are to be combined with a processor that processes the data using event-based methods from neuroinformatics. The event-driven operation of the system is modelled on the brain's likewise event-based information processing, which, at the same power consumption, is clearly superior to machine systems in perceiving complex environments.
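To make the notion of event-based sensing concrete, the short Python sketch below encodes a sampled tone into asynchronous ON/OFF events with a simple level-crossing rule. It is only a software illustration of the principle, with an arbitrary threshold and test signal; it is not the project's silicon cochlea design.

import numpy as np

# Minimal illustrative sketch (not the project's sensor design): a software
# level-crossing encoder that turns a sampled waveform into asynchronous
# events, loosely analogous to how an event-driven sensor emits spikes only
# when its input changes, instead of sampling at a fixed rate.
# All parameter values below are arbitrary choices.

def level_crossing_events(signal, fs, delta=0.1):
    """Emit (time, polarity) events whenever the signal moves by +/- delta."""
    events = []
    ref = signal[0]                      # last value at which an event fired
    for n, x in enumerate(signal[1:], start=1):
        while x - ref >= delta:          # upward crossing -> ON event
            ref += delta
            events.append((n / fs, +1))
        while ref - x >= delta:          # downward crossing -> OFF event
            ref -= delta
            events.append((n / fs, -1))
    return events

fs = 16000
t = np.arange(0, 0.01, 1 / fs)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)     # 10 ms of a 440 Hz tone
print(len(tone), "samples ->", len(level_crossing_events(tone, fs)), "events")

When the input is quiet or unchanging, no events are produced at all, which is the property that makes always-on, low-power operation attractive.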

 

Last update: 12.03.2018

Responsible applicant and co-applicants

Employees

Project partner

Publications

Publication
EILE: Efficient Incremental Learning on the Edge
Chen Xi, Gao Chang, Delbruck Tobi, Liu Shih-Chii (2021), EILE: Efficient Incremental Learning on the Edge, in 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), Washington DC, USA, IEEE, USA.
EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference
Gao Chang, Rios-Navarro Antonio, Chen Xi, Liu Shih-Chii, Delbruck Tobi (2020), EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference, in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 10(4), 419-432.
EdgeDRNN: Enabling low-latency recurrent neural network edge inference
Gao Chang, Rios-Navarro Antonio, Chen Xi, Delbruck Tobi, Liu Shih-Chii (2020), EdgeDRNN: Enabling low-latency recurrent neural network edge inference, in Proceedings of IEEE Artificial Intelligence Circuits and Systems, Virtual, IEEE, USA.
Evaluating multi-channel multi-device speech separation algorithms in the wild: a hardware-software solution
Ceolini Enea, Kiselev Ilya, Liu Shih-Chii (2020), Evaluating multi-channel multi-device speech separation algorithms in the wild: a hardware-software solution, in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 1-1.
Lip Reading Deep Network Exploiting Multi-Modal Spiking Visual and Auditory Sensors
Li Xiaoya, Neil Daniel, Delbruck Tobi, Liu Shih-Chii (2019), Lip Reading Deep Network Exploiting Multi-Modal Spiking Visual and Auditory Sensors, in 2019 IEEE International Symposium on Circuits and Systems, IEEE, USA.
Audio classification systems using deep neural networks and an event-driven auditory sensor
Ceolini Enea, Kiselev Ilya, Liu Shih-Chii (2019), Audio classification systems using deep neural networks and an event-driven auditory sensor, in Proceedings of IEEE Sensors, Montreal, Canada, IEEE, USA.
Combining deep neural networks and beamforming for real-time multi-channel speech enhancement using a wireless acoustic sensor network
Ceolini Enea, Liu Shih-Chii (2019), Combining deep neural networks and beamforming for real-time multi-channel speech enhancement using a wireless acoustic sensor network, in Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2019), Pittsburgh, USA, IEEE, USA.
Event-driven pipeline for low latency low compute keyword spotting and speaker verification system
Ceolini Enea, Anumula Jithendar, Braun Stefan, Liu Shih-Chii (2019), Event-driven pipeline for low latency low compute keyword spotting and speaker verification system, in Proceedings of the 2019 International Conference on Acoustics, Speech, and Signal Processing, Brighton, UK, IEEE, USA.
FaSNet: Low-latency adaptive beamforming for multi-microphone audio processing
Luo Yi, Ceolini Enea, Han C, Liu Shih-Chii, Mesgarani Nima (2019), FaSNet: Low-latency adaptive beamforming for multi-microphone audio processing, in 2019 IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop, Sentosa, Singapore, IEEE, USA.
Real-time speech recognition for IoT purpose using a delta recurrent neural network accelerator
Gao Chang, Braun Stefan, Kiselev Ilya, Anumula Jithendar, Delbruck Tobi, Liu Shih-Chii (2019), Real-time speech recognition for IoT purpose using a delta recurrent neural network accelerator, in Proceedings of the 2019 International Symposium on Circuits and Systems, Japan, IEEE, USA.
An event-driven probabilistic model of sound source localization using cochlea spikes
Anumula Jithendar, Ceolini Enea, He Zhe, Huber Adrian, Liu Shih-Chii (2018), An event-driven probabilistic model of sound source localization using cochlea spikes, in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, IEEE, USA.

Datasets

WHISPER SET 1: a dataset for multi-channel, multi-device speech separation and speech enhancement

Author Ceolini, Enea; Kiselev, Ilya; Liu, Shih-Chii
Publication date 01.05.2020
Persistent Identifier (PID) https://doi.org/10.5281/zenodo.3688540
Repository WHISPER SET 1
Abstract
This dataset is WHISPER SET 1, a dataset for speech enhancement and source separation recorded with a Wireless Acoustic Sensor Network (WASN) called WHISPER (Kiselev2018). The dataset contains samples for up to 4 concurrent speakers as well as speech in noise. It was recorded with 16 microphones in a room with low reverberation (T_60 = 0.2 s). In general, each track starts with a calibration phase in which each speaker is active alone for 15 seconds in sequence, followed by 15 seconds of all speakers talking together (plus noise in some cases).
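The track layout described above lends itself to a simple segmentation step. The Python sketch below shows one possible way to split a track into the per-speaker calibration segments and the final mixture segment; the file name, speaker count and use of the soundfile library are assumptions for illustration, not part of the dataset documentation.

import numpy as np
import soundfile as sf  # assumption: tracks are available as standard multi-channel audio files

# Hypothetical sketch of segmenting a WHISPER SET 1 track, assuming the layout
# described above: N speakers x 15 s of solo calibration, followed by 15 s of
# all speakers mixed. File name and speaker count are placeholders.

SEGMENT_S = 15          # duration of each segment in seconds
N_SPEAKERS = 4          # up to 4 concurrent speakers in this example

audio, fs = sf.read("whisper_set1_track.wav")   # shape: (samples, 16 channels)
seg_len = SEGMENT_S * fs

# Solo calibration segments, one per speaker, in recording order.
calibration = [audio[i * seg_len:(i + 1) * seg_len] for i in range(N_SPEAKERS)]

# Mixture segment with all speakers active (plus noise in some tracks).
mixture = audio[N_SPEAKERS * seg_len:(N_SPEAKERS + 1) * seg_len]

print([c.shape for c in calibration], mixture.shape)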

Collaboration

Group / person Country
Types of collaboration
Nima Mesgarani/Columbia University United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results

Scientific events

Active participation

Title Type of contribution Title of article or contribution Date Place Persons involved
2021 IEEE Solid-State Circuits Webinar Individual talk Event-driven low-compute bio-inspired processing for edge audio devices 27.09.2021 Virtual, United States of America Liu Shih-Chii;
2020 IEEE International Solid-State Circuits Conference Talk given at a conference TinyML for Audio-based Applications 16.02.2020 San Francisco, United States of America Liu Shih-Chii;
2020 TinyML Summit Talk given at a conference TinyML Audio Algorithms 12.02.2020 San Jose, United States of America Liu Shih-Chii;
2019 IEEE Sensors Conference Talk given at a conference Audio classification systems using deep neural networks and an event-driven auditory sensor 27.10.2019 Montreal, Canada Ceolini Enea; Liu Shih-Chii;
IEEE Circuits and Systems Society Singapore Lecture and Virtus, Center of Excellence Seminar Individual talk Neuromorphic audition 09.10.2019 Singapore, Singapore Liu Shih-Chii;
Logitech Technology Summit on Audio Processing and Embedded Machine Learning Individual talk Neuromorphic audition 11.09.2019 Lausanne, Switzerland Liu Shih-Chii;
2019 Interdisciplinary College: Theme: Out of your senses Individual talk Set of lectures on neuromorphic technology 12.03.2019 Günne/Möhnesee, Germany Liu Shih-Chii;
Electrical and Computer Engineering Seminar Series Individual talk Classification using event-driven sensors and machine learning deep neural networks 12.12.2018 University of Florida, Gainesville, United States of America Liu Shih-Chii;


Knowledge transfer events

Active participation

Title Type of contribution Date Place Persons involved
NZZ X-Days 2019 Talk 27.03.2019 Interlaken, Switzerland Liu Shih-Chii;


Awards

Title Year
2020 IEEE Artificial Intelligence Circuits and Systems Best Paper Award 2020
Distinction for PhD thesis of Enea Ceolini from Faculty of Science at University of Zurich 2020
Misha Mahowald Prize for Neuromorphic Engineering 2020

Associated projects

Number Title Start Funding scheme
153565 Fast Separation of Auditory Sounds 01.04.2014 Project funding
177255 WeCare: Cognitive-Multisensing Wearable Sweat Biomonitoring Technology for Real-Time Personalized Diagnosis and Preventive Health Care 01.06.2018 Sinergia

Abstract

Monitoring spaces on a restricted power budget is critical for fields such as ambient intelligence, wireless sensor networks and the Internet of Things (IoT). Smarter always-on acoustic, vision or multi-modal sensors that move more intelligence to the edge will alleviate the expensive wireless energy requirements needed for this goal. Portable platforms that can perform cheaper computations, such as identifying the voice of a particular speaker, can then stream the extracted features or preprocessed audio to the cloud for the more computationally expensive processing, e.g. understanding the speaker's speech.

Current state-of-the-art approaches use deep neural networks, which have achieved the best results on many benchmark tasks. Because these networks are computationally expensive, they are typically run in the cloud. In general, they only perform well when they are pre-trained on large databases that cover the input statistics under which the device is deployed. Retraining the networks to include knowledge of the actual acoustic conditions at the deployment site is not feasible because of the lengthy training time, even with powerful GPUs, and few methods currently exist for online learning on the platform itself.

This proposal (HEAR-EAR) will investigate novel methods for reducing the amount of computation needed by the network during inference and adaptive learning, so that the network can run on the portable platform. In particular, we investigate event-driven deep learning computing architectures that are driven by event-driven sensors. These sensors naturally capture the timing of events in the world, allowing easy binding of temporal stimulus-related information. HEAR-EAR will also investigate new hardware architectures for event-driven deep networks that capitalize on this information. These networks, together with the latest low-power event-driven cochlea, will be used to develop a novel adaptive, event-driven, low-power acoustic processing system.
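As one concrete illustration of how event-driven computation can cut inference cost, the Python sketch below applies a delta-update rule to a single recurrent layer: only input channels whose values have changed by more than a threshold trigger multiply-accumulate work. This is a generic simplification of the delta-network idea, with arbitrary sizes and threshold; it is not the HEAR-EAR hardware design or the EdgeDRNN accelerator itself.

import numpy as np

# Illustrative sketch only: delta-style update for one recurrent layer's
# input path. Work is skipped for channels that change by less than a
# threshold, so slowly varying inputs need only a partial matrix-vector
# product per timestep. Sizes and threshold are arbitrary.

rng = np.random.default_rng(0)
n_in, n_hid, theta = 64, 128, 0.05

W = rng.standard_normal((n_hid, n_in)) * 0.1   # input weights
x_prev = np.zeros(n_in)                         # last committed input values
acc = np.zeros(n_hid)                           # running pre-activation

def delta_step(x):
    """Update the pre-activation using only input channels that changed."""
    global x_prev, acc
    dx = x - x_prev
    active = np.abs(dx) > theta                 # channels exceeding the threshold
    acc += W[:, active] @ dx[active]            # partial matrix-vector product
    x_prev = np.where(active, x, x_prev)        # commit only the updated channels
    return np.tanh(acc), active.sum()

x = rng.standard_normal(n_in)
for t in range(5):
    x += 0.02 * rng.standard_normal(n_in)       # slowly varying input
    h, n_active = delta_step(x)
    print(f"step {t}: {n_active}/{n_in} input channels updated")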