Project


Fast Separation of Auditory Sounds

Applicant Liu Shih-Chii
Number 153565
Funding scheme Project funding
Research institution Institut für Neuroinformatik Universität Zürich Irchel und ETH Zürich
Institution of higher education University of Zurich - ZH
Main discipline Information Technology
Start/End 01.04.2014 - 31.03.2018
Approved amount 264'822.00

All Disciplines (2)

Discipline
Information Technology
Electrical Engineering

Keywords (5)

efficient coding; statistical acoustic features; biomimetic acoustic system; natural statistics; source streaming

Lay Summary (German)

Lead
Interpreting acoustic scenes is an important engineering task, reaching high levels of sophistication in speech recognition applications such as recognizing what has been said, who said it, or which language was spoken. Current systems that solve these tasks typically require speech to be acquired by a close-mounted microphone in well-controlled acoustic environments, and they fail in realistically changing environments with added noise, natural room reverberation, and competing talkers. By contrast, none of these realistic situations presents significant difficulty for human speech communication. The first stages of present engineering systems tend to be off-the-shelf signal processing algorithms with no sensitivity to global or local sound statistics.
Lay summary

This project will advance the state of the art of machine speech recognition through the use of new algorithms. We will develop a real-time system for filtering out a single auditory source or several auditory sources. Such a system could be used as a handheld device or in laboratories for scientific behavioral experiments. One possible application would be, for example, filtering a conversation during an aperitif out of loud background noise.

Last update: 07.04.2014

Responsible applicant and co-applicants

Employees

Publications

Publication
Kiselev Ilya, Ceolini Enea, Wong Daniel, de Cheveigne Alain, Liu Shih-Chii (2017), WHISPER: Wirelessly Synchronized Distributed Audio Sensor Platform, in IEEE SenseApp 2017, Singapore. IEEE, USA.
Riday Cosimo, Bhargava Saurabh, Hahnloser Richard, Liu Shih-Chii (2016), Monaural source separation using a random forest classifier, in Interspeech, San Francisco, CA. International Speech Communication Association (ISCA), USA.
Yang MinHao, Chien Chen-Han, Delbruck Tobi, Liu Shih-Chii (2016), A 0.5 V 55 µW 64×2-Channel Binaural Silicon Cochlea for Event-Driven Stereo-Audio Sensing, in 2016 IEEE International Solid-State Circuits Conference, San Francisco. IEEE, USA.
Neil Daniel, Liu Shih-Chii (2016), Effective sensor fusion with event-based sensors and Deep Network architectures, in International Symposium on Circuits and Systems. IEEE, USA.
Kiselev Ilya, Neil Daniel, Liu Shih-Chii (2016), Event-driven deep neural network hardware system for sensor fusion, in 2016 IEEE International Symposium on Circuits and Systems, Montreal, Canada. IEEE, USA.
Kiselev Ilya, Neil Daniel, Liu Shih-Chii (2016), Live Demonstration: Event-Driven Deep Neural Network Hardware System for Sensor Fusion, in Proceedings of the 2016 IEEE International Symposium on Circuits and Systems, Montreal, Canada. IEEE, USA.
Moeys Diederik, Delbruck Tobias, Liu Shih-Chii (2015), Current-Mode Automated Quality Control Cochlear Resonator for Bird Identity Tagging, in IEEE International Symposium on Circuits and Systems, Lisbon, Portugal. IEEE, USA.
Bhargava Saurabh, Blaettler Florian, Kollmorgen Sepp, Liu Shih-Chii, Hahnloser Richard H. E. (2015), Linear methods for efficient and fast separation of two sources recorded with a single microphone, in Neural Computation, 27(10), 2231-2259.
Zai Anja, Bhargava Saurabh, Mesgarani Nima, Liu Shih-Chii (2015), Reconstruction of audio waveforms from spike trains of artificial cochlea models, in Frontiers in Neuroscience, (347), P99.

Collaboration

Group / person: Shihab Shamma / University of Maryland
Country: United States of America (North America)
Types of collaboration:
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Exchange of personnel

Scientific events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
IEEE Circuits and Systems Distinguished Lecturer Program | Individual talk | Event-based auditory processing with spiking silicon cochleas and deep networks | 16.09.2016 | Sevilla, Spain | Liu Shih-Chii
2nd International Conference on Event-Based Control, Communication and Signal Processing | Talk given at a conference | Silicon cochleas and event-based audio processing | 13.06.2016 | Krakow, Poland | Liu Shih-Chii
Asia-Pacific Summer School on Bio-Inspired System and Prosthetic Devices | Talk given at a conference | Event-based sensors, processing algorithms, and networks | 25.08.2015 | National Dong Hwa University, Taiwan | Liu Shih-Chii
2015 Telluride Neuromorphic Cognition Engineering Workshop | Talk given at a conference | Silicon cochlea technology and hardware spiking deep belief networks | 13.07.2015 | Telluride, Colorado, United States of America | Liu Shih-Chii
2015 CapoCaccia Cognitive Neuromorphic Engineering Workshop | Individual talk | Auditory Processing | 27.04.2015 | Sardinia, Italy | Liu Shih-Chii; Bhargava Saurabh; Hahnloser Richard


Knowledge transfer events

Active participation

Title | Type of contribution | Date | Place | Persons involved
Brain Fair Zurich 2017 | Performances, exhibitions (e.g. for education institutions) | 17.03.2017 | Zurich, Switzerland | Kiselev Ilya; Liu Shih-Chii


Communication with the public

Communication | Title | Media | Place | Year
Talks/events/exhibitions | Brain Fair Zurich 2017 | | German-speaking Switzerland | 2017

Associated projects

Number | Title | Start | Funding scheme
172553 | HEAR-EAR | 01.04.2018 | Project funding
126844 | Early Auditory Based Recognition of Speech | 01.03.2011 | Project funding

Abstract

Interpreting acoustic scenes is an important engineering task, reaching high levels of sophistication in speech recognition applications such as recognizing what has been said, who said it, or which language was spoken. Current systems that solve these tasks typically require speech to be acquired by a close-mounted microphone in well-controlled acoustic environments, and they fail in realistically changing environments with added noise, natural room reverberation, and competing talkers. By contrast, none of these realistic situations presents significant difficulty for human speech communication. The representations chosen by biological systems appear to be tuned to the statistics of natural sounds, both reflecting the overall distribution of these sounds and adapting to their local statistics on a range of timescales. In contrast, the first stages of present engineering systems tend to be off-the-shelf signal processing algorithms with no sensitivity to global or local sound statistics.

Our goal is to build a new front-end processor for sounds that extracts, in real time, features based on these biological principles in a way that is resistant to the presence of distractors and noise; to integrate this front end with state-of-the-art speech-processing algorithms; to allow this front end to adapt to changes in input statistics; and to build a robust, real-time, compact hardware implementation of this sound processing system. We take advantage of recent developments in computational source streaming models, bio-inspired feature extraction methods that depend on the input statistics, advances in neuromorphic sensor technology, and embedded systems for real-time performance. These developments provide a unique opportunity to combine knowledge in these different areas to construct a novel acoustic processing system. We will validate this system on a task relating to aspects of human speech in different environments, including other talkers and reverberation.

This project intends to achieve two important objectives: first, it will advance the state of the art of machine speech recognition by using algorithms that determine acoustic features based on the statistics of the environment and the temporal coherence observed in the auditory cortex; and second, it will build a real-time source streaming system composed of a continuous-time sensor front end coupled to standard digital architectures such as microprocessors or FPGA boards, for easy reconfiguration of the system architecture. Operating the system in real time allows for exploration of biologically inspired acoustic algorithms that can also be used in embedded applications such as hand-held devices and in setups for scientific behavioral experiments.
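As a rough illustration of the kind of single-microphone separation the abstract targets, the sketch below applies an oracle time-frequency ratio mask to a synthetic two-source mixture. This is not the project's algorithm (the publications above estimate such separations from mixture features with, e.g., linear methods or a random forest classifier); all signals, sample rates, and parameters here are illustrative assumptions.

```python
# Minimal sketch: oracle ratio-mask source separation of a two-source mixture.
# A real system would estimate the mask from the mixture alone; here it is
# computed from the ground-truth sources purely for illustration.
import numpy as np
from scipy.signal import stft, istft

fs = 16000                        # sample rate in Hz (assumed)
t = np.arange(fs * 2) / fs        # two seconds of audio

# Two synthetic "sources": a low chirp and a high tone stand in for talker/noise.
source_a = np.sin(2 * np.pi * (200 + 100 * t) * t)
source_b = 0.5 * np.sin(2 * np.pi * 1800 * t)
mixture = source_a + source_b

# Short-time Fourier transforms of the mixture and the (oracle) sources.
_, _, X = stft(mixture, fs, nperseg=512)
_, _, A = stft(source_a, fs, nperseg=512)
_, _, B = stft(source_b, fs, nperseg=512)

# Ratio mask: fraction of energy in each time-frequency bin that belongs to A.
mask = np.abs(A) ** 2 / (np.abs(A) ** 2 + np.abs(B) ** 2 + 1e-12)

# Apply the mask to the mixture spectrogram and invert back to a waveform.
_, a_hat = istft(mask * X, fs, nperseg=512)

# Simple quality check: correlation between the estimate and the true source.
n = min(len(a_hat), len(source_a))
print(f"correlation with source A: {np.corrcoef(a_hat[:n], source_a[:n])[0, 1]:.3f}")
```

The oracle mask serves only as an upper bound for this kind of masking approach; the statistics-driven front ends and classifiers described in the project would have to predict the mask (or the separated features) from the mixture itself.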