
Early Auditory Based Recognition of Speech

Applicant: Liu Shih-Chii
Number: 126844
Funding scheme: Project funding
Research institution: Institut für Neuroinformatik, Universität Zürich Irchel und ETH Zürich
Institution of higher education: University of Zurich - ZH
Main discipline: Information Technology
Start/End: 01.03.2011 - 31.03.2014
Approved amount: 335'828.00

All Disciplines (2)

Information Technology
Electrical Engineering

Keywords (8)

statistical acoustic features; biomimetic acoustic system; efficient coding; natural statistics; bio-inspired feature extraction; auditory features; real-time sound processing system; statistical auditory preprocessor

Lay Summary (English)

Interpreting acoustic scenes is an important engineering task, reaching high levels of sophistication in speech recognition applications such as recognizing what has been said, who said it, or which language was spoken. Current systems that solve these tasks typically require speech to be acquired by a close-mounted microphone in well-controlled acoustic environments, and they fail hopelessly in realistically changing environments with added noise, natural room reverberation, and competing talkers. By contrast, none of these realistic situations presents significant difficulty for human speech communication.

The representations chosen by biological systems appear to be tuned to the statistics of natural sounds, both reflecting the overall distribution of these sounds and adapting to their local statistics on a range of timescales. The first stages of present engineering systems, in contrast, tend to be off-the-shelf signal processing algorithms with no sensitivity to global or local sound statistics.

Our goals are to build a new statistical front-end processor for sounds that extracts features based on these biological principles in a way that is resistant to distractors and noise; to integrate this front-end with state-of-the-art speech-processing algorithms; to allow the front-end to adapt both to changes in input statistics and to information fed back from higher-level processing; and to build a robust, real-time, low-power hardware implementation of this sound processing system. We take advantage of recent developments in bio-inspired feature extraction methods that depend on the input statistics, binaural source localization, neuromorphic sensor technology, and embedded systems for real-time performance. These developments provide a unique opportunity to combine knowledge from these different areas into an entirely novel acoustic processing system. We will validate this system on tasks involving human speech in difficult environments, including competing talkers and reverberation.
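A minimal sketch of the kind of statistics-adaptive front-end described above, assuming a conventional band-pass filterbank as a stand-in for a silicon cochlea and per-channel divisive normalization as the adaptation mechanism; the channel count, filter design, and time constants are illustrative assumptions, not the project's actual parameters:

```python
# Hypothetical illustration of a statistics-adaptive auditory front-end:
# a bank of band-pass filters (a crude cochlea stand-in) whose per-channel
# output is normalized by a running estimate of its own recent level, so
# features rescale themselves as the acoustic environment changes.
import numpy as np
from scipy.signal import butter, lfilter

FS = 16000                      # sample rate (Hz), illustrative
N_CHANNELS = 8                  # cochlea-like channels (real systems use many more)
TAU_FAST, TAU_SLOW = 0.05, 1.0  # adaptation timescales in seconds, illustrative

def filterbank(x):
    """Split x into log-spaced band-pass channels."""
    edges = np.logspace(np.log10(100), np.log10(FS / 2 * 0.9), N_CHANNELS + 1)
    chans = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo / (FS / 2), hi / (FS / 2)], btype="band")
        chans.append(lfilter(b, a, x))
    return np.stack(chans)      # shape: (N_CHANNELS, n_samples)

def adaptive_features(chans):
    """Envelope per channel, divisively normalized by its running mean.

    The running mean tracks local signal statistics, so a loud or noisy
    scene rescales itself instead of swamping the downstream recognizer.
    Two time constants give adaptation on two timescales.
    """
    env = np.abs(chans)
    out = []
    for tau in (TAU_FAST, TAU_SLOW):
        alpha = 1.0 / (tau * FS)            # per-sample leak rate
        mean = np.full(env.shape[0], 1e-6)  # running level estimate
        feats = np.empty_like(env)
        for t in range(env.shape[1]):
            mean += alpha * (env[:, t] - mean)
            feats[:, t] = env[:, t] / (mean + 1e-6)  # divisive normalization
        out.append(feats)
    return np.concatenate(out)  # shape: (2 * N_CHANNELS, n_samples)

if __name__ == "__main__":
    t = np.arange(FS) / FS
    x = np.sin(2 * np.pi * 440 * t) * (1 + 5 * (t > 0.5))  # abrupt level jump
    f = adaptive_features(filterbank(x))
    print(f.shape)  # (16, 16000): channels x time, level-normalized
```

Running each channel through normalizers with both fast and slow time constants is one simple way to realize adaptation on a range of timescales; the project itself targets an event-based neuromorphic hardware implementation rather than an offline software path like this one.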
Last update: 21.02.2013

Publications

Riday Cosimo, Bhargava Saurabh, Hahnloser Richard, Liu Shih-Chii (2016), Monaural source separation using a random forest classifier, in Proceedings of Interspeech 2016, San Francisco, CA, International Speech Communication Association (ISCA), USA.

Liu Shih-Chii, van Schaik Andre, Minch Bradley, Delbruck Tobi (2014), Asynchronous binaural spatial audition sensor with 2x64x4 channel output, in IEEE Transactions on Biomedical Circuits and Systems, 1.

O'Connor Peter, Neil Danny, Liu Shih-Chii, Delbruck Tobi, Pfeiffer Michael (2013), Real-time classification and sensor fusion with a spiking Deep Belief Network, in Frontiers in Neuroscience, 1.

Kollmorgen Sepp, Peleg Orit, Giret Nicolas, Hahnloser Richard (2012), Investigating the neural representation of motor variability, Society for Neuroscience, New Orleans.

Li Cheng-Han, Delbruck Tobi, Liu Shih-Chii (2012), Real-Time Speaker Identification using the AEREAR2 Event-Based Silicon Cochlea, in Proceedings of the 2012 IEEE International Symposium on Circuits and Systems (ISCAS), USA.

Collaboration

Group / person: University of Maryland, United States of America (North America)
Types of collaboration: in-depth/constructive exchanges on approaches, methods or results

Scientific events

Active participation

Title: Silicon cochlea technology and hardware spiking deep belief networks
Type of contribution: Individual talk
Title of article or contribution: Telluride Neuromorphic Cognition Engineering Workshop
Date: 29.06.2015
Place: Telluride, United States of America
Persons involved: Liu Shih-Chii

Title: 2013 Telluride Neuromorphic Engineering Cognition Workshop
Type of contribution: Individual talk
Title of article or contribution: Universal Neuromorphic Devices and Sensors for Real-Time Mobile Robotics
Date: 30.06.2013
Place: Telluride, CO, United States of America
Persons involved: Liu Shih-Chii


Knowledge transfer events

Active participation

Title: BrainFair Zurich
Type of contribution: Performances, exhibitions (e.g. for education institutions)
Date: 16.03.2012
Place: Zurich, Switzerland
Persons involved: Bhargava Saurabh; Liu Shih-Chii


Awards

Title: IEEE Circuits and Systems Distinguished Lecturer Program (2016-2017)
Year: 2016

Associated projects

Number: 153565
Title: Fast Separation of Auditory Sounds
Start: 01.04.2014
Funding scheme: Project funding

Abstract

Interpreting acoustic scenes is an important engineering task, reaching high levels of sophistication in speech recognition applications such as recognizing what has been said, who said it, or which language was spoken. Current systems that solve these tasks typically require speech to be acquired by a close-mounted microphone in well-controlled acoustic environments, and they fail hopelessly in realistically changing environments with added noise, natural room reverberation, and competing talkers. By contrast, none of these realistic situations presents significant difficulty for human speech communication.

The representations chosen by biological systems appear to be tuned to the statistics of natural sounds, both reflecting the overall distribution of these sounds and adapting to their local statistics on a range of timescales. The first stages of present engineering systems, in contrast, tend to be off-the-shelf signal processing algorithms with no sensitivity to global or local sound statistics.

Our goals are to build a new statistical front-end processor for sounds that extracts features based on these biological principles in a way that is resistant to distractors and noise; to integrate this front-end with state-of-the-art speech-processing algorithms; to allow the front-end to adapt both to changes in input statistics and to information fed back from higher-level processing; and to build a robust, real-time, low-power hardware implementation of this sound processing system. We take advantage of recent developments in bio-inspired feature extraction methods that depend on the input statistics, binaural source localization, neuromorphic sensor technology, and embedded systems for real-time performance. These developments provide a unique opportunity to combine knowledge from these different areas into an entirely novel acoustic processing system. We will validate this system on tasks involving human speech in difficult environments, including competing talkers and reverberation.
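As a hedged formalization of the adaptation idea above (illustrative notation, not a formula taken from the project), each channel envelope $e_i(t)$ can be divisively normalized by a running estimate $\sigma_i(t)$ of its own recent level:

\[
r_i(t) = \frac{e_i(t)}{\sigma_i(t) + \epsilon}, \qquad \tau \,\frac{d\sigma_i(t)}{dt} = e_i(t) - \sigma_i(t),
\]

where $\tau$ sets the adaptation timescale and $\epsilon$ prevents division by zero; running several such estimators with different values of $\tau$ yields adaptation on a range of timescales, as described above.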