Project
The dynamics of indexical information in speech and its role in speech communication and speaker recognition
English title | The dynamics of indexical information in speech and its role in speech communication and speaker recognition
Applicant | Dellwo Volker
Number | 185399
Funding scheme | Project funding (Div. I-III)
Research institution | Phonetisches Laboratorium Institut für Computerlinguistik Universität Zürich
Institution of higher education | University of Zurich - ZH
Main discipline | Other languages and literature
Start/End | 01.12.2019 - 30.11.2023
Approved amount | 896'518.00
Keywords (6)
speaking styles; speech processing in computers; human voice processing; automatic speaker recognition; speech processing in humans; indexical information in speech
Lay Summary (translated from German)
Lead
Voices are individual, and humans use this individuality to recognize particular people. This is essential in human social interaction for recognizing individuals in a dialogue situation and attributing speech to them. Voice recognition also has practical applications, e.g. in access systems that recognize users by their voice, or in forensics for identifying voices that have come to attention in connection with crimes.
Lay summary
In the past, voice recognition has been studied almost exclusively from the recipient's side, i.e. by asking how well human listeners or machines perform at recognition under particular circumstances. The present project investigates whether, and if so how, speakers modify their voice to make it easier or harder to recognize. In a first step, we will build a large database of approximately 500 speakers from the greater Zurich area, in which speakers talk under different situational conditions (e.g. speaking with children, with elderly people, or in a formal situation). This database serves as the reference. Following data collection, we will test in a range of experiments how speakers change their voice when they want to make themselves more recognizable and when they want to conceal their vocal identity. We assume that speakers orient themselves to the reference when adapting their voices, i.e. that they move their vocal characteristics closer to the vocal average in order not to be recognized, or deviate from the average in order to be recognizable. The results are relevant both for understanding the overall process of voice recognition in humans and for improving the recognition performance of computer systems.
Associated projects
Number | Title | Start | Funding scheme
159350 | Acoustic Characteristics of Voice in Music and Straight Theatre, and Related Aspects of Production and Perception | 01.09.2015 | Project funding (Div. I-III)
183152 | "Voice Theft": Chances and risks of digital voice technology | 01.12.2018 | Digital Lives
135287 | Speaker Identification Based on Speech Temporal Information: A Forensic Phonetic Study of Speech Rhythm and Timing in Swiss Standard German | 01.09.2011 | Project funding (Div. I-III)
165544 | Voice Identification in Infants | 01.12.2015 | International short research visits
Abstract
Humans have elaborate skills in recognizing speakers by their voice, a phenomenon that is deeply rooted in the evolution of human behavior. To date, the processes underlying voice recognition, how it is acquired, what role it plays in human communication, and why evolution has equipped humans with voice identification skills are only poorly understood. Here, we argue that such knowledge is essential for understanding human speech communication and for improving applied areas where knowledge about human individuality cues is crucial, such as automatic speaker recognition or forensic speaker comparison. Vocal cues to individuality (indexical information) have typically been viewed as static attributes of a speaker that are a by-product of the human articulation process. However, there is now strong evidence that this view is incorrect. The dynamic nature of indexical information has been pointed out repeatedly in diverse fields of speech technology, forensic science, and linguistics. In our own previous work, we found that situational voice alterations (speaking styles) have an asymmetrical effect on speaker recognition. For example, learning a speaker's voice from infant-directed speech (IDS) confers an advantage for recognition in adult-directed conversational speech, but not vice versa. It thus seems plausible that a speaking style like IDS evolved a specific combination of indexical cues as a technique for mothers to make themselves more recognizable to their offspring. To date, however, it is unknown how such recognition advantages are accomplished. The major theoretical aim of the present project is to understand the dynamics of indexical information by examining speech from human interaction. To this end, we will investigate how indexical information varies (a) within utterances and (b) across speaking styles, and (c) what effects this variability has on speaker recognition.
The results will show to what degree speakers have control over indexical information to either reveal or suppress their individuality in speech communication situations. To reach these aims, we will sample a large, homogeneous population of human voices (~500 speakers), recording individuals as they vary their speaking style in situations where individuality plays a role (e.g. charismatic, deceptive, clear, or computer-directed speech). Subsequently, these data will be analysed with computer modelling to understand how indexical information varies between speaking styles within individuals, relative to the population mean. Using machine and human speaker recognition experiments, we will test the effects that indexical variability has on the recognizability of speakers’ voices under certain speaking styles. The results will be crucial to understanding the role of indexical information in human-human and human-machine speech communication. They will affect the way we think about linguistic and speaker-specific information processing alike. They will have an impact on the areas of psychology, behavioural biology, and/or neuroscience that aim to understand how humans perceive and process speech, as well as on speech technology and forensic phonetic applications, where the samples being compared typically differ in speaking style. Finally, knowledge about the role of individuality information in speech will be important for understanding evolutionary processes of speech communication, a research area of increasing strength at UZH and across Switzerland. The project will also continue work funded by SNF Digital Lives seed money, with which we study the impact of voice manipulation on human and computer speaker recognition (grant: 183152).