Project

Speaker Identification Based on Speech Temporal Information: A Forensic Phonetic Study of Speech Rhythm and Timing in Swiss Standard German

Applicant Dellwo Volker
Number 135287
Funding scheme Project funding (Div. I-III)
Research institution Phonetisches Laboratorium Universität Zürich
Institution of higher education University of Zurich - ZH
Main discipline German and English languages and literature
Start/End 01.09.2011 - 31.05.2015
Approved amount 432'456.00

All disciplines (3)

Discipline
German and English languages and literature
Applied linguistics
Psychology

Keywords (6)

Phonetics, Prosody, Speaker Identification, Forensics, Timing, Rhythm

Lay Summary (English)

Everyday experience tells us that it is typically possible to identify a speaker solely on the basis of their voice (e.g. when someone starts a phone call with a simple 'hi' or when people talk in a different room). Such observations reveal that speakers carry individual features in their voices by which they can be identified to a considerable degree. The present project studies the role of temporal characteristics of the speech signal in speaker identification. It pays particular attention to possible applications of the results in forensic phonetics, the field in which phonetic knowledge is applied to legal cases where the identity of the speaker in a recording is disputed.
We start from the observation that the acoustic speech signal is made up of dynamic processes resulting from the movements of the articulators. It has been demonstrated successfully in other scientific domains that humans can be identified on the basis of their movements alone, e.g. by the way they walk. Our working hypothesis is that the movements of the organs of speech (e.g. jaw, lips or tongue) can be as idiosyncratic as human gait and that idiosyncratic ways of moving the organs of speech leave individual temporal characteristics in the acoustic speech signal. We will therefore study numerous durational parameters in speech, ranging from segment durations (e.g. the durations of consonants and vowels) through syllable and word durations to prosodic durations (e.g. durational characteristics of intonation). In the first year of the project we aim to identify the temporal measures of speech that are most speaker-idiosyncratic. In the second year we will test how robust these measures are to within-speaker variability (e.g. different types of voice disguise). In year three we will use behavioral experimental methods to test whether the measures we have identified as most speaker-idiosyncratic are perceptually salient (i.e. whether listeners can identify a speaker solely on the basis of certain temporal voice characteristics).
It is quite possible that we will find that some speaker-idiosyncratic temporal features are perceptually salient and others are not. We argue that the salient temporal features will help us explain how human listeners identify speakers on the basis of their voice. Non-salient features, however, may be less prone to within-speaker variability such as voice disguise, as they should be difficult for speakers to control. Such features may thus be of high value for the acoustic voice identification of non-cooperative speakers (i.e. speakers not wishing to be identified), who are typically encountered under forensic circumstances.
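The durational parameters mentioned in this summary can be made concrete with a small sketch. The Python example below computes four widely used rhythm measures from a sequence of labelled consonantal and vocalic intervals: %V (the percentage of the total duration that is vocalic), ΔC (the standard deviation of consonantal interval durations), VarcoC (ΔC normalised by the mean consonantal duration) and the vocalic nPVI (the normalised pairwise variability of successive vocalic intervals). It is an illustration only: the function name `rhythm_metrics`, the `(label, duration)` input format and the use of the population standard deviation are assumptions made here, not the project's documented procedure.

```python
from statistics import mean, pstdev


def rhythm_metrics(intervals):
    """Compute simple rhythm measures from labelled intervals.

    intervals: list of (label, duration_in_seconds) tuples, where label is
    "C" (consonantal interval) or "V" (vocalic interval). This input format
    is an assumption made for this sketch.
    """
    cons = [dur for label, dur in intervals if label == "C"]
    vocs = [dur for label, dur in intervals if label == "V"]
    if len(cons) < 2 or len(vocs) < 2:
        raise ValueError("need at least two C and two V intervals")

    total = sum(cons) + sum(vocs)
    # Population standard deviation is an assumption; a sample SD could be used instead.
    delta_c = pstdev(cons)
    # Normalised difference between each pair of successive vocalic intervals.
    pairwise = [abs(a - b) / ((a + b) / 2) for a, b in zip(vocs, vocs[1:])]
    return {
        "percent_v": 100.0 * sum(vocs) / total,
        "delta_c": delta_c,
        "varco_c": 100.0 * delta_c / mean(cons),
        "npvi_v": 100.0 * mean(pairwise),
    }


# Made-up durations in seconds, for illustration only.
example = [("C", 0.08), ("V", 0.12), ("C", 0.15),
           ("V", 0.06), ("C", 0.10), ("V", 0.20)]
print(rhythm_metrics(example))
```

In practice the intervals would come from a manual or automatic segmentation of recordings, and such measures would be computed per utterance and per speaker before any comparison between speakers.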
Last update: 21.02.2013

Publications

Publication
Listeners use temporal information to identify French- and English-accented speech
Kolly Marie-José, Boula de Mareüil Philippe, Leemann Adrian, Dellwo Volker (2017), Listeners use temporal information to identify French- and English-accented speech, in Speech Communication, 86, 121-134.
Rhythmic variability in Swiss German dialects
Leemann Adrian, Dellwo Volker, Kolly Marie-José, Schmid Stephan (2012), Rhythmic variability in Swiss German dialects, in Proceedings of Speech Prosody, Shanghai, International Speech Communication Association, Shanghai.
Speaker idiosyncratic rhythmic features in the speech signal
Dellwo Volker, Leemann Adrian, Kolly Marie-José (2012), Speaker idiosyncratic rhythmic features in the speech signal, in Proceedings of Interspeech, Portland, International Speech Communication Association, Portland.
Variability of speech rhythm in synchronous speech
Dellwo Volker, Friedrichs Daniel (2012), Variability of speech rhythm in synchronous speech, in Proceedings of Speech Prosody, Shanghai/China, International Speech Communication Association, Shanghai.
Rhythmic characteristics of voice between and within languages
Dellwo Volker, Fourcin Adrian (2013), Rhythmic characteristics of voice between and within languages, in L’étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL), 59, 87-107.
Caratteristiche temporali del parlato Italiano e Tedesco: Un confronto tra parlanti nativi, bilingui e non-nativi
Schmid Stephan, Dellwo Volker (2014), Caratteristiche temporali del parlato Italiano e Tedesco: Un confronto tra parlanti nativi, bilingui e non-nativi, in Atti del VIII Convegno dell'Associazione Italiana di Scienze della Voce, AISV, Napoli.
Rhythmic variability between speakers: articulatory, prosodic, and linguistic factors
Dellwo Volker, Leemann Adrian, Kolly Marie-José (2015), Rhythmic variability between speakers: articulatory, prosodic, and linguistic factors, in The Journal of the Acoustical Society of America, 137(3), 1513-1528.
Speaker-individuality in suprasegmental temporal features: Implications for forensic voice comparison
Leemann Adrian, Kolly Marie-José, Dellwo Volker (2014), Speaker-individuality in suprasegmental temporal features: Implications for forensic voice comparison, in Forensic Science International, 238, 59-67.
Verbrecherjagd mit gesprochener Sprache
Dellwo Volker, Hove Ingrid, Leemann Adrian, Kolly Marie-José (2014), Verbrecherjagd mit gesprochener Sprache, in Kriminalistik, 68(2), 119-126.
The recognition of read and spontaneous speech in local vernacular: The case of Zurich German
Dellwo Volker, Leemann Adrian, Kolly Marie-José (2014), The recognition of read and spontaneous speech in local vernacular: The case of Zurich German, in Journal of Phonetics, 48, 13-28.
The influence of speech rate on Fujisaki model parameters
Mixdorff Hansjörg, Leemann Adrian, Dellwo Volker (2013), The influence of speech rate on Fujisaki model parameters, in EURASIP Journal on Audio, Speech, and Music Processing, 2014(1), 1-11.
Speaker idiosyncratic variability of intensity across syllables
He Lei, Dellwo Volker (2013), Speaker idiosyncratic variability of intensity across syllables, in Proceedings of Interspeech, Singapore, International Speech Communication Association, Singapore.
Rhythmic variability between some Asian languages: Results from an automatic analysis of temporal characteristics
Dellwo Volker, Mok Peggy, Jenny Mathias (2013), Rhythmic variability between some Asian languages: Results from an automatic analysis of temporal characteristics, in Proceedings of Interspeech, Singapore, International Speech Communication Association, Singapore.
What does voice and silence tell us about speaker identity? An introduction to temporal speaker individualities and their use for forensic speaker comparison
Dellwo Volker (2015), What does voice and silence tell us about speaker identity? An introduction to temporal speaker individualities and their use for forensic speaker comparison, in Schneider Gina Maria, Janner Maria Chiara, Elie Benedicte (eds.), VOX & SILENTIUM Etudes de linguistique et litterature romanes Studi di linguistica e letteratura r, Peter Lang, Basel, 17-35.
Listeners may rely on intonation to distinguish languages of different rhythm classes
Hagmann Lea, Dellwo Volker (2014), Listeners may rely on intonation to distinguish languages of different rhythm classes, in Loquens, 1(1), 0-0.
Speaker-individuality in Fujisaki model f0 features: implications for forensic voice comparison
Leemann Adrian, Dellwo Volker, Mixdorff Hansjörg, O'Reilly Maria, Kolly Marie-José (2014), Speaker-individuality in Fujisaki model f0 features: implications for forensic voice comparison, in International Journal of Speech Language and the Law, 21(2), 343-370.
Foreign accent recognition based on temporal information contained in lowpass-filtered speech
Kolly Marie-José, Leemann Adrian, Dellwo Volker (2014), Foreign accent recognition based on temporal information contained in lowpass-filtered speech, in Proceedings of Interspeech, Singapore, International Speech Communication Association, Singapore.
Speaker-invariant suprasegmental temporal features in normal and disguised speech
Leemann Adrian, Kolly Marie-José (2015), Speaker-invariant suprasegmental temporal features in normal and disguised speech, in Speech Communication, 75, 97-122.
Rhythmische Variabilität bei synchronem Sprechen und ihre Bedeutung für die forensische Sprecheridentifizierung
Friedrichs Daniel, Dellwo Volker (accepted), Rhythmische Variabilität bei synchronem Sprechen und ihre Bedeutung für die forensische Sprecheridentifizierung, in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL).
(How) do listeners perceive the origin of a foreign accent?
Kolly Marie-José, Dellwo Volker (accepted), (How) do listeners perceive the origin of a foreign accent?, in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL).
Sprachrhythmus bei bilingualen Sprechern
Schmid Stephan, Dellwo Volker (accepted), Sprachrhythmus bei bilingualen Sprechern, in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL).
Audiovisuelle Sprechererkennung durch linguistisch naive Personen
Sutter Sibylle, Dellwo Volker (accepted), Audiovisuelle Sprechererkennung durch linguistisch naive Personen, in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL).
Acoustic correlates of speech rhythm: Are consonantal and vocalic intervals or syllables the more salient units
Dellwo Volker (accepted), Acoustic correlates of speech rhythm: Are consonantal and vocalic intervals or syllables the more salient units, in Ruben van de Vijver and Ralf Vogel (ed.), unknown, De Gruyter, unknown.

Collaboration

Group / person, country
Types of collaboration
Universidad Internacional Menéndez Pelayo Madrid, Spain (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Research infrastructure
- Exchange of personnel
UniCamp - University of Campinas, Brazil (South America)
- in-depth/constructive exchanges on approaches, methods or results
University of Newcastle, Great Britain and Northern Ireland (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Research infrastructure
Universität Tübingen, Germany (Europe)
- in-depth/constructive exchanges on approaches, methods or results
Technische Universität Berlin, Germany (Europe)
- in-depth/constructive exchanges on approaches, methods or results
Beuth-Hochschule für Technik Berlin, Germany (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
University of Naples, Italy (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Research infrastructure
- Exchange of personnel
SPITCH AG Zurich, Switzerland (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Research infrastructure
- Industry/business/other use-inspired collaboration
Institut für Phonetik und Sprachsignalverarbeitung / Universität München, Germany (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Research infrastructure

Scientific events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
International Association for Forensic Phonetics and Acoustics (IAFPA) 2015 | Talk given at a conference | Inter-speaker variability in intensity dynamics | 08.07.2015 | Leiden, Netherlands | Rufener Katharina; Dellwo Volker
International Association for Forensic Phonetics and Acoustics (IAFPA) 2015 | Talk given at a conference | A method for the elicitation of shouted speech with controlled loudness | 08.07.2015 | Leiden, Netherlands | Dellwo Volker
Summer school on Speech Production and Perception | Talk given at a conference | The influence of second language on speaker-idiosyncratic temporal patterns | 30.09.2013 | Aix-en-Provence, France | Leemann Adrian; Dellwo Volker; Kolly Marie-José
International Conference of Forensic Phonetics and Acoustics (IAFPA) 2013 | Talk given at a conference | Speaker discrimination using f0 and timing information | 21.07.2013 | Tampa, United States of America | Leemann Adrian; Dellwo Volker; Kolly Marie-José
International Conference of Forensic Phonetics and Acoustics (IAFPA) 2013 | Talk given at a conference | Speaker-idiosyncratic temporal patterns in L2 speech | 21.07.2013 | Tampa, United States of America | Dellwo Volker; Kolly Marie-José; Leemann Adrian
International Conference of Forensic Phonetics and Acoustics (IAFPA) 2013 | Talk given at a conference | Auditory speaker identification based on suprasegmental temporal characteristics | 21.07.2013 | Tampa, United States of America | Kolly Marie-José; Leemann Adrian; Dellwo Volker
Swiss Workshop on Prosody | Talk given at a conference | Speaker-specific f0 patterns | 15.03.2013 | Neuchâtel, Switzerland | Leemann Adrian; Kolly Marie-José; Dellwo Volker
Phonetik & Phonologie 2012 | Talk given at a conference | Rhythmic differences between read and spontaneous speech: the case of %V | 12.10.2012 | Jena, Germany | Leemann Adrian; Kolly Marie-José; Dellwo Volker
7. Tage der Schweizer Linguistik | Talk given at a conference | Speaker identification based on temporal information: A forensic phonetic study of speech rhythm and timing in the Zurich variety of Swiss German | 13.09.2012 | Lugano, Switzerland | Kolly Marie-José; Dellwo Volker; Leemann Adrian
Interspeech 2012 | Poster | Speaker idiosyncratic rhythmic features in the speech signal | 09.09.2012 | Portland, United States of America | Leemann Adrian; Dellwo Volker; Kolly Marie-José
International Association for Forensic Phonetics and Acoustics (IAFPA) 2012 | Talk given at a conference | The effect of articulatory obstruction on temporal characteristics of speech | 06.08.2012 | Santander, Spain | Dellwo Volker
International Association for Forensic Phonetics and Acoustics (IAFPA) 2012 | Talk given at a conference | « Analysis of linguistic origin »: The identification of a foreign accent in L2-speech based on temporal characteristics | 06.08.2012 | Santander, Spain | Kolly Marie-José; Dellwo Volker
International Association for Forensic Phonetics and Acoustics (IAFPA) 2012 | Talk given at a conference | Speaker identification based on speech temporal information: A forensic phonetic study of speech rhythm in the Zurich variety of Swiss German | 06.08.2012 | Santander, Spain | Leemann Adrian; Kolly Marie-José; Dellwo Volker
International Association for Forensic Phonetics and Acoustics (IAFPA) 2012 | Talk given at a conference | Can you see my voice? The influence of visual information on listeners' speaker identification ability | 06.08.2012 | Santander, Spain | Dellwo Volker
Perspectives on Rhythm and Timing (PoRT) 2012 | Poster | Dialectal typology based on speech rhythm | 19.07.2012 | University of Glasgow, Scotland, Great Britain and Northern Ireland | Leemann Adrian; Kolly Marie-José; Dellwo Volker; Schmid Stephan
Perspectives on Rhythm and Timing (PoRT) 2012 | Poster | Speaker identification based on speech rhythm: the case of bilinguals | 19.07.2012 | Glasgow, Great Britain and Northern Ireland | Leemann Adrian; Schmid Stephan; Kolly Marie-José; Dellwo Volker
Speech Prosody 2012 | Talk given at a conference | Rhythmic variability in Swiss German dialects | 22.05.2012 | Shanghai, China | Leemann Adrian; Schmid Stephan; Kolly Marie-José; Dellwo Volker
Speech Prosody 2012 | Poster | Variability of speech rhythm in synchronous speech | 22.05.2012 | Shanghai, China | Dellwo Volker
1st Workshop on Research on Prosody in Switzerland | Talk given at a conference | Exploring the potential of studying twins in speech rhythm research | 27.04.2012 | Universität Zürich, Switzerland | Kolly Marie-José; Dellwo Volker; Leemann Adrian
New Observations in Speech and Hearing | Talk given at a conference | Sprechererkennung mittels Sprachrhythmus | 01.02.2012 | Institute of Phonetics, University of Munich, Germany | Dellwo Volker
Associazione Italiana di Scienze della Voce | Talk given at a conference | Caratteristiche temporali del parlato Italiano e Tedesco: Un confronto tra parlanti nativi, bilingui e non-nativi | 18.01.2012 | Rome, Italy | Dellwo Volker; Schmid Stephan
Phonetics Colloquium | Talk given at a conference | Sprechererkennung mittels Sprachrhythmus | 18.01.2012 | Phonetics Department, University of Saarbruecken, Germany | Dellwo Volker
Berner Linguisten Kolloquium - Wissenschaftliches Kolloquium | Talk given at a conference | Rhythmische Klassifikation 8 Schweizerdeutscher Dialekte | 13.12.2011 | Universität Bern, Switzerland | Leemann Adrian


Knowledge transfer events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
SCIENTIFICA 2013 | Performances, exhibitions (e.g. for education institutions) | | 31.08.2013 | Zurich, Switzerland |
SCIENTIFICA 2012 | Performances, exhibitions (e.g. for education institutions) | | 01.09.2012 | Universität Zürich, Switzerland |


Communication with the public

Communication | Title | Media | Place | Year
Media relations: radio, television | Das Auge hört mit | DRS 1 | German-speaking Switzerland | 2011
Media relations: radio, television | Tätersuche über Stimmerkennung | DRS 1 | German-speaking Switzerland | 2011
Media relations: print media, online media | Stimmanalyse hilft bei der Verbrecherjagd | 20 Minuten | German-speaking Switzerland | 2012
Media relations: print media, online media | Wie Züritüütsch aussieht | Tages Anzeiger | German-speaking Switzerland | 2012

Associated projects

Number | Title | Start | Funding scheme
158094 | Trends in Phonetics and Phonology. Studies from German-speaking Europe | 01.12.2014 | Publication grants
155024 | “Judging by your accent, you must be French” – which phonetic information is necessary for foreign accent recognition? | 01.12.2014 | Doc.Mobility
148585 | Word stress: dialectal variation and perception | 01.06.2013 | International Exploratory Workshops
165544 | Voice Identification in Infants | 01.12.2015 | International short research visits
152905 | I am not too old to hear you! The role of spectral and temporal information for understanding lateralization in speech perception across the life span | 01.10.2014 | Projects
143874 | Zwischen Konversation und Urlaut – Studien zum musikalisierten Sprechen im Composed Theatre des 21. Jahrhunderts mittels musikwissenschaftlicher und phonetischer Methoden | 01.12.2012 | Project funding (Div. I-III)
145654 | Swiss VoiceApp - Your voice. Your identity. | 01.02.2013 | Agora

Abstract

Speakers' voices are to a high degree individual, but we have only a limited understanding of this phenomenon. The major theoretical aim of the present project is to investigate how temporal characteristics of human speech (e.g. segmental or prosodic timing patterns, speech rhythmic characteristics and durational patterns of voicing) contribute to speaker individuality. On a practical level we will investigate how knowledge about temporal differences between speakers can be applied to forensic phonetic speaker identification. Speaker identification, in particular in forensic phonetic contexts, has predominantly been carried out on the basis of spectral characteristics of a speaker’s voice (e.g. fundamental frequency of vocal fold vibration and vocal tract resonances like vocalic formant frequencies). It has been argued correctly that such frequency content is directly influenced by idiosyncratic anatomical features of a speaker’s organs of speech (in particular the size of the larynx and the lengths of the vocal tract cavities), which limit the range of certain spectral parameters and can thus contribute to making speakers’ voices individual. The emphasis is on ‘contribute’, as experience has taught us that there are clear limits to identifying speakers on the basis of spectral parameters alone. It is therefore necessary to explore other dimensions of speech in which idiosyncratic information is encoded. One such dimension is time, which has received surprisingly little attention in the past. This is surprising because research from other domains, such as motion pattern recognition, has demonstrated convincingly that humans move in highly individual ways and that individuals can be identified, for example, by means of temporal gait information alone.
In the present project we argue that speech is similar to walking in that it is a highly complex, brain-operated control mechanism over a large number of muscle movements, each of which may be carried out to some degree in an individual way. We then go one step further and argue that such idiosyncratic motion does not need to be observed visually from articulator movement itself but can be found in the acoustic speech signal, as this is the immediate product of all articulatory movements. To study temporal individuality in speech we will therefore (a) systematically analyze which durational characteristics vary most across speakers of the Standard German variety spoken in Zürich and explain the reasons for temporal variability between speakers. We will (b) test how robust such characteristics are to sources of within-speaker variability (e.g. voice disguise or varying emotional content of speech) and of between-speaker similarity (e.g. speakers imitating each other). We will finally (c) test whether between-speaker temporal differences are perceptually salient. We argue that non-salient speaker-idiosyncratic temporal characteristics are the most valuable for acoustic forensic speaker identification, as speakers should have limited control over these parameters when they try to hide their identity (i.e. in voice disguise).
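Steps (a) and (b) ask which durational measures differ most between speakers while staying stable within a speaker. One simple way to express this idea, sketched here as an assumption rather than as the project's actual analysis, is the ratio of between-speaker variance to mean within-speaker variance of a measure: the higher the ratio, the more speaker-specific the measure. The function name `speaker_specificity` and all values are hypothetical.

```python
from statistics import mean, pvariance


def speaker_specificity(values_by_speaker):
    """Ratio of between-speaker to mean within-speaker variance.

    values_by_speaker: dict mapping a speaker ID to a list of values of one
    temporal measure (e.g. one value per utterance or speaking condition).
    This data layout is an assumption made for this sketch.
    """
    if len(values_by_speaker) < 2:
        raise ValueError("need at least two speakers")
    per_speaker = list(values_by_speaker.values())
    # Variance of the speaker means: how much speakers differ from each other.
    between = pvariance([mean(vals) for vals in per_speaker])
    # Average variance inside each speaker: how much one speaker varies.
    within = mean([pvariance(vals) for vals in per_speaker])
    if within == 0:
        raise ValueError("within-speaker variance is zero; ratio undefined")
    return between / within


# Hypothetical values of two measures for three speakers, illustration only.
measure_a = {"S1": [40.0, 41.0, 39.5], "S2": [46.0, 47.0, 45.5], "S3": [52.0, 51.0, 53.0]}
measure_b = {"S1": [40.0, 50.0, 45.0], "S2": [44.0, 52.0, 41.0], "S3": [47.0, 39.0, 51.0]}
print(speaker_specificity(measure_a))  # large ratio: speakers are well separated
print(speaker_specificity(measure_b))  # small ratio: speakers overlap heavily
```

Running the same comparison on values taken from normal and disguised speech would give a rough indication of how well a measure survives within-speaker variability. A full analysis would normally rely on proper statistical modelling rather than this single ratio.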