Project

Speaker Identification Based on Speech Temporal Information: A Forensic Phonetic Study of Speech Rhythm and Timing in Swiss Standard German

Applicant: Dellwo Volker
Number: 135287
Funding scheme: Project funding (Div. I-III)
Research institution: Phonetisches Laboratorium, Universität Zürich
Institution of higher education: Universität Zürich - ZH
Main discipline: German and English language and literature
Start/End: 01.09.2011 - 31.05.2015
Approved amount: 432'456.00

All disciplines (3)

Discipline
German and English language and literature
Applied linguistics
Psychology

Keywords (6)

Phonetics; Prosody; Speaker Identification; Forensics; Timing; Rhythm

Lay Summary (English)

Lead
Lay summary
Everyday experience tells us that it is usually possible to identify a speaker solely on the basis of his or her voice (e.g. when someone starts a phone call with a simple 'hi', or when people are talking in another room). Such observations show that speakers' voices carry individual features by which they can be identified to a considerable degree. The present project aims to study the role of temporal characteristics of the speech signal in speaker identification. Particular attention will be paid to possible applications of the results in forensic phonetics, the field in which phonetic knowledge is applied in legal cases where the identity of the speaker in a recording is disputed.
We start from the observation that the acoustic speech signal is made up of dynamic processes resulting from the movements of the articulators. Research in other scientific domains has shown that humans can be identified on the basis of their movements alone, e.g. by the way they walk. Our working hypothesis is that the movements of the organs of speech (e.g. jaw, lips or tongue) can be as idiosyncratic as human gait, and that idiosyncratic ways of moving the organs of speech leave individual temporal characteristics in the acoustic speech signal. We will therefore study numerous durational parameters of speech, ranging from segment durations (e.g. the durations of consonants and vowels) through syllable and word durations to prosodic durations (e.g. durational characteristics of intonation). In the first year of the project we aim to identify the temporal measures of speech that are most speaker-idiosyncratic. In the second year we will test how robust these measures are against within-speaker variability (e.g. different types of voice disguise). In the third year we will use behavioral experiments to test whether the measures identified as most speaker-idiosyncratic are perceptually salient (i.e. whether listeners can identify a speaker solely on the basis of certain temporal voice characteristics).
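Durational parameters of this kind are often summarised with interval-based rhythm metrics. The following Python sketch is only an illustration under stated assumptions, not the project's own pipeline: it assumes an utterance has already been segmented into consonantal ("C") and vocalic ("V") intervals with durations in milliseconds, and computes metrics common in speech rhythm research: %V (proportion of vocalic time), ΔC and ΔV (standard deviations of interval durations), the rate-normalised VarcoC and VarcoV, and the vocalic nPVI. The function name `rhythm_metrics`, the input format and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def rhythm_metrics(intervals):
    """Compute interval-based rhythm metrics for one utterance.

    intervals: sequence of (label, duration_ms) pairs in temporal order,
    where label is "C" (consonantal interval) or "V" (vocalic interval).
    This input format is hypothetical and assumed for illustration.
    """
    c = [d for label, d in intervals if label == "C"]
    v = [d for label, d in intervals if label == "V"]
    if not c or len(v) < 2:
        raise ValueError("need >= 1 consonantal and >= 2 vocalic intervals")

    total = sum(c) + sum(v)
    # Vocalic nPVI: mean normalised difference between successive vocalic intervals
    npvi_v = 100 * mean(abs(a - b) / ((a + b) / 2) for a, b in zip(v, v[1:]))

    return {
        "percent_v": 100 * sum(v) / total,     # %V: share of vocalic time
        "delta_c": pstdev(c),                  # ΔC: SD of C-interval durations
        "delta_v": pstdev(v),                  # ΔV: SD of V-interval durations
        "varco_c": 100 * pstdev(c) / mean(c),  # VarcoC: ΔC divided by mean C duration
        "varco_v": 100 * pstdev(v) / mean(v),  # VarcoV: ΔV divided by mean V duration
        "npvi_v": npvi_v,
    }
```

For example, `rhythm_metrics([("C", 80), ("V", 120), ("C", 100), ("V", 80)])` yields a %V of about 52.6 and a vocalic nPVI of 40.0. The rate-normalised measures (Varco, nPVI) are included because raw standard deviations grow with slower speech, which matters when comparing speakers who talk at different rates.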
It is quite possible that some speaker-idiosyncratic temporal features will turn out to be perceptually salient and others not. We argue that the salient temporal features will help us explain how human listeners identify speakers by their voice. Non-salient features, however, may be less prone to within-speaker variability such as voice disguise, since they should be difficult for speakers to control. Such features may thus be of high value for the acoustic identification of non-cooperative speakers (i.e. speakers who do not wish to be identified), as typically encountered in forensic casework.
Last update: 21.02.2013

Responsible applicant and co-applicants

Employees

Publications

Publication
(2017), Listeners use temporal information to identify French- and English-accented speech, in Speech Communication, 86, 121-134.
(2015), Rhythmic variability between speakers: articulatory, prosodic, and linguistic factors, in The Journal of the Acoustical Society of America, 137(3), 1513-28.
(2015), Speaker-invariant suprasegmental temporal features in normal and disguised speech, in Speech Communication, 75, 97-122.
(2015), The recognition of read and spontaneous speech in local vernacular: The case of Zurich German, in Journal of Phonetics, 48, 13-28.
(2015), What does voice and silence tell us about speaker identity? An introduction to temporal speaker individualities and their use for forensic speaker comparison, 17-35.
(2014), Caratteristiche temporali del parlato Italiano e Tedesco: Un confronto tra parlanti nativi, bilingui e non-nativi [Temporal characteristics of Italian and German speech: a comparison of native, bilingual and non-native speakers], in Atti del VIII Convegno dell'Associazione Italiana Scienze della Voce.
(2014), Foreign accent recognition based on temporal information contained in lowpass-filtered speech, in Proceedings of Interspeech, Singapore.
(2014), Listeners may rely on intonation to distinguish languages of different rhythm classes, in Loquens, 1(1).
(2014), Speaker-individuality in Fujisaki model f0 features: implications for forensic voice comparison, in International Journal of Speech Language and the Law, 21(2), 343-370.
(2014), Speaker-individuality in suprasegmental temporal features: Implications for forensic voice comparison, in Forensic Science International, 238, 59-67.
(2014), Verbrecherjagd mit gesprochener Sprache [Hunting criminals with spoken language], in Kriminalistik, 68(2), 119-126.
(2013), Rhythmic characteristics of voice between and within languages, in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL), 59, 87-107.
(2013), Rhythmic variability between some Asian languages: Results from an automatic analysis of temporal characteristics, in Proceedings of Interspeech, Singapore.
(2013), Speaker idiosyncratic variability of intensity across syllables, in Proceedings of Interspeech, Singapore.
(2013), The influence of speech rate on Fujisaki model parameters, in Eurasip Journal on Audio, Speech, and Music Processing, 2014(1), 1-11.
(2012), Rhythmic variability in Swiss German dialects, in Proceedings of Speech Prosody, Shanghai.
(2012), Speaker idiosyncratic rhythmic features in the speech signal, in Proceedings of Interspeech, Portland.
(2012), Variability of speech rhythm in synchronous speech, in Proceedings of Speech Prosody, Shanghai.
(How) do listeners perceive the origin of a foreign accent?, in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL).
Acoustic correlates of speech rhythm: Are consonantal and vocalic intervals or syllables the more salient units?
Audiovisuelle Sprechererkennung durch linguistisch naive Personen [Audiovisual speaker recognition by linguistically naive listeners], in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL).
Rhythmische Variabilität bei synchronem Sprechen und ihre Bedeutung für die forensische Sprecheridentifizierung [Rhythmic variability in synchronous speech and its relevance for forensic speaker identification], in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL).
Sprachrhythmus bei bilingualen Sprechern [Speech rhythm in bilingual speakers], in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL).

Collaboration

Group / person | Country
Forms of collaboration
Universidad Internacional Menéndez Pelayo Madrid | Spain (Europe)
- In-depth/constructive exchange of approaches, methods or results
- Research infrastructure
- Exchange of personnel
UniCamp - University of Campinas | Brazil (South America)
- In-depth/constructive exchange of approaches, methods or results
University of Newcastle | Great Britain and Northern Ireland (Europe)
- In-depth/constructive exchange of approaches, methods or results
- Research infrastructure
Universität Tübingen | Germany (Europe)
- In-depth/constructive exchange of approaches, methods or results
Technische Universität Berlin | Germany (Europe)
- In-depth/constructive exchange of approaches, methods or results
Beuth-Hochschule für Technik Berlin | Germany (Europe)
- In-depth/constructive exchange of approaches, methods or results
- Publication
University of Naples | Italy (Europe)
- In-depth/constructive exchange of approaches, methods or results
- Publication
- Research infrastructure
- Exchange of personnel
SPITCH AG Zurich | Switzerland (Europe)
- In-depth/constructive exchange of approaches, methods or results
- Research infrastructure
- Industry/business/other use-inspired collaboration
Institut für Phonetik und Sprachsignalverarbeitung, Universität München | Germany (Europe)
- In-depth/constructive exchange of approaches, methods or results
- Research infrastructure

Scientific events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
International Association for Forensic Phonetics and Acoustics (IAFPA) 2015 | Talk given at a conference | A method for the elicitation of shouted speech with controlled loudness | 08.07.2015 | Leiden, Netherlands | Dellwo Volker
International Association for Forensic Phonetics and Acoustics (IAFPA) 2015 | Talk given at a conference | Inter-speaker variability in intensity dynamics | 08.07.2015 | Leiden, Netherlands | Rufener Katharina; Dellwo Volker
Summer school on Speech Production and Perception | Talk given at a conference | The influence of second language on speaker-idiosyncratic temporal patterns | 30.09.2013 | Aix-en-Provence, France | Leemann Adrian; Dellwo Volker; Kolly Marie-José
International Conference of Forensic Phonetics and Acoustics (IAFPA) 2013 | Talk given at a conference | Auditory speaker identification based on suprasegmental temporal characteristics | 21.07.2013 | Tampa, USA | Kolly Marie-José; Leemann Adrian; Dellwo Volker
International Conference of Forensic Phonetics and Acoustics (IAFPA) 2013 | Talk given at a conference | Speaker discrimination using f0 and timing information | 21.07.2013 | Tampa, USA | Leemann Adrian; Dellwo Volker; Kolly Marie-José
International Conference of Forensic Phonetics and Acoustics (IAFPA) 2013 | Talk given at a conference | Speaker-idiosyncratic temporal patterns in L2 speech | 21.07.2013 | Tampa, USA | Dellwo Volker; Kolly Marie-José; Leemann Adrian
Swiss Workshop on Prosody | Talk given at a conference | Speaker-specific f0 patterns | 15.03.2013 | Neuchâtel, Switzerland | Leemann Adrian; Kolly Marie-José; Dellwo Volker
Phonetik & Phonologie 2012 | Talk given at a conference | Rhythmic differences between read and spontaneous speech: the case of %V | 12.10.2012 | Jena, Germany | Leemann Adrian; Kolly Marie-José; Dellwo Volker
7. Tage der Schweizer Linguistik | Talk given at a conference | Speaker identification based on temporal information: A forensic phonetic study of speech rhythm and timing in the Zurich variety of Swiss German | 13.09.2012 | Lugano, Switzerland | Kolly Marie-José; Dellwo Volker; Leemann Adrian
Interspeech 2012 | Poster | Speaker idiosyncratic rhythmic features in the speech signal | 09.09.2012 | Portland, USA | Leemann Adrian; Dellwo Volker; Kolly Marie-José
International Association for Forensic Phonetics and Acoustics (IAFPA) 2012 | Talk given at a conference | « Analysis of linguistic origin »: The identification of a foreign accent in L2-speech based on temporal characteristics | 06.08.2012 | Santander, Spain | Kolly Marie-José; Dellwo Volker
International Association for Forensic Phonetics and Acoustics (IAFPA) 2012 | Talk given at a conference | The effect of articulatory obstruction on temporal characteristics of speech | 06.08.2012 | Santander, Spain | Dellwo Volker
International Association for Forensic Phonetics and Acoustics (IAFPA) 2012 | Talk given at a conference | Can you see my voice? The influence of visual information on listeners' speaker identification ability | 06.08.2012 | Santander, Spain | Dellwo Volker
International Association for Forensic Phonetics and Acoustics (IAFPA) 2012 | Talk given at a conference | Speaker identification based on speech temporal information: A forensic phonetic study of speech rhythm in the Zurich variety of Swiss German | 06.08.2012 | Santander, Spain | Leemann Adrian; Kolly Marie-José; Dellwo Volker
Perspectives on Rhythm and Timing (PoRT) 2012 | Poster | Speaker identification based on speech rhythm: the case of bilinguals | 19.07.2012 | Glasgow, Great Britain and Northern Ireland | Leemann Adrian; Schmid Stephan; Kolly Marie-José; Dellwo Volker
Perspectives on Rhythm and Timing (PoRT) 2012 | Poster | Dialectal typology based on speech rhythm | 19.07.2012 | University of Glasgow, Scotland, Great Britain and Northern Ireland | Leemann Adrian; Kolly Marie-José; Dellwo Volker; Schmid Stephan
Speech Prosody 2012 | Poster | Variability of speech rhythm in synchronous speech | 22.05.2012 | Shanghai, China | Dellwo Volker
Speech Prosody 2012 | Talk given at a conference | Rhythmic variability in Swiss German dialects | 22.05.2012 | Shanghai, China | Leemann Adrian; Schmid Stephan; Kolly Marie-José; Dellwo Volker
1st Workshop on Research on Prosody in Switzerland | Talk given at a conference | Exploring the potential of studying twins in speech rhythm research | 27.04.2012 | University of Zurich, Switzerland | Kolly Marie-José; Dellwo Volker; Leemann Adrian
New Observations in Speech and Hearing | Talk given at a conference | Sprechererkennung mittels Sprachrhythmus [Speaker recognition by means of speech rhythm] | 01.02.2012 | Institute of Phonetics, University of Munich, Germany | Dellwo Volker
Phonetics Colloquium | Talk given at a conference | Sprechererkennung mittels Sprachrhythmus [Speaker recognition by means of speech rhythm] | 18.01.2012 | Phonetics Department, University of Saarbrücken, Germany | Dellwo Volker
Associazione Italiana di Scienze della Voce | Talk given at a conference | Caratteristiche temporali del parlato Italiano e Tedesco: Un confronto tra parlanti nativi, bilingui e non-nativi [Temporal characteristics of Italian and German speech: a comparison of native, bilingual and non-native speakers] | 18.01.2012 | Rome, Italy | Dellwo Volker; Schmid Stephan
Berner Linguisten Kolloquium (BeLing), scientific colloquium | Talk given at a conference | Rhythmische Klassifikation 8 Schweizerdeutscher Dialekte [Rhythmic classification of 8 Swiss German dialects] | 13.12.2011 | University of Bern, Switzerland | Leemann Adrian


Knowledge transfer events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
SCIENTIFICA 2013 | Performances, exhibitions (e.g. for education institutions) | | 31.08.2013 | Zurich, Switzerland |
SCIENTIFICA 2012 | Performances, exhibitions (e.g. for education institutions) | | 01.09.2012 | University of Zurich, Switzerland |


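Communication with the public

Communication | Title | Media | Place | Year
Media relations: print media, online media | Stimmanalyse hilft bei der Verbrecherjagd [Voice analysis helps in the hunt for criminals] | 20 Minuten | German-speaking Switzerland | 2012
Media relations: print media, online media | Wie Züritüütsch aussieht [What Zurich German looks like] | Tages-Anzeiger | German-speaking Switzerland | 2012
Media relations: radio, television | Das Auge hört mit [The eye listens too] | DRS 1 | German-speaking Switzerland | 2011
Media relations: radio, television | Tätersuche über Stimmerkennung [Tracking down offenders through voice recognition] | DRS 1 | German-speaking Switzerland | 2011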

Associated projects

Number | Title | Start | Funding scheme
158094 | Trends in Phonetics and Phonology. Studies from German-speaking Europe | 01.12.2014 | Publication grants
148585 | Word stress: dialectal variation and perception | 01.06.2013 | International Exploratory Workshops
165544 | Voice Identification in Infants | 01.12.2015 |
143874 | Zwischen Konversation und Urlaut - Studien zum musikalisierten Sprechen im Composed Theatre des 21. Jahrhunderts mittels musikwissenschaftlicher und phonetischer Methoden [Between conversation and primal sound: studies of musicalised speech in 21st-century Composed Theatre using musicological and phonetic methods] | 01.12.2012 | Project funding (Div. I-III)
155024 | “Judging by your accent, you must be French” - which phonetic information is necessary for foreign accent recognition? | 01.12.2014 | Doc.Mobility
152905 | I am not too old to hear you! The role of spectral and temporal information for understanding lateralization in speech perception across the life span | 01.10.2014 |
145654 | Swiss VoiceApp - Your voice. Your identity. | 01.02.2013 | Agora

Abstract

Speakers' voices are highly individual, but our understanding of this phenomenon is limited. The major theoretical aim of the present project is to investigate how temporal characteristics of human speech (e.g. segmental or prosodic timing patterns, speech-rhythmic characteristics and durational patterns of voicing) contribute to speaker individuality. On a practical level, we will investigate how knowledge about temporal differences between speakers can be applied to forensic phonetic speaker identification. Speaker identification, in particular in forensic phonetic contexts, has predominantly been carried out on the basis of spectral characteristics of a speaker's voice (e.g. the fundamental frequency of vocal fold vibration and vocal tract resonances such as vowel formant frequencies). It has rightly been argued that such frequency content is directly influenced by idiosyncratic anatomical features of a speaker's organs of speech (in particular the size of the larynx and the lengths of the vocal tract cavities), which constrain the range of certain spectral parameters and can thus contribute to making speakers' voices individual. The emphasis is on 'contribute', as experience has shown that there are clear limits to identifying speakers on the basis of spectral parameters alone. It is therefore necessary to explore other dimensions of speech in which idiosyncratic information is encoded. One such dimension is time, which has received surprisingly little attention in the past. This is surprising because research in other domains, such as motion pattern recognition, has convincingly demonstrated that humans move in highly individual ways and that individuals can be identified, for example, from temporal gait information alone.
In the present project we argue that speech is similar to walking in that it involves highly complex, brain-controlled coordination of a large number of muscle movements, each of which may be carried out in a partly individual way. We go one step further and argue that such idiosyncratic motion need not be observed visually from the articulator movements themselves but can be found in the acoustic speech signal, since this signal is the immediate product of all speech articulatory movements. To study temporal individuality in speech, we will therefore (a) systematically analyze the durational characteristics that vary most across speakers of the Standard German variety spoken in Zürich and explain the reasons for temporal variability between speakers; (b) test how robust such characteristics are against sources of within-speaker variability (e.g. voice disguise or varying emotional content of speech) and of between-speaker similarity (e.g. speakers imitating each other); and (c) test whether between-speaker temporal differences are perceptually salient. We argue that non-salient speaker-idiosyncratic temporal characteristics are the most valuable for acoustic forensic speaker identification, as speakers should have limited control over these parameters when they try to hide their identity (i.e. in voice disguise).
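Step (a) requires some criterion for deciding which durational characteristics "vary most across speakers". One common choice, sketched here as an assumption rather than as the method the project used, is the one-way ANOVA F-ratio: the variance of a measure between speakers divided by its variance within speakers, computed over several utterances per speaker. A high ratio indicates a measure that separates speakers while remaining relatively stable within each speaker. The data layout and the function names `f_ratio` and `rank_measures` are hypothetical.

```python
from statistics import mean


def f_ratio(values_by_speaker):
    """One-way ANOVA F-ratio (between-speaker / within-speaker variance) for one measure.

    values_by_speaker: dict mapping speaker ID -> list of the measure's values,
    e.g. one value per utterance. Hypothetical data layout.
    """
    groups = [list(vals) for vals in values_by_speaker.values()]
    k = len(groups)                  # number of speakers
    n = sum(len(g) for g in groups)  # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need >= 2 speakers and more observations than speakers")

    grand_mean = mean(x for g in groups for x in g)
    # Spread of speaker means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Spread of each speaker's values around that speaker's own mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return float("inf") if ms_within == 0 else ms_between / ms_within


def rank_measures(data):
    """Rank measures by F-ratio, most speaker-discriminating first.

    data: dict mapping measure name -> {speaker ID -> list of values}.
    """
    scores = {name: f_ratio(by_speaker) for name, by_speaker in data.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Such a ranking addresses only step (a); robustness against within-speaker variability such as voice disguise (step (b)) would have to be checked separately on recordings made under those conditions.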