Phonetics; Prosody; Speaker Identification; Forensics; Timing; Rhythm
(2017), Listeners use temporal information to identify French- and English-accented speech, in Speech Communication, 86, 121-134.
(2015), Rhythmic variability between speakers: articulatory, prosodic, and linguistic factors, in The Journal of the Acoustical Society of America, 137(3), 1513-1528.
(2015), Speaker-invariant suprasegmental temporal features in normal and disguised speech, in Speech Communication, 75, 97-122.
(2015), The recognition of read and spontaneous speech in local vernacular: The case of Zurich German, in Journal of Phonetics, 48, 13-28.
(2015), What does voice and silence tell us about speaker identity? An introduction to temporal speaker individualities and their use for forensic speaker comparison, 17-35.
(2014), Caratteristiche temporali del parlato Italiano e Tedesco: Un confronto tra parlanti nativi, bilingui e non-nativi, in Atti del VIII Convegno dell'Associazione Italiana Scienze della Voce
(2014), Foreign accent recognition based on temporal information contained in lowpass-filtered speech, in Proceedings of Interspeech
(2014), Listeners may rely on intonation to distinguish languages of different rhythm classes, in Loquens, 1(1), 0-0.
(2014), Speaker-individuality in Fujisaki model f0 features: implications for forensic voice comparison, in International Journal of Speech, Language and the Law, 21(2), 343-370.
(2014), Speaker-individuality in suprasegmental temporal features: Implications for forensic voice comparison, in Forensic Science International, 238, 59-67.
(2014), The recognition of read and spontaneous speech in local vernacular: The case of Zurich German, in Journal of Phonetics, 48, 13-28.
(2014), Verbrecherjagd mit gesprochener Sprache, in Kriminalistik, 68(2), 119-126.
(2013), Rhythmic characteristics of voice between and within languages, in L’étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL), 59, 87-107.
(2013), Rhythmic variability between some Asian languages: Results from an automatic analysis of temporal characteristics, in Proceedings of Interspeech
(2013), Speaker idiosyncratic variability of intensity across syllables, in Proceedings of Interspeech
(2013), The influence of speech rate on Fujisaki model parameters, in EURASIP Journal on Audio, Speech, and Music Processing, 2014(1), 1-11.
(2012), Rhythmic variability in Swiss German dialects, in Proceedings of Speech Prosody
(2012), Speaker idiosyncratic rhythmic features in the speech signal, in Proceedings of Interspeech
(2012), Variability of speech rhythm in synchronous speech, in Proceedings of Speech Prosody
(How) do listeners perceive the origin of a foreign accent?, in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL)
Acoustic correlates of speech rhythm: Are consonantal and vocalic intervals or syllables the more salient units.
Audiovisuelle Sprechererkennung durch linguistisch naive Personen, in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL)
Rhythmische Variabilitaet bei synchronem Sprechen und ihre Bedeutung fuer die forensische Sprecheridentifizierung, in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL)
Sprachrhythmus bei bilingualen Sprechern, in L'étude de la prosodie en Suisse, Travaux neuchâtelois de linguistique (TRANEL)
Speakers' voices are highly individual, yet our understanding of this phenomenon is limited. The major theoretical aim of the present project is to investigate how temporal characteristics of human speech (e.g. segmental or prosodic timing patterns, speech rhythm characteristics and durational patterns of voicing) contribute to speaker individuality. On a practical level, we will investigate how knowledge about temporal differences between speakers can be applied to forensic phonetic speaker identification.
Speaker identification, particularly in forensic phonetic contexts, has predominantly been carried out on the basis of spectral characteristics of a speaker’s voice (e.g. fundamental frequency of vocal fold vibration and vocal tract resonances such as vowel formant frequencies). It has been argued, correctly, that such frequency content is directly influenced by idiosyncratic anatomical features of a speaker’s organs of speech (in particular the size of the larynx and the lengths of the vocal tract cavities), which limit the range of certain spectral parameters and can thus contribute to making speakers’ voices individual. The emphasis is on ‘contribute’, as experience has shown that there are clear limits to identifying speakers on the basis of spectral parameters alone. It is therefore necessary to explore other dimensions of speech in which idiosyncratic information is encoded. One such dimension is time, which has received surprisingly little attention in the past. This neglect is unexpected because research from other domains, such as motion pattern recognition, has demonstrated convincingly that humans move in highly individual ways and that individuals can be identified, for example, by means of temporal gait information alone.
In the present project we argue that speech is similar to walking in that it involves a highly complex, brain-operated control mechanism over a large number of muscle movements, each of which may be carried out in a somewhat individual way. We then go one step further and argue that such idiosyncratic motion does not need to be observed visually from the articulator movements themselves but can be found in the acoustic speech signal, as this is the immediate product of all speech articulatory movements. To study temporal individuality in speech we will therefore (a) systematically analyze the durational characteristics that vary most across speakers of the Standard German variety spoken in Zürich and explain the reasons for temporal variability between speakers. We will (b) test how robust such characteristics are to sources of within-speaker variability (e.g. voice disguise or varying emotional content of speech) and of between-speaker similarity (e.g. speakers imitating each other). Finally, we will (c) test whether between-speaker temporal differences are perceptually salient. We argue that non-salient speaker-idiosyncratic temporal characteristics are the most valuable for acoustic forensic speaker identification, as speakers should have limited control over these parameters when they are trying to hide their identity (i.e. in voice disguise).