Project


"Voice Theft": Chances and risks of digital voice technology

English title "Voice Theft": Chances and risks of digital voice technology
Applicant Dellwo Volker
Number 183152
Funding scheme Digital Lives
Research institution Institut für Computerlinguistik Universität Zürich
Institution of higher education University of Zurich - ZH
Main discipline Other languages and literature
Start/End 01.12.2018 - 30.11.2020
Approved amount CHF 275'329.00

All Disciplines (2)

Discipline
Other languages and literature
Psychology

Keywords (5)

voice synthesis; digital voice technology; automatic voice recognition; human voice processing; neural voice decoding

Lay Summary (translated from German)

Lead
Prof. Dr. Volker Dellwo, Prof. Dr. Sascha Frühholz
Lay summary

Voices are part of our human personality and play an essential role in social interaction between people and in numerous industrial and medical applications. People can be identified by their voice, and voices carry essential information about emotional states and important personality traits.

Until recently, the social functions of the voice were an inviolable attribute and "property" of the individual, but the digital revolution has fundamentally changed the way in which speech and voice information can be processed. Recent breakthroughs, for example, have made it possible to generate and copy digital speech recordings with a person's voice profile within just a few minutes. Such technologies offer many chances (e.g. in medical fields) but also many risks (e.g. for voice verification in banking systems).

In this project, we will address key issues in the digital, cognitive and neural perception of manipulated and synthetic voices in humans. From the perspective of "chances", we will investigate how digital voice manipulation can be used to improve digital speech and voice technology (e.g. making voices more trustworthy). From the perspective of risks, we will investigate the fraud potential of manipulated voices in humans and machines (e.g. whether humans and/or machines can be misled by manipulated voices).

The results of our research will be of fundamental importance for understanding the chances and risks of human-machine interaction and for developing secure digital speech technologies.

Last update: 12.11.2018

Associated projects

Number Title Start Funding scheme
185399 The dynamics of indexical information in speech and its role in speech communication and speaker recognition 01.12.2019 Project funding (Div. I-III)
157409 Neurocognitive Mechanisms of Auditory Perception - Challenging The Human Auditory System at The Limits of Hearing 01.09.2015 SNSF Professorships
135287 Speaker Identification Based on Speech Temporal Information: A Forensic Phonetic Study of Speech Rhythm and Timing in Swiss Standard German 01.09.2011 Project funding (Div. I-III)
152905 I am not too old to hear you! The role of spectral and temporal information for understanding lateralization in speech perception across the life span 01.10.2014 Project funding (Div. I-III)
165544 Voice Identification in Infants 01.12.2015 International short research visits
159350 Acoustic Characteristics of Voice in Music and Straight Theatre, and Related Aspects of Production and Perception 01.09.2015 Project funding (Div. I-III)
183711 Neurocognitive Mechanisms of Auditory Perception - Challenging The Human Auditory System at The Limits of Hearing 01.09.2019 SNSF Professorships

Abstract

Voices are part of our human personality and play an essential role in human social interaction and in numerous industrial and medical applications. Human-computer interaction is becoming increasingly voice-based, which has tremendous implications for the development of digital networks and infrastructures. Humans can be recognized by their voice, and voices contain essential information about trust, emotional state and other personality characteristics. Loss of voice due to medical conditions results in a wide range of critical disadvantages in social interaction. Until recently, the social functions of voice were an inviolable property of the individual, but the digital revolution has categorically changed the way in which voice information can be processed. Some recent breakthrough developments, for example, have made it possible to generate speech with the individuality of someone's voice from as little as one minute of training material (e.g. http://lyrebird.ai). While such technology offers a chance to preserve the vocal personality of speakers who have permanently lost their voice, and thus opens up a variety of prospects for improving medical treatment and for new economic enterprises, it also poses a tremendous threat to existing digital infrastructures in industry, in particular to access systems that verify their users by their voice (e.g. www.swisscom.ch/en/about/legal-information/data-protection/voiceprint.html). Numerous industry sectors (e.g. retail banks) are currently implementing such systems. Forensic biometric analysis of voice, a field that is rapidly growing in Switzerland, now faces novel manipulation methods by which identities can be copied or disguised. Finally, the possibility of producing messages in an individual's voice that the person never uttered facilitates the creation of vocal fake news and has tremendous implications for media communication and digital social networking.
In summary, contemporary voice processing technology offers novel chances and risks to digital infrastructures. This topic has recently been discussed widely in the media. In this project, we will address key issues in the digital, cognitive and neural perceptual processing of manipulated voices in humans. From a ‘chances’ perspective, we will study how digital voice manipulation can be used to enhance voice technology (e.g. make voices more trustworthy or control emotional information). From a ‘risks’ perspective, we will study the fraud potential of manipulated voices in humans and machines (e.g. whether humans and/or machines can be misled by manipulated voices). The results of our research will be fundamental to understanding the chances and risks in human-machine voice interaction and to creating safe digital voice technology. This can only be achieved by a long-term interdisciplinary research network of speech and voice experts from different disciplines in science and industry (phonetics, psychology, medicine, engineering, forensics, etc.). With the present project, the applicants, voice experts from Phonetics/Speech Sciences and Neuropsychology with large national and international networks, will lay the foundation for building an interdisciplinary center for voice analysis. State-of-the-art acoustic-phonetic voice manipulation algorithms will be used to manipulate and understand the acoustic cues to personality in voice, and, using behavioral and functional Magnetic Resonance Imaging (fMRI) techniques, the differences in human perception of natural and manipulated signals will be investigated. Our research will make a significant contribution to understanding the true chances, risks and possible threats of digital voice manipulation for industrial and social digital infrastructures in which voice identity is at stake, and to understanding the trust that users have in such digital infrastructures.
The project is critical to numerous industry sectors seeking to apply voice technology in the future for civil or forensic purposes. As such, the project is fully in line with the central issues of the Digital Lives grant call, and it has strong practical implications for "trust and ethics" and "digital economy and working life", two key areas in the National Research Program on Digitalization of the State Secretary for Education. The topic is also central to key strategic research decisions at UZH as part of the Communication Section in the Digital Society Initiative (www.dsi.uzh.ch), and it will play a central role in the interdisciplinary creation of a Center for Voice Analysis at UZH.