Project
"Voice Theft": Chances and risks of digital voice technology
English title |
"Voice Theft": Chances and risks of digital voice technology |
Applicant |
Dellwo Volker
|
Number |
183152 |
Funding scheme |
Digital Lives
|
Research institution |
Institut für Computerlinguistik Universität Zürich
|
Institution of higher education |
University of Zurich - ZH |
Main discipline |
Other languages and literature |
Start/End |
01.12.2018 - 30.11.2020 |
Approved amount |
275'329.00 |
All Disciplines (2)
Other languages and literature |
Keywords (5)
voice synthesis; digital voice technology; automatic voice recognition; human voice processing; neural voice decoding
Lay Summary (translated from German)
Lead
|
Prof. Dr. Volker Dellwo, Prof. Dr. Sascha Frühholz
|
Lay summary
|
Voices are part of our human personality and play an essential role in social interaction between people and in numerous industrial and medical applications. People can be identified by their voice, and voices carry essential information about emotional states and important personality traits. Until recently, the social functions of the voice were an inviolable attribute and "property" of the individual, but the digital revolution has fundamentally changed the way speech and voice information can be processed. Recent breakthroughs have, for example, made it possible to generate and copy digital speech recordings with a person's voice profile in just a few minutes. Such technologies offer many opportunities (e.g. in medical fields) but also many risks (e.g. for voice verification in banking systems). In this project we will address key issues in the digital, cognitive and neural perception of manipulated and synthetic voices in humans. From the "opportunities" perspective, we will investigate how digital voice manipulation can be used to improve digital speech and voice technology (e.g. by making voices more trustworthy). From the "risks" perspective, we will investigate the fraud potential of manipulated voices in humans and machines (e.g. whether humans and/or machines can be misled by manipulated voices). The results of our research will be fundamental for understanding the opportunities and risks of human-machine interaction and for developing secure digital voice technologies.
|
Collaboration
UK National Crime Agency |
Great Britain and Northern Ireland (Europe) |
|
- in-depth/constructive exchanges on approaches, methods or results |
JPFrench Associates |
Great Britain and Northern Ireland (Europe) |
|
- in-depth/constructive exchanges on approaches, methods or results - Publication - Research Infrastructure - Exchange of personnel |
Scientific events
Active participation
Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
29th Annual Conference of the International Association of Forensic Phonetics and Acoustics | Talk given at a conference | Does audio recording through video-conferencing tools hinder voice recognition performance? A comparison study on different audio channel recordings | 29.08.2021 | Marburg, Germany | Pellegrino Elisa; Dellwo Volker; Kathiresan Thayabaran
Colloquium of the Germanic Society of Forensic Linguistics | Individual talk | The Dynamics of Indexical Information in Speech Communication | 21.07.2021 | Online, Germany | Dellwo Volker
Communication with the public
Communication | Title | Media | Place | Year
Talks/events/exhibitions | Scientifica 21 | | German-speaking Switzerland | 2021
Media relations: radio, television | Deutschlandfunk Kultur: Interview mit Volker Dellwo zu Deepfakes | Deutschlandfunk Kultur | International | 2020
Talks/events/exhibitions | Scientifica 2019 | | German-speaking Switzerland | 2019
Associated projects
Number | Title | Start | Funding scheme
185399 | The dynamics of indexical information in speech and its role in speech communication and speaker recognition | 01.12.2019 | Project funding (Div. I-III)
157409 | Neurocognitive Mechanisms of Auditory Perception - Challenging The Human Auditory System at The Limits of Hearing | 01.09.2015 | SNSF Professorships
135287 | Speaker Identification Based on Speech Temporal Information: A Forensic Phonetic Study of Speech Rhythm and Timing in Swiss Standard German | 01.09.2011 | Project funding (Div. I-III)
152905 | I am not too old to hear you! The role of spectral and temporal information for understanding lateralization in speech perception across the life span | 01.10.2014 | Project funding (Div. I-III)
165544 | Voice Identification in Infants | 01.12.2015 | International short research visits
159350 | Acoustic Characteristics of Voice in Music and Straight Theatre, and Related Aspects of Production and Perception | 01.09.2015 | Project funding (Div. I-III)
183711 | Neurocognitive Mechanisms of Auditory Perception - Challenging The Human Auditory System at The Limits of Hearing | 01.09.2019 | SNSF Professorships
Abstract
Voices are a part of our human personality and play an essential role in human social interaction and in numerous industrial and medical applications. Human-computer interaction is becoming increasingly voice-based, and this has tremendous implications for the development of digital networks and infrastructures. Humans can be recognized by their voice, and voices contain essential information about trust, emotional state and other personality characteristics. A loss of voice due to medical problems results in a wide range of serious disadvantages in social interaction. Until recently, the social functions of voice were an inviolable property of the individual, but the digital revolution has categorically changed the way in which voice information can be processed. Some recent breakthrough developments, for example, have made it possible to generate speech with the individuality of someone’s voice from as little as one minute of training material (e.g. http://lyrebird.ai). While such technology can be a chance to preserve the personality of speakers who have permanently lost their voice, and thus offers a variety of prospects for improving medical treatment and for new economic enterprises, it also poses a tremendous threat to existing digital infrastructures in industry, in particular to access systems that verify their users by their voice (e.g. www.swisscom.ch/en/about/legal-information/data-protection/voiceprint.html). Numerous industry sectors (e.g. retail banks) are currently in the process of implementing such systems. Forensic biometric analysis of voice, a field that is growing rapidly in Switzerland, now faces novel manipulation methods with which identities can be copied or disguised. Finally, the possibility of producing messages in the voice of an individual that the person has never uttered facilitates the creation of vocal fake news and has tremendous implications for media communication and digital social networking.
In summary, contemporary voice processing technology offers novel chances and risks to digital infrastructures. This topic has recently been discussed widely in the media. In this project, we will address key issues in the digital, cognitive and neural-perceptual processing of manipulated voices in humans. From a ‘chances’ perspective we will study how digital voice manipulation can be used to enhance voice technology (e.g. to make voices more trustworthy or to control emotional information). From a ‘risks’ perspective we will study the fraud potential of manipulated voices in humans and machines (e.g. can humans and/or machines be fooled by manipulated voices?). The results of our research will be fundamental to understanding the chances and risks in human-machine voice interaction and to the creation of safe digital voice technology. This can only be achieved by a long-term interdisciplinary research network consisting of speech and voice experts from different disciplines in science and industry (phonetics, psychology, medicine, engineering, forensics, etc.). With the present project, the applicants, voice experts from Phonetics/Speech Sciences and Neuro-Psychology with large national and international networks, will lay the foundation for building an interdisciplinary center for voice analysis. State-of-the-art acoustic-phonetic voice manipulation algorithms will be used to manipulate and understand the acoustic cues to personality in voice, and the differences in human perception of natural and manipulated signals will be investigated using behavioral and functional Magnetic Resonance Imaging (fMRI) techniques. Our research will make a significant contribution to understanding the true chances, risks and possible threats of digital voice manipulation for industrial and social digital infrastructures in which voice identity is at stake, and to understanding the trust that users have in such digital infrastructures.
The project is critical to numerous industry sectors seeking to apply voice technology in the future for civil or forensic purposes. As such, the project is fully in line with the central issues of the Digital Lives grant call, and it has strong practical implications for "trust and ethics" and "digital economy and working life", two key areas in the National Research Program on Digitalization of the State Secretary for Education. The topic is also central to key strategic research decisions at UZH as part of the Communication Section in the Digital Society Initiative (www.dsi.uzh.ch), and it will play a central role in the interdisciplinary creation of a Center for Voice Analysis at UZH.