Project


See ColOr_II (Seeing Colours with an Orchestra II)

English title See ColOr_II (Seeing Colours with an Orchestra II)
Applicant Bologna Guido
Number 140392
Funding instrument Project funding (Div. I-III)
Research institution Centre Universitaire d'Informatique Université de Genève
University University of Geneva - GE
Main discipline Computer Science
Start/End 01.06.2012 - 31.05.2014
Approved amount 119'760.00

Keywords (7)

computer vision; sensory substitution; multi-modal interaction; image processing; sound spatialisation; 3D camera; machine learning

Lay Summary (English)


Based on a 2002 survey, the World Health Organization estimated there were 161 million visually impaired people in the world, of whom 124 million had low vision and 37 million were blind. For a blind person, quality of life is appreciably improved by special devices that facilitate specific everyday tasks, such as reading, manipulating objects or using a computer.

See ColOr (Seeing Colors with an Orchestra) is an ongoing project aiming to provide visually impaired individuals with a non-invasive mobility aid. See ColOr uses the auditory pathway to represent frontal image scenes in real time. General targeted applications are the search for items of particular interest to blind users, the manipulation of objects and navigation in an unknown environment. The See ColOr interface encodes coloured pixels by spatialised musical instrument sounds, in order to represent and emphasise the colour and location of visual entities in the environment. The basic idea is to represent a pixel as a directional sound source, with depth estimated by stereo vision. Each emitted sound is assigned to a musical instrument, depending on the colour of the pixel. Finally, sound duration, which corresponds to rhythm, is related to pixel depth.
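As a purely illustrative sketch of this mapping (not the project's actual implementation), the code below turns one image row into (azimuth, instrument, duration) sound events; the instrument list, hue sectors, azimuth spread and depth-to-duration scale are invented placeholders.

```python
# Illustrative sketch only: maps one image row to (azimuth, instrument, duration)
# triples in the spirit of See ColOr's sonification. Hue thresholds, instrument
# names and the depth-to-duration scale are invented, not the project's parameters.
import colorsys

# Hypothetical colour-to-instrument table, indexed by hue sector.
INSTRUMENTS = ["oboe", "violin", "trumpet", "flute", "piano", "saxophone"]

def sonify_row(row_rgb, row_depth_m, n_points=25):
    """row_rgb: list of (r, g, b) in [0, 1]; row_depth_m: depths in metres."""
    events = []
    step = max(1, len(row_rgb) // n_points)
    for i in range(0, len(row_rgb), step):
        r, g, b = row_rgb[i]
        hue, _, _ = colorsys.rgb_to_hls(r, g, b)
        instrument = INSTRUMENTS[int(hue * len(INSTRUMENTS)) % len(INSTRUMENTS)]
        # Azimuth: spread the row from -90 (far left) to +90 degrees (far right).
        azimuth = -90.0 + 180.0 * i / max(1, len(row_rgb) - 1)
        # Duration (rhythm) grows with depth in this sketch; the real mapping may differ.
        duration = min(1.0, 0.1 + 0.2 * row_depth_m[i])
        events.append((azimuth, instrument, duration))
    return events

# Example: a 5-pixel row, mid-grey except a red pixel 1 m away in the centre.
row = [(0.5, 0.5, 0.5)] * 5
row[2] = (1.0, 0.0, 0.0)
print(sonify_row(row, [3.0, 3.0, 1.0, 3.0, 3.0], n_points=5))
```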

Our system for visual substitution puts forward two novel aspects. First, through its architecture of local/global modules and recognition modules, See ColOr_II will imitate the visual system by providing the user with the essential cues of vision (local, global and cognitive) via the auditory channel. Second, the simultaneous sonification of colour and depth is a characteristic that does not appear in any other electronic travel aid.

Over the last two years we have been investigating user perception by means of a new prototype based on three distinct modules related to central vision, peripheral vision and obstacle detection. We are validating these modules with a number of experiments (several videos are available at http://www.youtube.com/guidobologna).

The global module allows the user to explore a captured video frame. For instance, it is possible to compare two distant points (in terms of colour and depth) by placing two fingers on a touchpad that represents the current environment picture. To recognise an object in the current See ColOr prototype the user relies on colours, but in some situations colour is inadequate. Thus, in the next two years we propose to investigate how well machine learning models (such as neural networks) can learn to recognise classes of objects relevant to visually impaired individuals, as sketched below.
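Purely as an illustration of the kind of supervised pipeline this implies (none of the class names, feature dimensions or model settings below come from the project), one could train a small classifier on labelled descriptor vectors extracted from annotated images:

```python
# Minimal sketch of supervised object recognition; the data, class labels and
# descriptor length are synthetic placeholders, not project data or results.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
CLASSES = ["door", "chair", "bottle"]          # hypothetical object classes
n_per_class, n_features = 100, 64              # e.g. 64-D invariant descriptors

# Fake descriptors: one Gaussian cluster per class, standing in for features
# extracted from an annotated image database.
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_class, n_features))
               for i in range(len(CLASSES))])
y = np.repeat(np.arange(len(CLASSES)), n_per_class)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```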


Last update: 21.02.2013

Responsible applicant and co-applicants

Employees

Name Institute

Publications

Publication
See ColOr: an extended sensory substitution device for the visually impaired (2014), in Emerald Journal of Assistive Technologies, 8(2), 77-94.
Efficient registering of color and range images (2013), in EURASIP Journal on Image and Video Processing, 2013(1), 1-12.
Vision substitution experiments with See ColOr (2013), in Proc. of the 5th work-conference on the interplay between natural and artificial computation, Mallorca, Spain.
Non-visual-cueing-based sensing and understanding of nearby entities in aided navigation (2012), in Proc. of the 14th international ACM SIGACCESS Conference on Computers and Accessibility, Boulder, Colorado, US.

Scientific events

Active contribution

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
International Seminar: "Le son de la lumière et de l'invisible" | Talk given at a conference | Le projet See ColOr | 03.04.2014 | Marseille, France | Bologna Guido
Symposium on "Sound is Information" | Talk given at a conference | Experiments with the See ColOr interface | 26.09.2013 | Stockholm, Sweden | Bologna Guido


Awards

Title | Year
Prix Latsis of the University of Geneva, awarded to Juan Diego Gomez for the research carried out during his doctoral thesis | 2015
Prize of the Fondation Dalle Molle for projects improving quality of life: "See Color: Seeing Colors with an Orchestra" | 2014

Associated projects

Number | Title | Start | Funding instrument
127334 | See ColOr_II | 01.06.2010 | Project funding (Div. I-III)

Abstract

In the research domain of vision substitution, exploration is the act of perceiving and understanding the various components of the surroundings, as well as the spatial distribution of the main constituents. Since June 2010 we have been investigating user perception by means of a prototype based on three distinct modules related to central vision, peripheral vision and obstacle detection. Each module provides the user with a sound representation of parts of video frames captured by a 3D camera. We have been validating two of these modules (the local and the global module, corresponding to central and peripheral vision, respectively) with a number of experiments. Videos on the local module are available at http://www.youtube.com/guidobologna. This proposal will extend the research that will come to an end with the current See ColOr_II project (See ColOr_II - Project 200021_127334, June 2010 - May 2012, with a PhD student).

The local module provides the user with the auditory representation of a row containing 25 points of a captured image. The key idea is to represent a pixel of an image as a sound source located at a particular azimuth angle. Moreover, each emitted sound is assigned to a musical instrument and to a sound duration, depending on the colour and the depth of the pixel, respectively. The global module allows the user to explore a captured video frame. For instance, it is possible to compare two distant points (in terms of colour and depth) by placing two fingers on a touchpad that represents the current environment picture. The specific use of the local module, which focuses only on a small portion of the captured scene, leads to the tunnel vision phenomenon. A current research question is to determine whether the addition of a global vision mechanism provided by the global module allows the user to become aware of the surroundings in real time, with colours and 3D.

To recognise an object in the current See ColOr prototype the user relies on colours, but in some situations colour is inadequate. For instance, if a door has the same colour as the wall, it will be impossible to distinguish it. Furthermore, in some cases it could take too much time to identify a specific object. Thus, we propose to investigate how well machine learning models (such as neural networks, SVMs, decision trees, etc.) can learn to recognise classes of objects relevant to visually impaired individuals. In addition, we aim at making it possible to inspect panels such as bus stops. As a consequence, we propose to look into text from frames captured by a 3D camera. Compared to the state of the art in the domain, the difficulty lies in the resolution and in the angle of view, which causes character distortion. The research will be performed on text detection, segmentation and rectification, but not on the reading part itself (existing open source programs will be used).

Recently, substantial progress has been achieved in computer vision by extracting locally invariant features from images. Among several types, invariant features can be learnt by supervised learning models to detect objects from different view angles and at different scales. Image databases such as LabelMe (http://labelme.csail.mit.edu/) include a great number of annotated object classes that will be used as a starting point for our research. Among other possible models of supervised learning, artificial neural networks will be very relevant for putting the recognition module into operation. Specifically, neural networks have been successful in many recognition tasks; moreover, they can reach a decision very quickly, as they are parallel distributed models. Finally, other models such as SVMs or ensembles of decision stumps will be compared to neural networks in terms of predictive accuracy.

Once an object is detected, we propose to represent it by a characteristic sound ("earcon") that will be triggered from either the local module or the global module. Another novelty will be a guiding tool that will allow the user to grasp an object. The idea here is to sonify the x-y-z coordinates of an object by left-right spatialisation, pitch and sound duration, respectively. The same tool will be used in some situations to make it possible for the user to move to a better location to inspect a text panel. Here our research investigation will be at the experimental level, as we will focus on an experiment that will determine whether the guiding tool is accurate and efficient for reaching and grasping objects.

Our system for visual substitution puts forward four novel aspects. First, through its architecture of local/global modules and recognition modules, See ColOr_II will imitate the visual system by providing the user with the essential cues of vision (local, global and cognitive) by means of the auditory channel. Second, the simultaneous sonification of colour and depth is a characteristic that does not appear in any other electronic travel aid. Third, we propose to carry out new experiments such as exploring the 3D environment with its colours, finding specific objects and inspecting panels with relevant text information. Finally, the research proposed here will put forward cheap components and non-invasive devices, such as notebooks, Kinect 3D cameras and headphones (possibly bonephones [Wal05]). They could be adopted by the visually impaired community once a 3D camera has been miniaturised and mounted on sunglasses.
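To make the proposed x-y-z sonification of the guiding tool concrete, a minimal sketch is given below; the pan, pitch and duration ranges are hypothetical and only illustrate the mapping, not parameters the project will actually adopt.

```python
# Illustrative mapping of an object's x-y-z position to (pan, pitch, duration),
# following the guiding-tool idea described above. All ranges and scales are
# invented for the example, not the project's actual parameters.
def guide_sound(x_m, y_m, z_m, max_lateral=1.0, max_depth=5.0):
    # x (left/right) -> stereo pan in [-1, 1]
    pan = max(-1.0, min(1.0, x_m / max_lateral))
    # y (height) -> pitch: higher objects get higher frequencies (200-1000 Hz here)
    pitch_hz = 200.0 + 800.0 * max(0.0, min(1.0, (y_m + 1.0) / 2.0))
    # z (depth) -> duration: farther objects sound longer in this sketch
    duration_s = 0.1 + 0.9 * max(0.0, min(1.0, z_m / max_depth))
    return pan, pitch_hz, duration_s

# Example: an object slightly to the right, at eye level, 2 m away.
print(guide_sound(0.3, 0.0, 2.0))
```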