
Modeling Deformable 3-D Surfaces from Video

English title Modeling Deformable 3-D Surfaces from Video
Applicant Fua Pascal
Number 163461
Funding scheme Project funding (Div. I-III)
Research institution Laboratoire de vision par ordinateur EPFL - IC - ISIM - CVLAB
Institution of higher education EPF Lausanne - EPFL
Main discipline Information Technology
Start/End 01.09.2016 - 31.10.2018
Approved amount 118'872.00

Keywords (4)

3D Reconstruction; Surface Modeling; Deformable Models; Computer Vision

Lay Summary (translated from French)

Lead
Being able to recover the 3D shape of deformable surfaces using only a single camera would make this possible not only with the cameras that now equip all mobile phones and tablets, but also in more specialized contexts such as endoscopic surgery. Unfortunately, this is a difficult problem, because many different 3D shapes can project into images in very similar ways.
Lay summary

To solve this problem, we have developed approaches that rely on establishing correspondences between an input image, in which the 3D shape is to be recovered, and a reference image, in which it is known. We obtain our most accurate results using dense correspondences, but at the cost of having to minimize a non-convex function, which can lead to errors.

In the continuation of this project, we will leverage the power of Deep Learning to overcome this difficulty and develop accurate and robust algorithms that can truly be deployed.

Last update: 03.05.2016


Publications

Publication
Learning to Reconstruct Texture-Less Deformable Surfaces from a Single View
Bednarik Jan, Fua Pascal, Salzmann Mathieu (2018), Learning to Reconstruct Texture-Less Deformable Surfaces from a Single View, in 2018 International Conference on 3D Vision (3DV), Verona, IEEE, Conference Proceedings.

Datasets

Texture-less Deformable Surfaces Dataset

Author Bednarik Jan
Publication date 24.08.2018
Persistent Identifier (PID) N/A
Repository Texture-less Deformable Surfaces Dataset
Abstract
The dataset features deformable objects with uniform albedo captured under varying lighting conditions using a Microsoft Kinect Xbox 360 depth camera. The main challenge is to recover the 3D shape of a deforming surface observed in a single RGB image. Each sample thus contains an RGB image and the corresponding ground-truth (GT) normal map and depth map. A small subset of the dataset also contains GT triangulated meshes. Please refer to our publication for a detailed description of the dataset.
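To illustrate how the GT normal and depth maps relate, here is a minimal sketch of recovering per-pixel normals from a depth map by finite differences. It assumes a simplified orthographic model with unit pixel spacing, not the dataset's actual Kinect calibration:

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel surface normals from a depth map via finite
    differences (orthographic approximation, unit pixel spacing)."""
    dzdy, dzdx = np.gradient(depth)       # depth gradients along rows / cols
    n = np.dstack([-dzdx, -dzdy, np.ones_like(depth)])  # un-normalized field
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    return n

# A flat, fronto-parallel depth plane yields normals pointing at the camera.
normals = normals_from_depth(np.full((4, 4), 2.0))
```

For real Kinect depth maps one would additionally account for the intrinsics and for missing-depth pixels before differentiating.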

Collaboration

Group / person, country, and types of collaboration:

Australian National University, Australia (Oceania)
- In-depth/constructive exchanges on approaches, methods or results
- Exchange of personnel

T.U. Graz, Austria (Europe)
- In-depth/constructive exchanges on approaches, methods or results
- Exchange of personnel

Associated projects

Number Title Start Funding scheme
153121 Modeling Deformable 3-D Surfaces from Video 01.04.2014 Project funding (Div. I-III)
172500 Modeling People and their Clothes in Crowded Scenes 01.01.2018 Project funding (Div. I-III)

Abstract

Being able to recover the 3D shape of deformable surfaces from ordinary images will make it possible to field reconstruction systems that only require single video cameras, such as those that now equip most mobile devices. It will also allow 3D shape recovery in more specialized contexts, such as when performing endoscopic surgery or using a fast camera to capture the deformations of a rapidly moving object. However, because many different 3D shapes can have virtually the same projection, such monocular shape recovery is inherently ambiguous. Arguably, these ambiguities could be resolved by using a depth camera. However, even though they are now readily available, depth cameras are more difficult and expensive to fit into a cell phone or an endoscope, and have limited range.

The video-based solutions that have been proposed over the years fall mainly into two classes: those that involve physics-inspired models and those that rely on a non-rigid structure-from-motion approach. The former often entail designing complex objective functions and require hard-to-obtain knowledge about the precise material properties of the target surfaces. The latter depend on points being reliably tracked in image sequences and are only effective for relatively small deformations.

To overcome these limitations, we have developed approaches that rely on establishing correspondences between an input image, in which the 3D shape is to be recovered, and a reference image, in which the 3D shape is known. We showed that 3D shape recovery under these conditions can be formulated as an under-constrained linear problem. Furthermore, introducing inextensibility constraints and using control points to reduce the dimensionality of the problem turns it into a well-posed one, which can be solved quickly using convex optimization.

These methods have proved effective when the surface is well textured, but we have shown in the final stretch of the ongoing project that we can obtain even better results by replacing sparse correspondences with dense ones. Not only are the results more accurate, but we can also handle surfaces that are only partially textured, in the presence of substantial occlusions and illumination changes. However, this improvement comes at a price: instead of being able to model from individual frames, we have to track from frame to frame because the objective function is no longer convex, which makes our algorithm prone to tracking failures.

In the continuation of this project, we will leverage the power of Deep Learning approaches to remove this limitation and preserve the accuracy of dense matching without requiring a priori knowledge of an initial pose. This will involve two main research directions:

- Learn a mapping from image to rough 3D shape. We will train Convolutional Neural Networks to propose multiple 3D shape hypotheses for image patches and recover the global 3D shape by enforcing spatial consistency over neighboring patches.
- Refine the initial estimate in spite of potential occlusions. Since the space of all possible deformations is immense, we do not expect the resulting shapes to be particularly accurate. We will therefore further develop our approach to refining them by better accounting for occlusions and for differences in how informative different parts of the surface are.

This will result in robust and accurate 3D surface reconstruction algorithms that can truly be deployed in real-world applications.
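The correspondence-based formulation above (sight lines fixed by 2D matches, depths constrained by the template's edge lengths) can be illustrated with a toy example. The sketch below is a simplified nonlinear least-squares stand-in for the convex formulation described in the abstract; the camera rays, depths, and three-vertex "mesh" are made up for illustration:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy setup: three surface points seen along known sight lines (unit rays
# from a pinhole camera, fixed by 2D correspondences), with edge lengths
# known from the reference template. All numbers are illustrative.
rays = np.array([[0.0, 0.0, 1.0],
                 [0.1, 0.0, 1.0],
                 [0.0, 0.1, 1.0]])
rays /= np.linalg.norm(rays, axis=1, keepdims=True)

true_depths = np.array([2.0, 2.1, 1.9])
template = rays * true_depths[:, None]
edges = [(0, 1), (0, 2), (1, 2)]
ref_len = [np.linalg.norm(template[i] - template[j]) for i, j in edges]

def edge_residuals(d):
    # Each 3D point is its depth times its sight line; the residuals
    # measure how far the recovered edge lengths are from the template's.
    pts = rays * d[:, None]
    return [np.linalg.norm(pts[i] - pts[j]) - l
            for (i, j), l in zip(edges, ref_len)]

sol = least_squares(edge_residuals, x0=np.full(3, 1.0))
depths = sol.x  # one depth assignment consistent with the template lengths
```

Because the rays are nearly parallel, depth assignments that are roughly mirrored about a fronto-parallel plane satisfy almost the same constraints, which is exactly the ambiguity the abstract points to and one reason the actual method relies on inextensibility constraints and control points to obtain a well-posed convex problem.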