Stereo Reconstruction Using Top-down Cues. Computer Vision and Image Understanding

Type of publication Peer-reviewed
Publication form Original article (peer-reviewed)
Author Hadfield S, Lebeda K, Bowden R
Project SMILE: Scalable Multimodal sign language Technology for sIgn language Learning and assessmEnt

Original article (peer-reviewed)

Journal Computer Vision and Image Understanding
Volume (Issue) 157
Page(s) 206 - 222
Title of proceedings Computer Vision and Image Understanding
DOI 10.1016/j.cviu.2016.08.001

Open Access


We present a framework that allows standard stereo reconstruction to be unified with a wide range of classic top-down cues from urban scene understanding. The resulting algorithm is analogous to the human visual system, where conflicting interpretations of the scene due to ambiguous data can be resolved based on a higher-level understanding of urban environments. The cues that are reformulated within the framework include: recognising common arrangements of surface normals and semantic edges (e.g. concave, convex and occlusion boundaries), recognising connected or coplanar structures such as walls, and recognising collinear edges (which are common on repetitive structures such as windows). Recognition of these common configurations has only recently become feasible, thanks to the emergence of large-scale reconstruction datasets. To demonstrate the importance and generality of scene understanding during stereo reconstruction, the proposed approach is integrated with three different state-of-the-art techniques for bottom-up stereo reconstruction. The use of high-level cues is shown to improve performance by up to 15% on the Middlebury 2014 and KITTI datasets. We further evaluate the technique using the recently proposed HCI stereo metrics, finding significant improvements in the quality of depth discontinuities, planar surfaces and thin structures.