Inferring 3D objects and the layout of indoor scenes from a single RGB-D image captured with a Kinect camera is a challenging task. Towards this goal, we propose a high-order graphical model and jointly reason about the layout, objects and superpixels in the image. In contrast to existing holistic approaches, our model leverages detailed 3D geometry using inverse graphics and explicitly enforces occlusion and visibility constraints to respect scene properties and projective geometry. We cast the task as MAP inference in a factor graph and solve it efficiently using message passing. We evaluate our method with respect to several baselines on the challenging NYUv2 indoor dataset using 21 object categories. Our experiments demonstrate that the proposed method is able to infer scenes with a large degree of clutter and occlusions.
The figure above shows, from left to right: the objects inferred by our method, the superpixels (red=explained, transparent=unexplained), the rendered depth map (blue=close to red=far) and the inferred semantics, color coded as described in the legend below the figure and projected into the image domain.
Best Paper Award at GCPR 2015!
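As a toy illustration of what "MAP inference by message passing" means, the sketch below runs max-sum message passing on a small chain-structured model and checks the result against exhaustive search. This is not the model from the paper, which is a high-order factor graph over layout, objects and superpixels. The chain structure, the function names `map_chain` and `map_brute_force`, and the toy scores are assumptions made purely for illustration.

```python
import itertools

import numpy as np


def map_chain(unary, pairwise):
    """Max-sum message passing on a chain x_0 - x_1 - ... - x_{n-1}.

    unary[i][s]       : score of variable i taking state s
    pairwise[i][s, t] : score of (x_i = s, x_{i+1} = t)
    Returns (labels, score) of the highest-scoring joint assignment.
    """
    msg = np.asarray(unary[0], dtype=float)  # best prefix score per state
    backptr = []
    for i in range(1, len(unary)):
        # cand[s, t]: best score of a prefix ending in x_{i-1}=s, extended to x_i=t
        cand = msg[:, None] + np.asarray(pairwise[i - 1], dtype=float)
        backptr.append(cand.argmax(axis=0))
        msg = cand.max(axis=0) + np.asarray(unary[i], dtype=float)

    # Backtrack from the best final state to recover the full assignment.
    state = int(msg.argmax())
    score = float(msg[state])
    labels = [state]
    for bp in reversed(backptr):
        state = int(bp[state])
        labels.append(state)
    labels.reverse()
    return labels, score


def map_brute_force(unary, pairwise):
    """Exhaustive search over all assignments (reference for small problems)."""
    best_labels, best_score = None, -np.inf
    for labels in itertools.product(*(range(len(u)) for u in unary)):
        score = sum(unary[i][s] for i, s in enumerate(labels))
        score += sum(pairwise[i][labels[i]][labels[i + 1]]
                     for i in range(len(pairwise)))
        if score > best_score:
            best_labels, best_score = list(labels), float(score)
    return best_labels, best_score


if __name__ == "__main__":
    # Hypothetical toy scores: three binary variables, neighbours prefer to agree.
    unary = [[0.0, 1.0], [0.0, 0.0], [2.0, 0.0]]
    agree = [[1.5, 0.0], [0.0, 1.5]]
    print(map_chain(unary, [agree, agree]))
    print(map_brute_force(unary, [agree, agree]))
```

On a chain, message passing is exact and costs time linear in the number of variables, whereas exhaustive search grows exponentially. On graphs with loops or high-order factors, message passing is in general an approximate scheme.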
Results
Below, we show quantitative results of our method on the NYUv2 dataset, comparing it to several baselines as well as to the method of Lin et al. [25].
The images below show qualitative results of our method on the NYUv2 dataset. Each subfigure shows (from left to right): the object wireframes, the rendered depth map (blue=close to red=far) and the induced semantic segmentation using the color coding specified above.
Video
The video below illustrates our method. Best viewed using YouTube's HD 720 setting.
Changelog
02.09.2015: First version online!
Download
The source code for this project has been tested on Ubuntu 14.04 and Matlab 2013b and is published under the GNU General Public License.
If you find this project useful, we would be happy if you cited us:
@inproceedings{Geiger2015GCPR,
author = {Andreas Geiger and Chaohui Wang},
title = {Joint 3D Object and Layout Inference from a single RGB-D Image},
booktitle = {German Conference on Pattern Recognition (GCPR)},
year = {2015}
}