In this chapter, we describe algorithms for three-dimensional (3-D) vision that help robots accomplish navigation and grasping. To model cameras, we start with the basics of perspective projection and distortion due to lenses. This projection from a 3-D world to a two-dimensional (2-D) image can be inverted only by using information from the world or from multiple 2-D views. If we know the 3-D model of an object or the locations of 3-D landmarks, we can solve the pose estimation problem from a single view. When two views are available, we can compute the 3-D motion and triangulate to reconstruct the world up to a scale factor. When multiple views are given, either as sparse viewpoints or as a continuous incoming video, the robot path can be computed and point tracks can yield a sparse 3-D representation of the world. In order to grasp objects, we can estimate the 3-D pose of the end effector or the 3-D coordinates of the graspable points on the object.
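The claims that a single projection loses depth and that two views allow triangulation can be illustrated with a minimal sketch. It assumes ideal pinhole cameras without lens distortion, identical intrinsics (focal length `f`, principal point `cx`, `cy`), identity rotation, and known camera centers, so the known baseline fixes the metric scale; when the motion itself is estimated from the images, the reconstruction is only recovered up to scale. The helper names (`project`, `back_project`, `triangulate_midpoint`) and all numeric values are hypothetical, and the midpoint method is just one simple triangulation choice, not necessarily the one used in the chapter.

```python
"""Sketch: pinhole projection and two-view midpoint triangulation.

Assumptions (illustrative only): ideal pinhole cameras with no lens
distortion, identical intrinsics (f, cx, cy), identity rotation, and
known camera centers.
"""
from typing import Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def along(c: Vec3, d: Vec3, s: float) -> Vec3:
    """Point c + s * d on a ray."""
    return (c[0] + s * d[0], c[1] + s * d[1], c[2] + s * d[2])


def project(point: Vec3, f: float, cx: float, cy: float,
            center: Vec3 = (0.0, 0.0, 0.0)) -> Pixel:
    """Perspective projection into a camera at `center` looking along +Z."""
    x, y, z = sub(point, center)
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (f * x / z + cx, f * y / z + cy)


def back_project(pixel: Pixel, f: float, cx: float, cy: float,
                 center: Vec3) -> Tuple[Vec3, Vec3]:
    """Viewing ray (origin, direction) through a pixel; depth is unknown."""
    u, v = pixel
    return center, ((u - cx) / f, (v - cy) / f, 1.0)


def triangulate_midpoint(ray1: Tuple[Vec3, Vec3],
                         ray2: Tuple[Vec3, Vec3]) -> Vec3:
    """Midpoint of the shortest segment joining two viewing rays."""
    (c1, d1), (c2, d2) = ray1, ray2
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no usable baseline")
    s = (b * e - c * d) / denom  # parameter of closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of closest point on ray 2
    p1, p2 = along(c1, d1, s), along(c2, d2, t)
    return ((p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0, (p1[2] + p2[2]) / 2.0)


if __name__ == "__main__":
    f, cx, cy = 500.0, 320.0, 240.0
    C1, C2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)  # hypothetical 1-unit baseline
    X = (0.5, -0.2, 4.0)
    # A single view cannot recover depth: X and 2*X project to the same pixel.
    print(project(X, f, cx, cy, C1), project((1.0, -0.4, 8.0), f, cx, cy, C1))
    # Two views with a known baseline: intersect the back-projected rays.
    ray1 = back_project(project(X, f, cx, cy, C1), f, cx, cy, C1)
    ray2 = back_project(project(X, f, cx, cy, C2), f, cx, cy, C2)
    print(triangulate_midpoint(ray1, ray2))  # approximately (0.5, -0.2, 4.0)
```

With noisy pixel measurements the two rays no longer intersect exactly, which is why the sketch returns the midpoint of their closest approach rather than solving for an exact intersection.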
DTAM: Dense tracking and mapping in real-time
Authors: Richard A. Newcombe, Steven J. Lovegrove, Andrew J. Davison
Video ID: 124
This video demonstrates the system described in the paper "DTAM: Dense Tracking and Mapping in Real-Time" by Richard Newcombe, Steven Lovegrove, and Andrew Davison, presented at ICCV 2011.