Our eyes watch moving scenes, and much of our perception is based on motion. Motion gives computer vision many cues for recovering scene information, supporting tasks such as segmentation, 3D reconstruction, and compression.
Motion observed in an image sequence is usually ego-motion, the motion of the camera. Knowledge of the camera rotation and translation between frames is crucial input to many vision applications, yet the search for reliable methods giving precise results goes on.
Our approach is to compute first the rotation and then the translation. By iterating between the rotation and translation estimates we gain robustness and improve our results further.
Our methods either use weak assumptions on the 3D structure (the existence of a planar surface in the image) or require three images as input. We assume small rotations; this is reasonable, since in most image sequences the rotation between frames is relatively small.
An earlier method for calculating ego-motion between two images uses Plane + Parallax. A single planar surface is detected in the scene directly from image intensities, using temporal integration of registered images, and the parametric 2D image motion of this plane is computed. We use this 2D motion to register the images, giving a displacement field affected only by the camera translation. The 3D camera translation is computed by finding the focus of expansion (the intersection of all displacement vectors) in the registered frames. Finally, the ego-motion computation is completed by recovering the camera rotation from a set of linear equations.
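A minimal sketch of the focus-of-expansion step, assuming the residual displacement field after plane registration is given (the function name and interface are ours, for illustration only): each displacement vector defines a line through its pixel, and the FOE is the least-squares intersection of all these lines.

import numpy as np

def focus_of_expansion(points, flows):
    # points: (N, 2) pixel coordinates; flows: (N, 2) residual
    # displacements after plane registration.
    d = flows / np.linalg.norm(flows, axis=1, keepdims=True)  # unit directions
    n = np.stack([-d[:, 1], d[:, 0]], axis=1)                 # line normals
    b = np.sum(n * points, axis=1)                            # n_i . p_i
    # Each row asks n_i . f = n_i . p_i; solve for f in least squares.
    foe, *_ = np.linalg.lstsq(n, b, rcond=None)
    return foe  # (x, y) focus of expansion

In practice one would weight each line by the reliability of its displacement vector; the plain least-squares solve above is only the core idea.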
By using homography matrices (a homography is a linear transformation that registers a 3D plane between two images) we refined our approach, obtaining an even more robust method that makes no assumptions on the 3D structure. This method is based on a theoretical result: the homography matrices between two images form a linear space of rank four.
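A sketch of why this holds, using the standard plane-induced homography (here $K$ is the calibration matrix, $R$ and $t$ the camera rotation and translation, and $n/d$ the plane normal divided by the plane's distance; all equalities are up to scale):

\[
H_\pi \cong K\left(R + \frac{t\,n^\top}{d}\right)K^{-1}
      = K R K^{-1} + (K t)\left(\frac{K^{-\top} n}{d}\right)^{\top}
\]

The matrices $K R K^{-1}$ and $K t$ are fixed for a given image pair, and only $n/d$ varies from plane to plane, so every homography lies in the span of the four matrices $K R K^{-1}$ and $(K t)\,e_i^\top$, $i = 1, 2, 3$.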
Assuming small rotations, and given three homography matrices between the images, we can compute the rotational component of the camera motion as a linear combination of these homographies.
Given three views (instead of two) we calculate the Trilinear Tensor, which directly provides three such homography matrices. By using three images we gain numerical redundancy, and with it numerical stability. Furthermore, the homography matrices calculated by this method use correspondences from the whole scene, not just from the plane associated with each homography.
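A minimal sketch of the combination step, assuming calibrated image coordinates and consistently scaled homographies (casting it as a single least-squares problem is our illustration, not the paper's exact formulation): for a small rotation, R ~ I + [w]x, so we look for coefficients a and a rotation vector w with a1*H1 + a2*H2 + a3*H3 ~ I + [w]x, which is linear in (a, w).

import numpy as np

def skew(w):
    # Cross-product matrix [w]x.
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rotation_from_homographies(H1, H2, H3):
    Hs = [H1, H2, H3]
    Ls = [skew(e) for e in np.eye(3)]  # basis of skew-symmetric matrices
    # Unknowns: three coefficients a, three rotation components w.
    A = np.column_stack([H.ravel() for H in Hs] +
                        [-L.ravel() for L in Ls])
    b = np.eye(3).ravel()
    x, *_ = np.linalg.lstsq(A, b, rcond=None)  # solves sum a_i H_i - [w]x = I
    w = x[3:]
    R = np.eye(3) + skew(w)  # first-order rotation estimate
    return R, w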
Selected Papers
Michal Irani, Benny Rousso, Shmuel Peleg
Robust Recovery of Ego-Motion
CAIP 1993, Budapest, 13-15 Sept 1993.
Benny Rousso, Shai Avidan, Amnon Shashua, Shmuel Peleg
Robust recovery of camera rotation from three frames
ARPA Image Understanding Workshop, February 1996.
Many video systems receive their input from an unstable source. Consider a camera mounted on the back of a jeep driving on a bumpy dirt road, or a camera held by a shaking hand. In image stabilization our input is a jittery, jumpy sequence of images, and our output is a smooth sequence of images displaying the same motion. In most cases it is correct to assume that the translation of the camera is the intended motion (the movement of the jeep, for example), while the camera rotation is the result of the camera jittering about some rotational axis and is the cause of the instability. By calculating the camera rotation we can register the images, eliminating the rotational component between them; the result is a stabilized sequence of images.
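As an illustration of the registration step, here is a minimal derotation sketch, assuming the per-frame camera rotation R and the camera intrinsics K are known (the function name is ours; OpenCV's warpPerspective does the resampling):

import numpy as np
import cv2  # OpenCV

def derotate(frame, R, K):
    # Derotation homography: maps the rotated view back to the
    # reference orientation, leaving only translational motion.
    H = K @ R.T @ np.linalg.inv(K)
    h, w = frame.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))

Applying this warp to every frame cancels the rotational jitter while preserving the intended translational motion.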
Selected Papers
Michal Irani, Benny Rousso, Shmuel Peleg
Recovery of Ego-Motion Using Image Stabilization
CVPR-94, Seattle, June 1994.
Benny Rousso, Shai Avidan, Amnon Shashua, Shmuel Peleg
Robust Recovery of Camera Rotation from Three Frames
CVPR-96, June 1996.
Almost any scene is composed of a few distinct objects and a background. Any small child looking at such a scene knows immediately how to segment it, yet segmentation is one of the hardest problems in computer vision. The segmentation of an image is important for object recognition and image analysis.
When our input is a movie with differently moving objects in the scene, we can use their optical flow to identify and track the moving objects. Using the differences between the motion fields we can separate each moving object from the rest of the image; in return, this segmentation improves the motion analysis.
Our method for detecting and tracking multiple moving objects uses both a large spatial and a large temporal region. Because of the large amount of information we do not need to assume temporal motion constancy, and we can handle regions containing more than one motion; the algorithm even handles the difficult cases of transparent and occluding objects. The idea is to detect one object at a time. The object carrying the dominant motion in the scene is detected by temporal integration of images registered with that motion. We iterate, refining the result at each step: the dominant motion is used to refine the object's pixel set (the pixels moving with those motion parameters), and the motion parameters are then refined using only the pixels of the object. Once the object has been detected we can exclude it and move on to the next object.
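A minimal sketch of this estimate-classify-refit loop, assuming a dense optical-flow field is already available and using a 2D affine motion model (the names, the model, and the threshold are illustrative):

import numpy as np

def dominant_affine_motion(points, flows, iters=5, tau=1.0):
    # points: (N, 2) pixel coordinates; flows: (N, 2) flow vectors;
    # tau: residual threshold in pixels.
    mask = np.ones(len(points), dtype=bool)
    for _ in range(iters):
        # Fit u = a1 + a2*x + a3*y and v = a4 + a5*x + a6*y
        # to the pixels currently assigned to the object.
        X = np.column_stack([np.ones(mask.sum()), points[mask]])
        pu, *_ = np.linalg.lstsq(X, flows[mask, 0], rcond=None)
        pv, *_ = np.linalg.lstsq(X, flows[mask, 1], rcond=None)
        # Reclassify all pixels by how well the motion predicts them.
        Xall = np.column_stack([np.ones(len(points)), points])
        pred = np.column_stack([Xall @ pu, Xall @ pv])
        resid = np.linalg.norm(flows - pred, axis=1)
        mask = resid < tau  # pixels moving with the dominant motion
    return (pu, pv), mask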
We improve the segmentation by iterating between segmenting the region and computing its motion parameters, until the iterations converge.
For this segmentation we have developed a method that relaxes the need for accurate motion models. Given some motion parameters of an object, we register the images using those parameters. We expect pixels belonging to the object to register correctly and pixels outside the object to register incorrectly; how well a pixel registers is its prediction error. When deciding whether a pixel belongs to the object we consider the convergence of this prediction error, thresholding the magnitude of the residual optical flow between the first image and the registered image.
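A sketch of this test, assuming grayscale frames and using OpenCV's Farneback optical flow as the residual-flow estimator (the threshold tau is an illustrative tuning choice):

import numpy as np
import cv2  # OpenCV

def object_mask(img1, img2_registered, tau=0.5):
    # Residual flow between the first image and the image registered
    # with the object's motion parameters; near zero on the object.
    flow = cv2.calcOpticalFlowFarneback(
        img1, img2_registered, None,
        0.5, 3, 15, 3, 5, 1.2, 0)  # pyr_scale, levels, winsize, ...
    mag = np.linalg.norm(flow, axis=2)
    return mag < tau  # True where the pixel registered correctly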
Selected Papers
Michal Irani, Benny Rousso, Shmuel Peleg
Computing Occluding and Transparent Motions
IJCV, January 1994.
Moshe Ben-Ezra, Shmuel Peleg, Benny Rousso
Motion Segmentation Using Convergence Properties
ARPA Image Understanding Workshop, November 1994.
Michal Irani, Shmuel Peleg
Motion Analysis for Image Enhancement: Resolution, Occlusion, and Transparency
JVCIR, Vol. 4, No. 4, December 1993.