The need to combine pictures into panoramic mosaics has existed since the
beginning of photography, as the camera's field of view is always smaller
than the human field of view. Photo mosaicing, pasting together several
pictures to create a panoramic mosaic, gives us a more complete view of
the scene.
While scissors and glue are the tools of film photography, digital video
enables more sophisticated methods. Digital mosaicing gives us three main
advantages over the paper methods:
- While with paper we can only translate and rotate the images, digital
processing enables more general transformations, such as affine and
projective warps.
- Our cut-and-paste process can combine overlapping images, thereby
reducing noise in the final mosaic image.
- In many cases there are noticeable intensity differences between images;
these are telltale signs of mosaicing even when the images are aligned
perfectly. We can overcome this using image blending (see the sketch
after this list).
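The following minimal sketch (Python with NumPy/SciPy) illustrates the
averaging and blending points above. It assumes the frames have already
been warped into a common mosaic coordinate system, and all function
names are ours, not part of any published method:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def accumulate_frame(sum_img, weight_img, frame, mask):
        """Add one aligned frame into a running weighted average.
        frame: HxW float image, already warped to mosaic coordinates.
        mask:  HxW bool array, True where the frame has valid pixels."""
        # Weigh pixels by their distance from the frame boundary so
        # seams fade out smoothly (a simple form of image blending).
        w = distance_transform_edt(mask)
        if w.max() > 0:
            w /= w.max()
        sum_img += w * frame
        weight_img += w

    def finish_mosaic(sum_img, weight_img):
        # Averaging overlapping frames reduces noise; the feathered
        # weights hide intensity differences between frames.
        return sum_img / np.maximum(weight_img, 1e-9)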
While most mosaicing methods project onto a single image plane or onto a
cylinder, our method of manifold projection enables the creation of
panoramic mosaics from video sequences under very general conditions, in
particular the unrestricted motion of a hand-held camera. The panoramic
mosaic is a projection of the scene onto a virtual manifold whose
structure depends on the camera's motion.
This manifold projection is defined for almost any camera motion and
scene structure. There are no distortions caused by alignment to a
reference frame, and the resolution of the mosaic is the same as the
image resolution.
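In the simplest case, a camera translating sideways, this kind of
mosaicing reduces to pasting together thin strips cut from the center of
each frame. A toy Python sketch under that pure-translation assumption
(the displacements dxs are assumed to come from a separate alignment
step, not shown):

    import numpy as np

    def strip_mosaic(frames, dxs):
        """frames: list of HxW images; dxs[i]: horizontal displacement
        (in pixels) between frame i and frame i+1, from image alignment."""
        c = frames[0].shape[1] // 2
        # Each frame contributes a central strip whose width equals its
        # displacement to the next frame, so the strips tile the scene.
        strips = [f[:, c:c + dx] for f, dx in zip(frames, dxs)]
        return np.concatenate(strips, axis=1)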
Selected Papers
Shmuel Peleg and Joshua Herman,
Panoramic Mosaics by Manifold Projection,
CVPR, June 1997. For a commercial application see
VideoBrush.
Benny Rousso, Shmuel Peleg, and Ilan Finci,
Mosaicing with Generalized Strips,
DARPA Image Understanding Workshop, May 1997.
Benny Rousso, Shmuel Peleg, Ilan Finci, and Alex Rav-Acha,
Universal Mosaicing Using Pipe Projection,
ICCV'98.
Suppose a movie camera caught a burglar in action, but because he moved
so quickly, all we have is a picture too blurred to recognize the
burglar. Using
motion segmentation
we can enhance the image of the burglar
until it is sharp and clear. This is done by temporal integration: we
register the images using the motion of the object we wish to enhance and
take an average of the registered images. In this average the
registered object will be enhanced, with sharp edges. This enhancement
comes from using information over a whole sequence of images. The method
works for transparent objects as well, for example a reflection in a
window.
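A minimal sketch of the temporal-integration step (Python/NumPy; we
assume the object's motion has already been recovered as integer
translations by the motion-segmentation stage):

    import numpy as np

    def temporal_integration(frames, shifts):
        """frames: list of HxW images; shifts[i]: (dy, dx) motion of the
        tracked object in frame i relative to frame 0."""
        acc = np.zeros_like(frames[0], dtype=np.float64)
        for frame, (dy, dx) in zip(frames, shifts):
            # Undo the object's motion so it is registered across frames.
            acc += np.roll(frame, (-dy, -dx), axis=(0, 1))
        # The registered object stays sharp in the average, while noise
        # (and the misregistered background) is averaged away.
        return acc / len(frames)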
Selected Papers
Michal Irani and Shmuel Peleg,
Motion Analysis for Image Enhancement: Resolution,
Occlusion, and Transparency,
Journal of Visual Communication and Image Representation, Vol. 4, No. 4, December 1993.
In today's world there is a constant need for more realistic textures for
synthetic 3D worlds, both for computer simulations and for computer
animations.
Computer-generated textures don't seem to be able to fool us;
we want the real thing: real texture from a real image. We have developed
a method to extract high-quality texture from a sequence of images.
Our algorithm has the following qualities:
- The texture in the images can appear at different resolutions
and with different perspective distortions.
- We are not restricted to planar objects and can work with any
known 3D structure.
- We have the ability to remove illumination artifacts such as
highlights and reflections.
- The resulting texture is stored in a multiresolution data structure.
- There are no restrictions on the computed texture.
The input to our algorithm is a sequence of images of an object together
with a 3D model of the object; from these we create a multiresolution
texture map. Using the 3D information we know where each pixel in each
image comes from, and we update the texture map accordingly. The pixels
of the texture map are a smoothed, weighted average over all relevant
images. Pixels that come from higher-resolution images receive a stronger
weight, giving the texture map an almost constant quality, similar to the
constant quality achieved in mosaics.
For each pixel in the texture map we take into account only information
around the median of the brightness levels observed at that pixel; this
way we eliminate highlights and reflections, which appear in only a few
images. We use a Laplacian pyramid to implement the multiresolution
representation, and we store the final texture map in this format.
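An illustrative per-texel combination rule following the description
above (Python/NumPy; gathering the samples through the 3D model is
assumed to happen elsewhere, and the band parameter is our own
illustrative choice):

    import numpy as np

    def combine_texel(values, weights, band=0.2):
        """values:  brightness samples of one texel, one per image.
        weights: per-sample weights (higher for higher-resolution views).
        band:    fraction of the intensity range kept around the median."""
        values = np.asarray(values, dtype=np.float64)
        weights = np.asarray(weights, dtype=np.float64)
        med = np.median(values)
        # Keep only samples near the median; highlights and reflections
        # appear in few images, so they fall outside this band.
        keep = np.abs(values - med) <= band * (np.ptp(values) + 1e-9)
        return np.sum(weights[keep] * values[keep]) / np.sum(weights[keep])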
Selected Papers
E. Ofek, E. Shilat, A. Rappoport, and M. Werman,
Highlight and Reflection Independent Multiresolution Textures
from Image Sequences,
HUJI TR, April 1995. Accepted to IEEE CG&A.
Virtual reality is highly developed in the world of computer graphics, yet
realistic virtual reality is still fairly new in computer vision.
Our final goal is to interactively walk or fly through a 3D scene
stored as a small set of images.
Given two reference images of a static 3D scene, we can generate a
third view from a new, user-specified virtual camera. Our view synthesis is
physically correct, meaning our result is the same image that would have
been obtained had we actually placed a camera at that location and taken
a picture.
Our method derives an on-line warping function from a set of model images.
It relies on algebraic constraints that all views of the same 3D scene
must obey. The constraints we use come from the trilinear tensor, a
trilinear relation determined only by the configuration (location) of
three cameras in space. Any 3D point seen across these three views
satisfies the constraints; therefore, given two views and a tensor, the
point's position in the third view can be found simply by solving a
linear equation.
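In standard trifocal-tensor notation (a sketch using the textbook index
convention, not necessarily the paper's exact formulation), the
constraint and the resulting linear transfer read:

    % Trilinear constraint: any lines l', l'' through the matching
    % points p', p'' in views 2 and 3 satisfy
    p^{i}\, l'_{j}\, l''_{k}\, T_{i}^{\,jk} = 0 ,
    % so, choosing any line l' through p', the point in the third view
    % follows linearly from the known quantities:
    p''^{\,k} \;\cong\; p^{i}\, l'_{j}\, T_{i}^{\,jk} .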
Our method has the advantage that it avoids computing a 3D model or the
explicit camera geometry (the relative locations of the cameras), both of
which are computationally heavy and numerically unstable. We need not
assume any camera calibration or any 3D structure of the scene. All our
algorithm needs as input is a dense correspondence between the two model
views and a trilinear tensor. The tensor can be computed with the help of
a third view from only seven corresponding point triplets across the
three views.
Our new camera is driven by a simple equation: given a tensor and the new
camera, specified as a translation and rotation relative to the first
camera, it calculates the new tensor relating the three cameras. The
correctness of this new tensor can be proven mathematically from
algebraic properties of the tensor and of the space of all valid tensors.
Using this tensor-to-tensor function as our driver, we can create a
movie of novel views.
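A self-contained Python/NumPy sketch of tensor-based point transfer, with
one deliberate swap: the paper updates the tensor directly from the new
camera's rotation and translation, whereas for brevity this sketch
rebuilds it from explicit 3x4 camera matrices using the textbook
construction. All names here are ours:

    import numpy as np

    def trifocal_tensor(P2, P3):
        """Trifocal tensor of cameras P1=[I|0], P2, P3 (3x4 each), via the
        textbook formula T_i = a_i b^T - a b_i^T, with a_i, b_i the i-th
        columns of P2, P3 and a, b their fourth columns."""
        A, a = P2[:, :3], P2[:, 3]
        B, b = P3[:, :3], P3[:, 3]
        return np.stack([np.outer(A[:, i], b) - np.outer(a, B[:, i])
                         for i in range(3)])

    def transfer_point(T, p1, p2):
        """Map matching homogeneous points p1 (view 1) and p2 (view 2)
        into the third view via p3^k = p1^i l'_j T_i^{jk}."""
        lp = np.cross(p2, [1.0, 0.0, 0.0])      # some line through p2
        p3 = np.einsum('i,j,ijk->k', p1, lp, T)
        return p3 / p3[2]

Driving the virtual camera then amounts to recomputing the tensor for
each pose along the path and transferring the dense correspondence point
by point to render each novel frame.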
Selected Papers
S. Avidan and A. Shashua.
Unifying Two-View and Three-View Geometry .
Submitted, Nov. 1996.
S. Avidan and A. Shashua.
Novel View Synthesis in Tensor Space.
Submitted, Nov. 1996.
S. Avidan and A. Shashua.
Tensorial Transfer: On the
Representation of $N>3$ Views of a 3D Scene.
In Proc. of the ARPA Image Understanding Workshop, Palm Springs, Feb.
1996.
A. Shashua and S. Avidan.
The Rank 4 Constraint in Multiple View Geometry.
To appear in ECCV, April 1996.