The ability to recognize 3D objects from 2D images
is one that any young child possesses, yet computers still cannot
perform this task well.
The main problem is that the appearance of a given object in images depends
on many unknown factors of the imaging process, such as
camera viewpoint and lighting.
Given an image, we would like to select the object model that is
most similar to the observed image.
Given an object, we would like to know the optimal way of representing it
for this recognition task.
We provide rigorous mathematical and statistical definitions of
image similarity and view likelihood. We also define geometric model-based
invariants and constraints for a given
object under camera transformations.
Based on these constraints, we have also developed a more efficient way of
indexing into our object database.
Psychophysical experiments show that humans have clear criteria for
deciding when two images are similar, or when an image could originate from
some object. The framework in which we work is as follows: we have a model
of our object and an image of some unknown object.
This image is recognized as our object if there exists a viewpoint from
which the model "reasonably" aligns with the image.
When are two images similar? When are an image and an object similar?
We replace the vague words "reasonably" and "similar"
with rigorous mathematical measures that assess the similarity between
objects and images under different assumptions.
For example, between an object and an image we
define a transformation metric that penalizes the non-rigidity of
the affine deformation that best aligns the model with the image.
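To make this concrete, here is a minimal numerical sketch of such a metric (the details are our illustration, not the exact metric of the paper): we fit the least-squares affine map from the 3D model points to the 2D image points, and measure non-rigidity by the gap between the singular values of its linear part, which vanishes exactly when the map is a scaled orthographic projection of a rotation.

    import numpy as np

    def alignment_and_rigidity(model_pts, image_pts):
        # Fit the least-squares affine map sending the (n, 3) model points
        # onto the (n, 2) image points, then report the alignment residual
        # and a non-rigidity penalty: the gap between the singular values
        # of the 2x3 linear part, which is zero exactly when the map is a
        # scaled orthographic projection of a rotation.
        n = model_pts.shape[0]
        X = np.hstack([model_pts, np.ones((n, 1))])             # (n, 4)
        P, _, _, _ = np.linalg.lstsq(X, image_pts, rcond=None)  # (4, 2)
        A = P[:3].T                                             # linear part (2, 3)
        residual = np.linalg.norm(X @ P - image_pts)
        s1, s2 = np.linalg.svd(A, compute_uv=False)
        return residual, s1 - s2

    # A rigidly rotated, scaled and translated model should score ~0 on both.
    rng = np.random.default_rng(0)
    model = rng.standard_normal((8, 3))
    R, _ = np.linalg.qr(rng.standard_normal((3, 3)))            # random rotation
    image = 2.0 * (R @ model.T).T[:, :2] + np.array([5.0, -3.0])
    print(alignment_and_rigidity(model, image))                 # (~0, ~0)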
When comparing against a database of objects, many interpretations are
plausible. We have developed a general framework, based on maximum
likelihood, to deal with this ambiguity. We define view likelihood, the
probability that a certain view of a given object is observed, and view
stability, how little the image changes as the viewpoint is moved.
We plug our metric for image similarity into an algorithm that evaluates
these new robust measures for recognition.
Finally, we can use this likelihood framework to
increase the robustness of our object recognition systems.
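As an illustration of how such measures can be evaluated, the following Monte Carlo sketch (our simplification, with an assumed point-based image metric, not the published algorithm) estimates view likelihood as the fraction of random viewpoints whose image is close to a given view, and view stability as the mean image change under small viewpoint perturbations.

    import numpy as np

    def random_rotation(rng):
        # Uniform-ish random rotation: QR of a Gaussian matrix, sign-fixed
        # so the determinant is +1.
        Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
        if np.linalg.det(Q) < 0:
            Q[:, 0] = -Q[:, 0]
        return Q

    def project(points, R):
        # Orthographic projection of rotated 3D points.
        return (R @ points.T).T[:, :2]

    def image_distance(img_a, img_b):
        # Placeholder image metric (our assumption): RMS distance between
        # corresponding projected points, after removing translation.
        a = img_a - img_a.mean(axis=0)
        b = img_b - img_b.mean(axis=0)
        return np.sqrt(((a - b) ** 2).mean())

    def view_likelihood_and_stability(points, R0, n_samples=2000,
                                      eps=0.1, delta=0.01, rng=None):
        # likelihood ~ fraction of random views whose image lies within eps
        #              of the image seen from R0;
        # stability  ~ mean image change under small viewpoint perturbations
        #              (smaller means more stable).
        rng = rng or np.random.default_rng(0)
        img0 = project(points, R0)
        near = sum(image_distance(img0, project(points, random_rotation(rng))) < eps
                   for _ in range(n_samples))
        changes = []
        for _ in range(100):
            axis = rng.standard_normal(3)
            axis /= np.linalg.norm(axis)
            K = np.array([[0, -axis[2], axis[1]],
                          [axis[2], 0, -axis[0]],
                          [-axis[1], axis[0], 0]])
            # Rodrigues formula: small rotation by delta about the axis.
            dR = np.eye(3) + np.sin(delta) * K + (1 - np.cos(delta)) * K @ K
            changes.append(image_distance(img0, project(points, dR @ R0)))
        return near / n_samples, float(np.mean(changes))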
To put the theory of view stability and likelihood into practice, and to
detect the canonical views of an object (its most stable and
likely views), a similarity measure between images is needed. We define a
few similarity measures for silhouettes of curved objects, where shape is
the only information available in the image (color and texture cues are
missing).
Recently, a new similarity measure based on partial curve matching has been
defined. We show how this measure can be used to learn representative
views from shape examples.
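As a rough illustration of the flavor of such a measure (a simplified stand-in for the published partial curve matching, under the assumption that silhouettes are given as ordered boundary points), the following sketch aligns the turning-angle sequences of two silhouettes by dynamic programming, where a skip penalty allows parts of either curve to remain unmatched.

    import numpy as np

    def turning_angles(contour):
        # Turning angle at each vertex of a closed polygonal silhouette;
        # contour is an (n, 2) array of boundary points in order. (For
        # simplicity we ignore the choice of starting point.)
        edges = np.diff(np.vstack([contour, contour[:1]]), axis=0)
        headings = np.arctan2(edges[:, 1], edges[:, 0])
        turns = np.diff(np.concatenate([headings, headings[:1]]))
        return (turns + np.pi) % (2 * np.pi) - np.pi   # wrap to (-pi, pi]

    def silhouette_dissimilarity(contour_a, contour_b, skip_cost=0.5):
        # Edit-distance style DP over turning-angle sequences: matching two
        # vertices costs their angle difference; leaving a vertex unmatched
        # costs skip_cost, which is what makes the matching "partial".
        a, b = turning_angles(contour_a), turning_angles(contour_b)
        n, m = len(a), len(b)
        D = np.empty((n + 1, m + 1))
        D[0, :] = skip_cost * np.arange(m + 1)
        D[:, 0] = skip_cost * np.arange(n + 1)
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                D[i, j] = min(D[i - 1, j - 1] + abs(a[i - 1] - b[j - 1]),
                              D[i - 1, j] + skip_cost,
                              D[i, j - 1] + skip_cost)
        return D[n, m]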
Selected Papers
Ronen Basri and Daphna Weinshall.
Distance Metric between 3D Models and 2D Images for Recognition and Classification.
T-PAMI 18(4), 1996.
Michael Werman and Daphna Weinshall.
Similarity and Affine Invariant Distance Between Point Sets.
T-PAMI 17(8), August 1995.
Daphna Weinshall and Michael Werman.
On View Likelihood and Stability.
To appear in T-PAMI.
Daphna Weinshall and Michael Werman.
A Computational Theory of Canonical Views.
ARPA Image Understanding Workshop, February 1996.
Yoram Gdalyahu and Daphna Weinshall.
Measures for Silhouettes Resemblance and Representative Silhouettes of Curved Objects.
ECCV-96, Cambridge, April 1996.
To study the problem of shape, both for recognition and
for reconstruction, we study the relation between a 3D shape and
its 2D projections. Our goal is to find constraints between model points
and image measurements that are independent of the imaging parameters.
Such relations are termed model-based invariants. We search for these
relationships and for their optimal representations, for use in efficient
recognition and reconstruction algorithms.
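A simple concrete instance, given here as a hedged sketch under the affine camera assumption (the general theory covers more camera models), illustrates the idea: the image coordinates of n model points are affine functions of their 3D coordinates, so every image coordinate vector lies in the column space of the n x 4 matrix [X | 1]; any left null vector of that matrix therefore annihilates the image measurements in every view, yielding a linear constraint whose coefficients depend on the model alone.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((6, 3))              # six known 3D model points
    M = np.hstack([X, np.ones((6, 1))])          # (6, 4): model points, homogeneous

    # Left null space of M: rows w with w @ M = 0. These coefficients
    # depend on the model only, not on the camera.
    _, _, Vt = np.linalg.svd(M.T)
    W = Vt[4:]                                   # (2, 6) basis of constraints

    # Simulate an arbitrary affine camera: image coordinates are affine in X.
    A = rng.standard_normal((2, 3))
    t = rng.standard_normal(2)
    uv = X @ A.T + t                             # (6, 2) image measurements

    # The model-based invariant holds in every affine view: W @ uv == 0.
    print(np.abs(W @ uv).max())                  # numerically ~0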
Given a sequence of images of a set of 3D points taken by unknown cameras,
there are two fundamental questions to be solved:
What is the structure of the set of points in 3D?
What are the positions of the cameras relative to the points?
For a projective camera we show that these problems are dual. The imaging
of a set of points in space by multiple cameras can be captured by constraint
equations involving the space points, the camera centers, and the image
points, where the space points and the camera centers play symmetric roles.
This formalism, in which points and projections are interchangeable,
allows both seemingly different problems to be solved with the same algorithms.
The dual algebraic formalizations for the case of camera centers are the
fundamental matrix,
the trilinear tensor, and the quadrilinear tensor.
We can use this approach both for algorithms that reconstruct shape
and for algorithms that learn invariant relations for
indexing into an object database.
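For two views, the constraint equation is the familiar bilinear one involving the fundamental matrix; the following sketch of the standard normalized 8-point estimate (a well-known procedure, not the dual formalism of the paper itself) shows how such a constraint is recovered from image measurements alone.

    import numpy as np

    def normalize(pts):
        # Hartley normalization: center the points and scale so the mean
        # distance from the origin is sqrt(2).
        c = pts.mean(axis=0)
        s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
        T = np.array([[s, 0, -s * c[0]],
                      [0, s, -s * c[1]],
                      [0, 0, 1]])
        ph = np.hstack([pts, np.ones((len(pts), 1))]) @ T.T
        return ph, T

    def fundamental_matrix(x1, x2):
        # Normalized 8-point algorithm: each correspondence contributes one
        # linear constraint x2' F x1 = 0; solve for F by SVD, then enforce
        # the rank-2 constraint and undo the normalization.
        p1, T1 = normalize(x1)
        p2, T2 = normalize(x2)
        A = np.stack([np.kron(b, a) for a, b in zip(p1, p2)])
        _, _, Vt = np.linalg.svd(A)
        F = Vt[-1].reshape(3, 3)
        U, S, Vt = np.linalg.svd(F)
        F = U @ np.diag([S[0], S[1], 0]) @ Vt
        F = T2.T @ F @ T1
        return F / np.linalg.norm(F)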
Selected Papers
D. Weinshall, M. Werman, and A. Shashua.
Shape Tensors for Efficient and Learnable Indexing.
IEEE Workshop on Visual Scene Representation, Boston, June 1995.
Daphna Weinshall.
Model-based Invariants for 3-D Vision.
IJCV 10(1), 1993.
Stefan Carlsson and Daphna Weinshall.
Dual Computation of Projective Shape and Camera Positions from Multiple Images.
HU TR 96-6, 1996.
In a real-world application, the task is more often not to decide whether
we have an image of a specific object, but, given an image, to recognize
the corresponding object in a large database. Our approach is to use as an
index some shape invariant that can be computed
from image measurements. Because of the ambiguity created when projecting
3D onto 2D, there is no unique shape-invariant function under which each
object corresponds to a single value. However, we can find invariant
functions under which the set of all points corresponding to feasible
images of an object is a manifold. This manifold, for a given invariant
function, is unique to each object, and an image of that object must lie
on the manifold.
We study different possible invariant functions for this method, such as
the shape tensor, and different efficient methods to find and represent
these manifolds, so that given an image measurement we can determine which
objects it may represent by checking which manifolds the corresponding
point lies on.
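The following sketch (a toy illustration with a hypothetical measurement function, and brute-force search standing in for a real index) shows the overall scheme: sample each object's manifold by projecting the model from many random viewpoints, store the resulting measurement vectors, and rank objects by the distance from a query measurement to their nearest stored sample.

    import numpy as np

    def measurement_vector(model, R):
        # Hypothetical measurement for the sketch: the orthographic image of
        # the model's points, centered and scale-normalized. As the viewpoint
        # varies, this traces out the object's manifold in measurement space.
        img = (R @ model.T).T[:, :2]
        img = img - img.mean(axis=0)
        return (img / np.linalg.norm(img)).ravel()

    def build_index(models, n_views=500, rng=None):
        # Sample each object's manifold by projecting it from random views.
        rng = rng or np.random.default_rng(0)
        index = []
        for obj_id, model in enumerate(models):
            for _ in range(n_views):
                Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
                index.append((obj_id, measurement_vector(model, Q)))
        return index

    def query(index, image_vec):
        # Rank objects by the distance from the query measurement to their
        # nearest stored manifold sample. Brute force here; a KD-tree or a
        # hashing scheme would replace this in a large database.
        best = {}
        for obj_id, vec in index:
            d = np.linalg.norm(vec - image_vec)
            best[obj_id] = min(best.get(obj_id, np.inf), d)
        return sorted(best, key=best.get)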
Selected Papers
D. Weinshall, M. Werman, and A. Shashua.
Shape Tensors for Efficient and Learnable Indexing.
IEEE Workshop on Visual Scene Representation, Boston, June 1995.
Michael Werman and Daphna Weinshall.
Complexity of Indexing: Efficient and Learnable Large Database Indexing.
ECCV-96, Cambridge, April 1996.