Similarity by Composition*

Oren Boiman and Michal Irani
Presented at NIPS 2006
* Patent Pending

This site presents the paper "Similarity by Composition" (NIPS 2006).

Paper (pdf) + Appendix (pdf)

Presentation (ppt)

Abstract

We propose a new approach for measuring similarity between two signals, which is applicable to many machine learning tasks, and to many signal types. We say that a signal S₁ is “similar” to a signal S₂ if it is “easy” to compose S₁ from few large contiguous chunks of S₂. Obviously, if we use small enough pieces, then any signal can be composed of any other. Therefore, the larger those pieces are, the more similar S₁ is to S₂. This induces a local similarity score at every point in the signal, based on the size of its supported surrounding region. These local scores can in turn be accumulated in a principled information-theoretic way into a global similarity score of the entire S₁ to S₂. “Similarity by Composition” can be applied between pairs of signals, between groups of signals, and also between different portions of the same signal. It can therefore be employed in a wide variety of machine learning problems (clustering, classification, retrieval, segmentation, attention, saliency, labelling, etc.), and can be applied to a wide range of signal types (images, video, audio, biological data, etc.). We show a few such examples.
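As a rough illustration of the composition idea, the sketch below works on plain strings. It is a toy under stated assumptions, not the paper's LES/GES: signals are Python strings, chunks must match exactly, and the information-theoretic accumulation is replaced by a simple null model in which every symbol is uniform and independent over an alphabet of size k. Under that assumption a specific chunk of length L appears by chance with probability k^-L, so a match carries L·log2(k) bits of evidence. The function names `local_scores` and `global_score_bits` are hypothetical.

```python
import math

def local_scores(s1, s2):
    """For each position i of s1, the length of the longest contiguous chunk
    of s1 that covers i and also appears somewhere in s2 (0 if none).
    Toy assumption: exact substring matching on strings."""
    n = len(s1)
    scores = [0] * n
    for a in range(n):
        for b in range(a + 1, n + 1):
            if s1[a:b] not in s2:
                break  # any longer chunk starting at a is also absent from s2
            for i in range(a, b):
                scores[i] = max(scores[i], b - a)
    return scores

def global_score_bits(s1, s2, alphabet_size):
    """Toy accumulation (not the paper's GES): under a uniform i.i.d. null
    model a chunk of length L matches by chance with probability
    alphabet_size**-L, i.e. it carries L * log2(alphabet_size) bits.
    Returns the average per-point evidence over s1."""
    bits_per_symbol = math.log2(alphabet_size)
    scores = local_scores(s1, s2)
    return sum(length * bits_per_symbol for length in scores) / len(s1)
```

For example, `local_scores("cdefab", "abcdefgh")` gives `[4, 4, 4, 4, 2, 2]` and a global score of 10.0 bits per point (with k = 8), whereas `"ahbgcf"`, which can only be composed from single letters of the same reference, scores 3.0: larger composing chunks yield a higher similarity.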

Basic Concept

Applications & Results

Detection of Saliency / Irregularities

Points with low local evidence scores (LES) are identified as salient / irregular.

Detecting Saliency in Images (no reference)
Detecting Saliency in Video (no reference). For applications to detecting suspicious behaviors, see the project web page.
Fabric Inspection (no reference): input and detected defects (in red)
Wafer Inspection (no reference): two input / output pairs
Fruit Inspection (a single 'good' reference)
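In the "no reference" setting, a signal can be compared against its own other portions. The sketch below is a simplified, hypothetical stand-in for that idea on strings: each position's score is the length of the longest chunk covering it that also occurs at some other location in the same string, and positions whose score falls below a threshold `min_len` (an assumed parameter, not from the paper) are flagged as irregular. It uses exact matching and brute-force search, so it is only meant for tiny examples.

```python
def self_composition_scores(s):
    """For each position i, the length of the longest chunk of s covering i
    that also occurs at some *other* start position in s (0 if none).
    Overlapping occurrences are allowed in this toy version."""
    n = len(s)
    scores = [0] * n
    for a in range(n):
        for b in range(a + 1, n + 1):
            chunk = s[a:b]
            elsewhere = any(
                s.startswith(chunk, j)
                for j in range(n - len(chunk) + 1)
                if j != a
            )
            if not elsewhere:
                break  # longer chunks starting at a cannot occur elsewhere either
            for i in range(a, b):
                scores[i] = max(scores[i], b - a)
    return scores

def flag_irregular(s, min_len=2):
    """Hypothetical helper: positions whose best self-composition chunk is
    shorter than min_len are reported as salient / irregular."""
    return [i for i, v in enumerate(self_composition_scores(s)) if v < min_len]

print(flag_irregular("abcabcabcXabcabc"))  # [9]
```

The repeated `abc` pattern explains every position except the single `X`, which cannot be composed from anything else in the signal and is therefore the only one flagged.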

Signal Segmentation

Pixels that share a maximal region provide evidence that they belong to the same pattern, so they should be segmented together.

Using the LES scores of the maximal regions, we build an affinity matrix over all points in the signal and apply standard spectral clustering to segment the signal and extract meaningful patterns.

Segmentation of an image into meaningful patterns
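The affinity construction described above can be sketched on a 1-D signal as follows. Assumptions not taken from the paper: maximal regions are given as hypothetical `(start, end, score)` tuples, two points' affinity is the sum of the scores of regions containing both, a small uniform `eps` keeps the graph connected, and the clustering step is the simplest two-way normalized spectral split (sign of the second eigenvector of the symmetric normalized Laplacian). For more than two segments one would instead cluster several leading eigenvectors, for example with k-means.

```python
import numpy as np

def affinity_from_regions(n, regions, eps=1e-3):
    """Hypothetical input format: regions is a list of (start, end, score)
    maximal regions on a 1-D signal of length n. Every pair of points inside
    a region gets that region's score added to its affinity."""
    W = np.full((n, n), eps)  # small uniform affinity keeps the graph connected
    for start, end, score in regions:
        W[start:end, start:end] += score
    return W

def two_way_spectral_split(W):
    """Two-way normalized spectral clustering: sign of the eigenvector with
    the second-smallest eigenvalue of L_sym = I - D^-1/2 W D^-1/2."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)

W = affinity_from_regions(6, [(0, 3, 5.0), (3, 6, 5.0)])
labels = two_way_spectral_split(W)
```

Here points 0 to 2 and points 3 to 5 each share one maximal region, so they end up in two separate segments. Which segment is labelled 0 depends on the arbitrary sign of the eigenvector.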

Signal Classification

Example sequences from the action database of [Blank et al., ICCV 2005].
The complete database can be found here.

We used the global evidence score (GES) as the similarity measure for leave-one-out nearest-neighbor classification.
97.5% of the classifications were correct.

Moreover, using the same database, we correctly classified much more complex queries.
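The leave-one-out nearest-neighbor protocol can be sketched independently of how the scores are computed. This sketch assumes a precomputed matrix `S` where `S[q, r]` is the similarity of query `q` given reference `r`; a GES-style score need not be symmetric, so rows and columns are treated differently. The function name and the toy scores and labels below are hypothetical, not data from the experiment.

```python
import numpy as np

def loo_nearest_neighbor_accuracy(S, labels):
    """S[q, r]: similarity of query q given reference r (may be asymmetric).
    Each item receives the label of its most similar *other* item; returns
    the fraction of items classified correctly."""
    S = np.asarray(S, dtype=float).copy()
    np.fill_diagonal(S, -np.inf)  # leave-one-out: an item may not match itself
    nearest = S.argmax(axis=1)
    labels = np.asarray(labels)
    return float(np.mean(labels[nearest] == labels))

S = [[0, 5, 1, 0],
     [4, 0, 0, 2],
     [4, 0, 0, 3],
     [0, 1, 6, 0]]
print(loo_nearest_neighbor_accuracy(S, ["walk", "walk", "jump", "jump"]))  # 0.75
```

In this toy matrix, the third item's best match is a "walk" sequence, so one of the four items is misclassified.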

Signal Retrieval

We used a database of a five-word sentence spoken 3 times by each of 31 speakers (93 sequences overall).
We used the GES similarity score for leave-one-out nearest-neighbor retrieval.
97% of the retrieved speakers were correct.
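Retrieval can be scored in the same leave-one-out fashion, but by ranking the remaining sequences rather than only taking the best one. The sketch below assumes a precomputed, possibly asymmetric score matrix `S` (hypothetical; in the experiment these would be GES values) and reports precision at `k`, the mean fraction of the top-`k` retrieved sequences that come from the query's speaker. With `k=1` this matches the "fraction of retrieved speakers that were correct" criterion. The toy matrix uses 2 speakers with 3 recordings each, not the real 31-speaker data.

```python
import numpy as np

def loo_precision_at_k(S, speakers, k=1):
    """For each query q, rank all *other* sequences by S[q, r] (highest first)
    and return the mean fraction of the top-k that share q's speaker."""
    S = np.asarray(S, dtype=float).copy()
    np.fill_diagonal(S, -np.inf)  # exclude the query itself from retrieval
    speakers = np.asarray(speakers)
    ranked = np.argsort(-S, axis=1)[:, :k]  # indices of the k best references
    hits = speakers[ranked] == speakers[:, None]
    return float(hits.mean())

speakers = ["A", "A", "A", "B", "B", "B"]
S = [[0, 9, 8, 1, 2, 3],
     [9, 0, 7, 4, 1, 2],
     [8, 7, 0, 2, 3, 1],
     [1, 2, 3, 0, 9, 8],
     [2, 1, 9, 7, 0, 6],
     [3, 1, 2, 8, 7, 0]]
top1 = loo_precision_at_k(S, speakers, k=1)  # 5 of 6 queries retrieve the right speaker
top2 = loo_precision_at_k(S, speakers, k=2)  # 11 of 12 retrieved items are correct
```

Since each speaker has 3 recordings, every query has exactly 2 relevant sequences left after leaving it out, which makes `k=2` a natural second operating point.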