Similarity by Composition*

Oren Boiman and Michal Irani
Presented at NIPS 2006
* Patent Pending

This site presents the paper "Similarity by Composition" (NIPS 2006).

Paper (pdf) + Appendix (pdf)

Presentation (ppt)


We propose a new approach for measuring similarity between two signals, which is applicable to many machine learning tasks, and to many signal types. We say that a signal S1 is “similar” to a signal S2 if it is “easy” to compose S1 from few large contiguous chunks of S2. Obviously, if we use small enough pieces, then any signal can be composed of any other. Therefore, the larger those pieces are, the more similar S1 is to S2. This induces a local similarity score at every point in the signal, based on the size of its supported surrounding region. These local scores can in turn be accumulated in a principled information-theoretic way into a global similarity score of the entire S1 to S2. “Similarity by Composition” can be applied between pairs of signals, between groups of signals, and also between different portions of the same signal. It can therefore be employed in a wide variety of machine learning problems (clustering, classification, retrieval, segmentation, attention, saliency, labelling, etc.), and can be applied to a wide range of signal types (images, video, audio, biological data, etc.). We show a few such examples.
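As a toy illustration of the composition idea, the sketch below covers a 1-D sequence with contiguous chunks of a reference sequence and assigns each position the size of the chunk that covers it. It is not the inference algorithm of the paper: exact matching, the greedy left-to-right cover, and the function names `longest_match_from`, `compose` and `local_scores` are all assumptions made for illustration only.

```python
def longest_match_from(query, ref, start):
    """Length of the longest prefix of query[start:] found contiguously in ref."""
    best = 0
    for j in range(len(ref)):
        k = 0
        while (start + k < len(query) and j + k < len(ref)
               and query[start + k] == ref[j + k]):
            k += 1
        best = max(best, k)
    return best


def compose(query, ref):
    """Greedily cover query with contiguous chunks of ref.

    Returns (start, length, matched) tuples; matched is False for
    elements of query that do not occur in ref at all.
    """
    chunks = []
    i = 0
    while i < len(query):
        n = longest_match_from(query, ref, i)
        if n == 0:
            chunks.append((i, 1, False))  # nothing in ref explains this element
            i += 1
        else:
            chunks.append((i, n, True))
            i += n
    return chunks


def local_scores(query, ref):
    """Toy local score: size of the chunk covering each position (0 if unmatched)."""
    scores = [0] * len(query)
    for start, length, matched in compose(query, ref):
        if matched:
            for p in range(start, start + length):
                scores[p] = length
    return scores


# Two large chunks plus one unexplained element:
print(local_scores("abcdXefgh", "abcdefgh"))   # [4, 4, 4, 4, 0, 4, 4, 4, 4]
# The reversed sequence can only be composed from single elements:
print(len(compose("hgfedcba", "abcdefgh")))    # 8
```

In this toy setting, a sequence that needs only a few large chunks is "easy" to compose from the reference, while one that breaks into many single-element chunks is not.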

Basic Concept

Applications & Results

Detection of Saliency / Irregularities

Points with low local evidence scores (LES) are identified as salient / irregular.

Detecting Saliency in Images

(no reference)

Detecting Saliency in Video

(no reference)


For applications to detecting suspicious behaviors, see the project web page.

Fabric Inspection

(no reference)

Figure panels: Input, Detected Defects (in red)

Wafer Inspection

(no reference)

Figure panels (two examples): Input, Output

Fruit Inspection

(a single 'good' reference)


Signal Segmentation

Pixels that share a maximal region have evidence that they belong to the same pattern, and should therefore be segmented together.

Using the LES scores of the maximal regions, we build an affinity matrix of all points in the signal and use standard spectral clustering to segment the signal and extract meaningful patterns.
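The two steps above can be sketched as follows. This is a minimal sketch under assumptions of ours: each maximal region is given as a hypothetical (point indices, score) pair, the affinity between two points is taken to be the sum of the scores of the regions they share, and a two-way split by the sign of the second eigenvector of the normalized Laplacian stands in for a full spectral clustering (which would typically use several eigenvectors followed by k-means). The exact affinity definition in the paper may differ.

```python
import numpy as np


def affinity_from_regions(n_points, regions):
    """Accumulate pairwise affinities: points sharing a region receive its score."""
    A = np.zeros((n_points, n_points))
    for indices, score in regions:
        idx = np.asarray(indices)
        A[np.ix_(idx, idx)] += score
    np.fill_diagonal(A, 0.0)  # no self-affinity
    return A


def spectral_bipartition(A):
    """Two-way split from the second eigenvector of the normalized Laplacian."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt    # map back to the random-walk form
    return (fiedler > 0).astype(int)


# Made-up example: two strongly supported regions joined by a weak one.
regions = [([0, 1, 2], 5.0), ([3, 4, 5], 4.0), ([2, 3], 0.1)]
A = affinity_from_regions(6, regions)
labels = spectral_bipartition(A)
print(labels)  # points 0-2 get one label, points 3-5 the other
```

Which of the two groups is labelled 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.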

Segmentation of an image into meaningful patterns


Signal Classification

Action Database example sequences, taken from [Blank et al, ICCV 2005].
The complete database can be found here.

We used the global evidence score (GES) as the similarity measure in a leave-one-out nearest-neighbor classification.
97.5% of the classifications were correct.
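The leave-one-out nearest-neighbor protocol can be sketched as below. The function name `leave_one_out_accuracy`, the layout of the `similarity` table (entry `[i][j]` is the score of sequence i against sequence j, higher meaning more similar) and the numbers in the example are all hypothetical; they are not the GES values or results from the experiment.

```python
def leave_one_out_accuracy(similarity, labels):
    """Fraction of items whose most similar *other* item has the same label.

    similarity[i][j] is the score of item i against item j; it does not
    need to be symmetric.
    """
    n = len(labels)
    correct = 0
    for i in range(n):
        # Leave item i out of the database, then take its nearest neighbor.
        best_j = max((j for j in range(n) if j != i),
                     key=lambda j: similarity[i][j])
        if labels[best_j] == labels[i]:
            correct += 1
    return correct / n


# Made-up scores for four sequences of two action classes.
labels = ["walk", "walk", "run", "run"]
similarity = [
    [9, 7, 2, 1],
    [6, 9, 3, 2],
    [1, 2, 9, 8],
    [2, 5, 4, 9],
]
print(leave_one_out_accuracy(similarity, labels))  # 0.75
```

In this made-up table the last sequence is closest to a "walk" sequence, so three of the four items are classified correctly.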

Moreover, we were able to correctly classify much more complex queries using the same database:

Signal Retrieval

We used a database of a five-word sentence repeated 3 times by each of 31 speakers (93 sequences overall).
We used the GES similarity score in a leave-one-out nearest-neighbor retrieval.
97% of the retrieved speakers were correct.