On approximating the average distance between points
Webpage for a paper by Kfir Barhum, Oded Goldreich, and Adi Shraibman
Abstract
We consider the problem of approximating the average distance
between pairs of points in a high-dimensional Euclidean space,
and more generally in any metric space.
We consider two algorithmic approaches:
-
Referring only to Euclidean Spaces, we randomly reduce the
high-dimensional problem to a one-dimensional problem,
which can be solved in time that is almost-linear
in the number of points.
The resulting algorithm is somewhat better than a related
algorithm that can be obtained by using the known randomized
embedding of Euclidean Spaces into $\ell_1$-metric.
-
An alternative approach consists of selecting a random sample
of pairs of points and outputting the average distance between these pairs.
It turns out that, for any metric space, it suffices to use a sample
of size that is linear in the number of points.
Our analysis of this method is somewhat simpler and better
than the known analysis of Indyk (STOC, 1999).
We also study the existence of corresponding deterministic algorithms,
presenting both positive and negative results.
In particular, in the Euclidean case,
this approach outperforms the first approach.
In general, it seems that the second approach is superior
to the first approach.
Material available on-line
Abstract (of initial version [March 2007])
We consider the problem of approximating the average distance
between pairs of points in a high-dimensional Euclidean space,
and more generally in any metric space. Our aim is providing
linear-time approximation algorithms, which in particular beat
the obvious quadratic-time algorithm that computes the exact value.
We consider two algorithmic approaches:
-
Referring only to Euclidean Spaces, we randomly reduce the
high-dimensional problem to a one-dimensional problem.
The resulting algorithm runs in almost-linear time
and lends itself to a ``direct'' derandomization.
-
An alternative approach consists of selecting a random sample
of pairs of points and outputting the average distance between these pairs.
It turns out that, for any metric space, it suffices to use a sample
of size that is linear in the number of points.
A ``direct'' derandomization of this algorithm is meaningless,
but we study possible ``indirect'' derandomization that are
inspired by it.
Back to
either Oded Goldreich's homepage.
or general list of papers.