## On approximating the average distance between points

#### Abstract

We consider the problem of approximating the average distance between pairs of points in a high-dimensional Euclidean space, and more generally in any metric space. We consider two algorithmic approaches:
1. Referring only to Euclidean Spaces, we randomly reduce the high-dimensional problem to a one-dimensional problem, which can be solved in time that is almost-linear in the number of points. The resulting algorithm is somewhat better than a related algorithm that can be obtained by using the known randomized embedding of Euclidean Spaces into $\ell_1$-metric.
2. An alternative approach consists of selecting a random sample of pairs of points and outputting the average distance between these pairs. It turns out that, for any metric space, it suffices to use a sample of size that is linear in the number of points. Our analysis of this method is somewhat simpler and better than the known analysis of Indyk (STOC, 1999). We also study the existence of corresponding deterministic algorithms, presenting both positive and negative results. In particular, in the Euclidean case, this approach outperforms the first approach.
In general, it seems that the second approach is superior to the first approach.

#### Abstract (of initial version [March 2007])

We consider the problem of approximating the average distance between pairs of points in a high-dimensional Euclidean space, and more generally in any metric space. Our aim is providing linear-time approximation algorithms, which in particular beat the obvious quadratic-time algorithm that computes the exact value. We consider two algorithmic approaches:
1. Referring only to Euclidean Spaces, we randomly reduce the high-dimensional problem to a one-dimensional problem. The resulting algorithm runs in almost-linear time and lends itself to a direct'' derandomization.
2. An alternative approach consists of selecting a random sample of pairs of points and outputting the average distance between these pairs. It turns out that, for any metric space, it suffices to use a sample of size that is linear in the number of points. A direct'' derandomization of this algorithm is meaningless, but we study possible indirect'' derandomization that are inspired by it.

Back to either Oded Goldreich's homepage. or general list of papers.