## High-Confidence Predictions under Adversarial Uncertainty

by Andrew Drucker

This paper considers two natural problems regarding on-line predictions of arbitrary Boolean sequences. In both problems we are presented with a stream of bits, and are required to make some prediction/guess regarding its continuation. (The following description may be a bit inaccurate.)

In the first problem it is best to think of the sequence as being infinite. Given parameter $\epsilon,\delta>0$, and an on-line access to the sequence stream. The task is to select (at random, based on past view) a location (whose value has not been revealed) such that its value is 0 with probability at least $1-\epsilon-\delta$. That is, at each step, the on-line algorithm decides (at random) whether to "gamble" on the next bit or (get its value and) move on. The algorithm is required to make a gamble on any sequence for which the (limit of the) density of 1's is below $\epsilon$.

The proposed algorithm partitions the sequence to increasingly longer intervals. In each interval, it selects a random location, and gambles on this location only if the fraction of 1's that preceded it (in this interval) does not exceed $\epsilon$, and also in that case the gamble is avoided with probability that is proportional to the said fraction multiplied by $1/\epsilon$.

I think it would be interesting to figure out what is the best algorithm that can achieve the aforementioned task, where performance is measured in term of time at which the gamble is made. That is, for each randomized algorithm $A$ and each infinite sequence $x$, denote by $g_A(x)$ a random variable describing that time. Drucker's result states that for every $x$ in which the (limit of the) density of 1's is below $\epsilon$, the expected value of $g(x)$ is finite. Can you propose an algorithm that significantly improves his bound on all sequences? Actually, it seems more appealing to analyze performance as a function of some structural properties of the sequence. For example, define $f(x)$ as the length of the longest prefix of $x$ in which the fraction of 1's exceeds $\epsilon$. Then, we wish to minimize $g_A(x)$ as a function of $f(x)$.

The second problem can be stated in terms for finite Boolean sequences. For parameters $\epsilon,\delta>0$, we fix $L=L(\epsilon,\delta)$, and consider a randomized algorithm given on-line access to an arbitrary Boolean sequence of length $L$. The algorithm should, at some point, output a pair $(N,p)$ of its choice such that with probability at least $1-\delta$ the fraction of 1's in the next $N$ bits is $(p\pm\epsilon) N$.

The proposed algorithm uses $L$ that is exponential in $1/\epsilon\delta$. It selects uniformly at random $i\in[\log_2 L]$ and $j\in[2^i]$, and makes a guess after seeing $(j-1+0.5)\cdot L/2^i$ bits. The guess refers to the next $0.5\cdot L/2^i$ bits (i.e., $N=L/2^{i+1}$), and uses the fraction of 1's in the last $N$ bits seen (i.e., $p$ is the fraction of 1's in the last $N$ bits).

I join Drucker in asking whether an alternative algorithm may have $L$ polynomial in $1/\epsilon\delta$.

#### The original abstract

Presented in ITCS 12; see paper.

Back to list of Oded's choices.