Publications


Oded Goldreich, Tom Gur, and Ron D. Rothblum Proofs of Proximity for ContextFree Languages and ReadOnce Branching Programs
ICALP 2015
[Abstract]
[BiBTeX] [Paper:
PDF]
Proofs of proximity are probabilistic proof systems in which the verifier only queries a sublinear number of input bits, and soundness only means that, with high probability, the input is close to an accepting input. In their minimal form, called MerlinArthur proofs of proximity (MAP), the verifier receives, in addition to query access to the input, also free access to an explicitly given short (sublinear) proof. A more general notion is that of an interactive proof of proximity (IPP), in which the verifier is allowed to interact with an allpowerful, yet untrusted, prover. MAPs and IPPs may be thought of as the NP and IP analogues of property testing, respectively.
In this work we construct proofs of proximity for two natural classes of properties: contextfree languages, and languages accepted by small readonce branching programs. Our main results are:
(1) MAPs for these two classes, in which, for inputs of length n, both the verifier's query complexity and the length of the MAP proof are \tilde O(\sqrt(n)).
(2) IPPs for the same two classes with constant query complexity, polylogarithmic communication complexity, and logarithmically many rounds of interaction.
@article{DBLP:journals/eccc/GoldreichGR15,
author = {Oded Goldreich and Tom Gur and Ron D. Rothblum},
title = {Proofs of Proximity for ContextFree Languages and ReadOnce Branching Programs},
journal = {Electronic Colloquium on Computational Complexity (ECCC)},
volume = {22},
year = {2015},
pages = {24},
ee = {http://eccc.hpiweb.de/report/2015/024},
bibsource = {DBLP, http://dblp.unitrier.de}
}

Oded Goldreich, Tom Gur, and Ilan Komargodski Strong Locally Testable Codes with Relaxed Local Decoders
CCC 2015
[Abstract]
[BiBTeX] [Paper:
PDF]
[Slides:
PDF]
Locally testable codes (LTCs) are errorcorrecting codes
that admit very efficient codeword tests. An LTC is said to be
strong if it has a proximityoblivious tester; that
is, a tester that makes only a constant number of queries and
reject noncodewords with probability that depends solely on their
distance from the code.
Locally decodable codes (LDCs) are complimentary to
LTCs. While the latter allow for highly efficient rejection of
strings that are far from being codewords, LDCs allow for highly
efficient recovery of individual bits of the information that is
encoded in strings that are close to being codewords.
While there are known constructions of strongLTCs with nearlylinear
length, the existence of a constantquery LDC with
polynomial length is a major open problem. In an attempt to
bypass this barrier, BenSasson et al. (SICOMP 2006) introduced a
natural relaxation of local decodability, called relaxedLDCs. This
notion requires local recovery of nearly all individual
informationbits, yet allows for recoveryfailure (but not error) on
the rest. BenSasson et al. constructed a constantquery relaxedLDC with
nearlylinear length (i.e., length k^{1 + \alpha} for an arbitrarily
small constant alpha>0, where k is the dimension of the code).
This work focuses on obtaining strong testability and relaxed
decodability simultaneously. We construct a family of binary linear
codes of nearlylinear length that are both strongLTCs (with onesided
error) and constantquery relaxedLDCs. This improves upon the
previously known constructions, which obtain either weak LTCs or
require polynomial length.
Our construction heavily relies on tensor codes and
PCPs. In particular, we provide strong canonical PCPs
of proximity for membership in any linear code with constant
rate and relative distance. Loosely speaking, these are PCPs
of proximity wherein the verifier is proximity oblivious
(similarly to strongLTCs and every valid statement has a unique
canonical proof. Furthermore, the verifier is required to
reject noncanonical proofs (even for valid statements).
As an application, we improve the best known separation result
between the complexity of decision and verification in
the setting of property testing.
@article{DBLP:journals/eccc/GoldreichGK14,
author = {Oded Goldreich and Tom Gur and Ilan Komargodski},
title = {Strong Locally Testable Codes with Relaxed Local Decoders},
journal = {Electronic Colloquium on Computational Complexity (ECCC)},
volume = {21},
year = {2014},
pages = {25},
ee = {http://eccc.hpiweb.de/report/2014/025},
bibsource = {DBLP, http://dblp.unitrier.de}
}

Tom Gur and Ron D.
Rothblum NonInteractive Proofs of
Proximity
ITCS 2015
[Abstract]
[BiBTeX] [Paper:
PDF]
[Slides:
PDF]
We initiate a study of noninteractive proofs of
proximity. These proofsystems consist of a verifier
that wishes to ascertain the validity of a given
statement, using a short (sublinear length)
explicitly given proof, and a sublinear number of
queries to its input. Since the verifier cannot even
read the entire input, we only require it to reject
inputs that are far from being valid. Thus, the
verifier is only assured of the proximity of the
statement to a correct one. Such proofsystems can be
viewed as the NP (or more accurately MA) analogue of
property testing.
We explore both the power and limitations of
noninteractive proofs of proximity. We show that
such proofsystems can be exponentially stronger than
property testers, but are exponentially weaker than
the interactive proofs of proximity studied by
Rothblum, Vadhan and Wigderson (STOC 2013). In
addition, we show a natural problem that has a full
and (almost) tight multiplicative tradeoff between
the length of the proof and the verifier's query
complexity.
@inproceedings{DBLP:conf/innovations/GurR15,
author = {Tom Gur and Ron D. Rothblum},
editor = {Tim Roughgarden},
title = {NonInteractive Proofs of Proximity},
booktitle = {Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, {ITCS} 2015, Rehovot, Israel, January 1113, 2015},
pages = {133142},
publisher = {{ACM}},
year = {2015},
url = {http://doi.acm.org/10.1145/2688073.2688079},
doi = {10.1145/2688073.2688079},
timestamp = {Sun, 25 Jan 2015 11:31:05 +0100},
biburl = {http://dblp.unitrier.de/rec/bib/conf/innovations/GurR15},
bibsource = {dblp computer science bibliography, http://dblp.org}
}

We study the power of ArthurMerlin probabilistic
proof systems in the data stream model. We show a
canonical AM streaming algorithm for a wide class of
data stream problems. The algorithm offers a tradeoff
between the length of the proof and the space
complexity that is needed to verify it.
As an application, we give an AM streaming algorithm
for the Distinct Elements problem. Given a data
stream of length m over alphabet of size n, the
algorithm uses \tilde O(s) space and a proof of size
\tilde O(w), for every s,w such that s \cdot w \ge n
(where tilde O hides a \polylog(m,n) factor). We also
prove a lower bound, showing that every MA streaming
algorithm for the Distinct Elements problem that uses
s bits of space and a proof of size w, satisfies s
\cdot w = \Omega(n).
As a part of the proof of the lower bound for the
Distinct Elements problem, we show a new lower bound
of \Omega (\sqrt n) on the MA communication
complexity of the Gap Hamming Distance problem, and
prove its tightness.
@inproceedings{DBLP:conf/icalp/GurR13,
author = {Tom Gur and Ran Raz},
editor = {Fedor V. Fomin and
Rusins Freivalds and
Marta Z. Kwiatkowska and
David Peleg},
title = {ArthurMerlin Streaming Complexity},
booktitle = {Automata, Languages, and Programming  40th International Colloquium,
{ICALP} 2013, Riga, Latvia, July 812, 2013, Proceedings, Part {I}},
series = {Lecture Notes in Computer Science},
volume = {7965},
pages = {528539},
publisher = {Springer},
year = {2013},
url = {http://dx.doi.org/10.1007/9783642392061_45},
doi = {10.1007/9783642392061_45},
timestamp = {Mon, 08 Jul 2013 14:06:38 +0200},
biburl = {http://dblp.unitrier.de/rec/bib/conf/icalp/GurR13},
bibsource = {dblp computer science bibliography, http://dblp.org}
}

Let f:{1,1}^n \to \R be a real function on the
hypercube, given by its discrete Fourier expansion,
or, equivalently, represented as a multilinear
polynomial. We say that it is Boolean if its image is
in {1,1}. We show that every function on the
hypercube with a sparse Fourier expansion must either
be Boolean or far from Boolean. In particular, we
show that a multilinear polynomial with at most k
terms must either be Boolean, or output values
different than 1 or 1 for a fraction of at least
2/(k+2)^2 of its domain.
It follows that given oracle access to f, together
with the guarantee that its representation as a
multilinear polynomial has at most k terms, one can
test Booleanity using O(k^2) queries. We show an
\Omega(k) queries lower bound for this problem.
Our proof crucially uses Hirschman's entropic version
of Heisenberg's uncertainty principle.
@article{DBLP:journals/cjtcs/GurT13,
author = {Tom Gur and
Omer Tamuz},
title = {Testing Booleanity and the Uncertainty Principle},
journal = {Chicago J. Theor. Comput. Sci.},
volume = {2013},
year = {2013},
url = {http://cjtcs.cs.uchicago.edu/articles/2013/14/contents.html},
timestamp = {Mon, 24 Feb 2014 17:30:37 +0100},
biburl = {http://dblp.unitrier.de/rec/bib/journals/cjtcs/GurT13},
bibsource = {dblp computer science bibliography, http://dblp.org}
}

Genomewide association studies have been performed
extensively in the last few years, resulting in many
new discoveries of genomic regions that are
associated with complex traits. It is often the case
that a SNP found to be associated with the condition
is not the causal SNP, but a proxy to it as a result
of linkage disequilibrium. For the identification of
the actual causal SNP, finemapping followup is
performed, either with the use of dense genotyping or
by sequencing of the region. In either case, if the
causal SNP is in high linkage disequilibrium with
other SNPs, the finemapping procedure will require a
very large sample size for the identification of the
causal SNP. Here, we show that by leveraging genetic
variability across populations, we significantly
increase the localization success rate (LSR) for a
causal SNP in a followup study that involves
multiple populations as compared to a study that
involves only one population. Thus, the average power
for detection of the causal variant will be higher in
a joint analysis than that in studies in which only
one population is analyzed at a time. On the basis of
this observation, we developed a framework to
efficiently search for a followup study design: our
framework searches for the best combination of
populations from a pool of available populations to
maximize the LSR for detection of a causal variant.
This framework and its accompanying software can be
used to considerably enhance the power of
finemapping studies.
@article{zaitlen2010lev,
title={Leveraging genetic variability across
populations for the identification of causal
variants.},
author={Zaitlen, N. and Pasaniuc, B. and Gur, T. and
Ziv, E. and Halperin, E.},
journal={Am J Hum Genet},
volume={86},
number={1},
pages={2333},
year={2010}
}

Bogdan Pasaniuc, Ram
Avinery, Tom Gur, Christine F. Skibola, Paige M.
Bracci, and Eran
Halperin A Generic CoalescentBased
Framework for the Selection of a Reference Panel for
Imputation
Genetic Epidemiology, 34(8):773782, 2010
[Abstract]
[BiBTeX]
[Paper:
PDF]
An important component in the analysis of genomewide
association studies involves the imputation of
genotypes that have not been measured directly in the
studied samples. The imputation procedure uses the
linkage disequilibrium (LD) structure in the
population to infer the genotype of an unobserved
single nucleotide polymorphism. The LD structure is
normally learned from a dense genotype map of a
reference population that matches the studied
population. In many instances there is no reference
population that exactly matches the studied
population, and a natural question arises as to how
to choose the reference population for the
imputation. Here we present a Coalescentbased method
that addresses this issue. In contrast to the current
paradigm of imputation methods, our method assigns a
different reference dataset for each sample in the
studied population, and for each region in the
genome. This allows the flexibility to account for
the diversity within populations, as well as across
populations. Furthermore, because our approach treats
each region in the genome separately, our method is
suitable for the imputation of recently admixed
populations. We evaluated our method across a large
set of populations and found that our choice of
reference data set considerably improves the accuracy
of imputation, especially for regions with low LD and
for populations without a reference population
available as well as for admixed populations such as
the Hispanic population. Our method is generic and
can potentially be incorporated in any of the
available imputation methods as an addon.
@article{pasaniuc2010a g,
title={A generic coalescentbased framework for the
selection of a reference panel for
imputation.},
author={Pasaniuc, B. and Avinery, R. and Gur, T. and
Skibola, C.F. and Bracci, P.M. and Halperin,
E.},
journal={Genet Epidemiol},
volume={34},
number={8},
pages={77382},
year={2010}
}
