Research Interests |
I am broadly interested in theoretical computer science, with an emphasis on complexity theory, probabilistic proof systems, property testing, sublinear algorithms, and discrete harmonic analysis.
|
Publications |
Tom Gur, Ron D. Rothblum
Non-Interactive Proofs of Proximity
In preparation
Tom Gur, Ran Raz
Arthur-Merlin Streaming Complexity
The 40th International Colloquium on Automata, Languages and Programming (ICALP), 2013
[Abstract]
[BiBTeX]
[Paper: PDF]
[Slides: PDF]
We study the power of Arthur-Merlin probabilistic proof systems in
the data stream model. We show a canonical $\AM$ streaming algorithm
for a wide class of data stream problems. The algorithm offers a
tradeoff between the length of the proof and the space complexity
that is needed to verify it.
As an application, we give an $\AM$ streaming algorithm for the
\emph{Distinct Elements} problem. Given a data stream of length $m$
over alphabet of size $n$, the algorithm uses $\tilde O(s)$ space
and a proof of size $\tilde O(w)$, for every $s,w$ such that $s
\cdot w \ge n$ (where $\tilde O$ hides a $\polylog(m,n)$ factor). We
also prove a lower bound, showing that every $\MA$ streaming
algorithm for the \emph{Distinct Elements} problem that uses $s$
bits of space and a proof of size $w$, satisfies $s \cdot w =
\Omega(n)$.
As a part of the proof of the lower bound for the \emph{Distinct
Elements} problem, we show a new lower bound of $\Omega \left(
\sqrt n \right )$ on the $\MA$ communication complexity of the
\emph{Gap Hamming Distance} problem, and prove its tightness.
@article{DBLP:journals/eccc/GurR13,
author = {Tom Gur and
Ran Raz},
title = {Arthur-Merlin Streaming Complexity},
journal = {Electronic Colloquium on Computational Complexity (ECCC)},
volume = {20},
year = {2013},
pages = {20},
ee = {http://eccc.hpi-web.de/report/2013/020},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Tom Gur, Omer Tamuz
Testing Booleanity and the Uncertainty Principle
Electronic Colloquium on Computational Complexity (ECCC), 2012
[Abstract]
[BiBTeX]
[Paper: PDF]
[Slides: PDF]
Let $f:\{-1,1\}^n \to \R$ be a real function on the hypercube, given
by its discrete Fourier expansion, or, equivalently, represented as
a multilinear polynomial. We say that it is Boolean if its image is
in $\{-1,1\}$.
We show that every function on the hypercube with a sparse Fourier
expansion must either be Boolean or far from Boolean. In particular,
we show that a multilinear polynomial with at most $k$ terms must
either be Boolean, or output values different than $-1$ or $1$ for a
fraction of at least $2/(k+2)^2$ of its domain.
It follows that given oracle access to $f$, together with the
guarantee that its representation as a multilinear polynomial has at
most $k$ terms, one can test Booleanity using $O(k^2)$ queries. We
show an $\Omega(k)$ queries lower bound for this problem.
Our proof crucially uses Hirschman's entropic version of
Heisenberg's uncertainty principle.
@article{DBLP:journals/eccc/GurT12,
author = {Tom Gur and
Omer Tamuz},
title = {Testing Booleanity and the Uncertainty Principle},
journal = {Electronic Colloquium on Computational Complexity (ECCC)},
volume = {19},
year = {2012},
pages = {31},
ee = {http://eccc.hpi-web.de/report/2012/031},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
|
Other Papers |
Noah Zaitlen, Bogdan Pasaniuc, Tom Gur, Elad Ziv, Eran Halperin
Leveraging Genetic Variability across Populations for the Identification of Causal Variant
American Journal of Human Genetics (AJHG), 2010
[Abstract]
[BiBTeX]
[Paper: PDF]
Genome-wide association studies have been performed extensively in the last few years, resulting in many new discoveries of genomic regions that are associated with complex traits. It is often the case that a SNP found to be associated with the condition is not the causal SNP, but a proxy to it as a result of linkage disequilibrium. For the identification of the actual causal SNP, fine-mapping follow-up is performed, either with the use of dense genotyping or by sequencing of the region. In either case, if the causal SNP is in high linkage disequilibrium with other SNPs, the fine-mapping procedure will require a very large sample size for the identification of the causal SNP. Here, we show that by leveraging genetic variability across populations, we significantly increase the localization success rate (LSR) for a causal SNP in a follow-up study that involves multiple populations as compared to a study that involves only one population. Thus, the average power for detection of the causal variant will be higher in a joint analysis than that in studies in which only one population is analyzed at a time. On the basis of this observation, we developed a framework to efficiently search for a follow-up study design: our framework searches for the best combination of populations from a pool of available populations to maximize the LSR for detection of a causal variant. This framework and its accompanying software can be used to considerably enhance the power of fine-mapping studies.
@article{zaitlen2010lev,
title={Leveraging genetic variability across populations for the identification of causal variants.},
author={Zaitlen, N. and Pasaniuc, B. and Gur, T. and Ziv, E. and Halperin, E.},
journal={Am J Hum Genet},
volume={86},
number={1},
pages={23-33},
year={2010}
}
Bogdan Pasaniuc, Ram Avinery, Tom Gur, Christine F. Skibola, Paige M. Bracci, Eran Halperin
A generic coalescent-based framework for the selection of a reference panel for imputation
Genetic Epidemiology (GE), 2010
[Abstract]
[BiBTeX]
[Paper: PDF]
An important component in the analysis of genome-wide association studies involves the imputation of genotypes that have not been measured directly in the studied samples. The imputation procedure uses the linkage disequilibrium (LD) structure in the population to infer the genotype of an unobserved single nucleotide polymorphism. The LD structure is normally learned from a dense genotype map of a reference population that matches the studied population. In many instances there is no reference population that exactly matches the studied population, and a natural question arises as to how to choose the reference population for the imputation. Here we present a Coalescent-based method that addresses this issue. In contrast to the current paradigm of imputation methods, our method assigns a different reference dataset for each sample in the studied population, and for each region in the genome. This allows the flexibility to account for the diversity within populations, as well as across populations. Furthermore, because our approach treats each region in the genome separately, our method is suitable for the imputation of recently admixed populations. We evaluated our method across a large set of populations and found that our choice of reference data set considerably improves the accuracy of imputation, especially for regions with low LD and for populations without a reference population available as well as for admixed populations such as the Hispanic population. Our method is generic and can potentially be incorporated in any of the available imputation methods as an add-on.
@article{pasaniuc2010a g,
title={A generic coalescent-based framework for the selection of a reference panel for imputation.},
author={Pasaniuc, B. and Avinery, R. and Gur, T. and Skibola, C.F. and Bracci, P.M. and Halperin, E.},
journal={Genet Epidemiol},
volume={34},
number={8},
pages={773-82},
year={2010}
}
|
Background |
In 2012, I completed an MSc in theoretical
computer science and mathematics at the Weizmann Institute
of Science. My advisor was Ran
Raz.
In 2010, I completed a BSc in mathematics and computer
science at the Tel Aviv University.
|