ReLU networks and polytopes

by Amir Yehudayoff (based on joint works with Bakaev, Brunck, Hertrich and Stade)

Oded's comments

Being far from the area, I was happy to be educated about its complexity-theoretic aspects.

We consider neural networks (NNs) with linear and ReLU gates mapping $\R^n$ to $\R$, where $\relu:\R\to\R$ is defined as $\relu(z) = \max(z,0)$ (and the linear gates compute linear functions mapping $\R^m$ to $\R$, for arbitrary $m$). (One often considers affine rather than linear gates, but the results hold in either case.)
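
For concreteness, here is a minimal Python sketch of this model (an illustration, not taken from the talk): a network alternates linear maps with coordinatewise ReLU and ends with a linear output gate. The example uses one hidden layer and only linear (bias-free) gates, and computes $f(x_1,x_2) = |x_1-x_2|$ exactly via $|x_1-x_2| = \relu(x_1-x_2) + \relu(x_2-x_1)$.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def eval_relu_net(layers, x):
        """Evaluate a ReLU network given as a list of weight matrices:
        each hidden layer is a linear map followed by coordinatewise ReLU,
        and the last matrix is a purely linear output gate."""
        for W in layers[:-1]:
            x = relu(W @ x)
        return layers[-1] @ x

    # a one-hidden-layer network computing f(x1,x2) = |x1 - x2|,
    # using only linear (bias-free) gates:  |x1-x2| = relu(x1-x2) + relu(x2-x1)
    W1 = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])
    W2 = np.array([[1.0, 1.0]])

    for x in np.random.default_rng(0).normal(size=(5, 2)):
        assert np.isclose(eval_relu_net([W1, W2], x)[0], abs(x[0] - x[1]))
    print("|x1 - x2| is computed exactly by a one-hidden-layer ReLU network")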

Let $C(n)$ denote the class of $n$-variate functions computable by such NNs, let $D(f)$ denote the minimum depth of a NN computing $f$, and let $D(n)$ denote the maximum value of $D(f)$ taken over all functions $f$ in $C(n)$. Prior known results include the general upper bound $D(n) \leq \lceil\log_2(n+1)\rceil$.
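
The special case of the maximum function indicates where the $\log_2$ comes from: the two-input gadget $\max(a,b) = \relu(a-b) + \relu(b) - \relu(-b)$ uses a single hidden layer, and a binary tournament of such gadgets computes the maximum of $n$ numbers with one hidden layer per round, that is, roughly $\log_2 n$ hidden layers. The following Python sketch (an illustration only; the general upper bound for all of $C(n)$ requires more work) counts the rounds.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def max2(a, b):
        # exact one-hidden-layer gadget: max(a,b) = relu(a-b) + relu(b) - relu(-b)
        return relu(a - b) + relu(b) - relu(-b)

    def tournament_max(values):
        """Maximum of a list via a binary tournament of max2 gadgets.
        Each round is realizable as one hidden ReLU layer (an unpaired value can be
        passed through a ReLU layer as relu(v) - relu(-v)), so the number of rounds
        bounds the number of hidden layers used."""
        vals, rounds = list(values), 0
        while len(vals) > 1:
            nxt = [max2(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
            if len(vals) % 2 == 1:
                nxt.append(vals[-1])
            vals, rounds = nxt, rounds + 1
        return vals[0], rounds

    x = np.random.default_rng(1).normal(size=13)
    val, rounds = tournament_max(x)
    assert np.isclose(val, x.max()) and rounds == int(np.ceil(np.log2(len(x))))
    print("max of", len(x), "numbers via", rounds, "rounds of pairwise maxima")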

It was conjectured that this upper bound is tight in general. The new work shows that this conjecture is false; in fact, $D(n) \leq \log_3(n+1)$.

Interestingly, the new upper bound is achieved by NNs whose weights are decimal fractions (i.e., rationals whose denominator is a power of ten), whereas it is known that if the NN uses only such weights, then $\log_3 n$ is a lower bound; hence, the new upper bound is essentially tight for such NNs.

In contrast, it is known that any function in $C(n)$ can be approximated, to within any desired accuracy, by a depth-two NN.
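
As a numerical illustration of this (not the construction behind the theorem), one can fix a one-hidden-layer network with randomly chosen hidden units and fit only its linear output gate to a target such as $\max(x_1,x_2,x_3)$; the width, sample sizes, and target in the Python sketch below are arbitrary choices made for the illustration.

    import numpy as np

    rng = np.random.default_rng(2)
    n, width, m = 3, 800, 4000

    # random one-hidden-layer features relu(w.x + b)
    # (affine hidden units, i.e., with biases, as in the parenthetical remark above)
    W = rng.normal(size=(width, n))
    b = rng.uniform(-1.0, 1.0, size=width)

    X = rng.uniform(-1.0, 1.0, size=(m, n))
    y = X.max(axis=1)                          # target: max(x1, x2, x3)

    H = np.maximum(X @ W.T + b, 0.0)           # hidden activations, shape (m, width)
    c, *_ = np.linalg.lstsq(H, y, rcond=None)  # fit the linear output gate

    X_test = rng.uniform(-1.0, 1.0, size=(2000, n))
    H_test = np.maximum(X_test @ W.T + b, 0.0)
    err = np.abs(H_test @ c - X_test.max(axis=1))
    print("max abs error of the depth-two approximation:", err.max())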

Original abstract

ReLU networks are a standard example of neural networks. We will discuss the expressivity of neural networks, focusing on their depth. The plan is to start with a general introduction and a survey of basic results, including the duality between ReLU networks and a model for constructing polytopes. We will then discuss polytopes and how their properties can help us understand the depth of neural networks.
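
Presumably the duality in question is the classical correspondence between polytopes and their support functions, under which positively homogeneous convex piecewise-linear functions correspond to polytopes, addition of functions corresponds to Minkowski sum, and taking a maximum (in particular, applying ReLU, i.e., a maximum with 0) corresponds to the convex hull of a union. The following Python sketch (an added illustration, with arbitrary point sets) checks these identities numerically.

    import numpy as np

    def support(V, x):
        """Support function h_P(x) = max_{v in P} <v, x> of the polytope P = conv(V),
        where V is an array whose rows span the candidate vertices."""
        return (V @ x).max()

    def minkowski_sum(V, W):
        # vertex candidates of the Minkowski sum: all pairwise sums
        return np.array([v + w for v in V for w in W])

    rng = np.random.default_rng(3)
    P = rng.normal(size=(4, 2))   # some points in the plane
    Q = rng.normal(size=(5, 2))
    O = np.zeros((1, 2))          # the origin, as a one-point "polytope"

    for _ in range(1000):
        x = rng.normal(size=2)
        # addition of support functions  <->  Minkowski sum of polytopes
        assert np.isclose(support(P, x) + support(Q, x),
                          support(minkowski_sum(P, Q), x))
        # max with 0 (i.e., ReLU)  <->  convex hull of the union with the origin
        assert np.isclose(max(support(P, x), 0.0), support(np.vstack([P, O]), x))
        # e.g., max(0, x1, x2) is the support function of the triangle conv{0, e1, e2}
        assert np.isclose(max(0.0, x[0], x[1]), support(np.vstack([O, np.eye(2)]), x))
    print("support-function identities verified on random directions")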


Back to list of Oded's choices.