Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks

Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Nikos Zarifis

[Proceedings link] [PDF]

Subject areas: PAC learning

Presented in: Session 2B, Session 2D


Abstract: We study the problem of PAC learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals in the presence of additive label noise. For the case of positive coefficients, we give the first polynomial-time algorithm for this learning problem for $k$ up to $\tilde{O}(\sqrt{\log d})$. Previously, no polynomial-time algorithm was known, even for $k = 3$. This answers an open question posed by [Kliv17]. Importantly, our algorithm does not require any assumptions about the rank of the weight matrix, and its complexity is independent of the matrix's condition number. On the negative side, for the more general task of PAC learning one-hidden-layer ReLU networks with positive or negative coefficients, we prove a Statistical Query lower bound of $d^{\Omega(k)}$. Thus, we provide a separation between the two classes in terms of efficient learnability. Our upper and lower bounds are general, extending to broader families of activation functions.
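
To make the learning setup concrete, below is a minimal sketch of the data-generating model the abstract describes: labels come from a one-hidden-layer ReLU network $f(x) = \sum_{i=1}^{k} a_i \, \mathrm{ReLU}(\langle w_i, x \rangle)$ with Gaussian inputs and additive label noise. The unit-norm weights, coefficient range, and Gaussian noise below are illustrative assumptions for the sketch, not parameters taken from the paper, which only assumes additive noise.

```python
import numpy as np

def sample_relu_network_data(n, d, k, positive=True, noise_std=0.1, rng=None):
    """Draw n samples (x, y) with x ~ N(0, I_d) and
    y = sum_i a_i * ReLU(<w_i, x>) + additive label noise.

    Unit-norm weights, the coefficient range, and Gaussian noise are
    illustrative choices; the paper only assumes additive noise.
    """
    rng = np.random.default_rng(rng)
    # Hidden-unit weight vectors (one per row); note the algorithm in the
    # paper makes no rank or condition-number assumptions on this matrix.
    W = rng.standard_normal((k, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    # Positive coefficients are the efficiently learnable case; mixed-sign
    # coefficients are the regime of the d^{Omega(k)} SQ lower bound.
    a = rng.uniform(0.5, 1.5, size=k)
    if not positive:
        a *= rng.choice([-1.0, 1.0], size=k)
    X = rng.standard_normal((n, d))          # Gaussian marginals
    y = np.maximum(X @ W.T, 0.0) @ a         # sum_i a_i ReLU(<w_i, x>)
    y += noise_std * rng.standard_normal(n)  # additive label noise
    return X, y, (W, a)

X, y, _ = sample_relu_network_data(n=1000, d=50, k=3)
```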

Summary presentation

Full presentation