Abstract: We study a prototypical problem in empirical Bayes. Namely,\n consider a population consisting of $k$ individuals each belonging to one of $k$ types (some types can be empty).\n Without any structural restrictions, it is impossible to learn the \n composition of the full population having observed only a small (random) subsample of size $m = o(k)$. \n Nevertheless, we show that in the sublinear regime of $m =\omega(k/\log k)$, it is possible to consistently estimate in total variation \n the \emph{profile} of the population, defined as the empirical\n distribution of the sizes of each type, which determines many symmetric properties of the population. We also prove\n that in the linear regime of $m=c k$ for any constant $c$ the optimal rate is $\Theta(1/\log k)$. \n Our estimator is based on Wolfowitz's minimum distance method, which entails solving a linear program (LP) of size $k$.\n We show that there is a single infinite-dimensional LP whose value simultaneously characterizes the risk of the minimum distance estimator and certifies its minimax optimality.\n The sharp convergence rate is obtained by evaluating this LP using complex-analytic techniques.

Summary presentation

Full presentation