Date: Tuesday, 20 August 2024
Location: S16-06-118, Seminar Room
Time: 4pm, Singapore
Consider an unknown random vector X taking values in R^d. Is it possible to “guess” its mean if the only information available consists of N independent copies of X? More precisely, given an arbitrary norm on R^d, the goal is to find a mean estimation procedure: upon receiving a desired confidence parameter \delta and N independent copies X_1,…,X_N of an unknown random vector X that has a finite mean and covariance, the procedure returns an estimate \hat{\mu} for which the error \|\hat{\mu} - E X\| is as small as possible with probability at least 1-\delta (with respect to the product measure). This mean estimation problem has been studied extensively over the years, and I will present some of the ideas that have led to its solution. Two rather surprising facts emerge: the obvious choice, setting \hat{\mu} to be the empirical mean N^{-1}\sum_{i=1}^N X_i, is actually a terrible option for small confidence parameters \delta (most notably when X is “heavy-tailed”); and, even more surprisingly, one can find an optimal procedure that performs as if the (arbitrary) random vector X were Gaussian.
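A standard example of a procedure that improves on the empirical mean for small \delta is the median-of-means estimator: split the sample into blocks, average within each block, and take the median of the block means. The sketch below (in one dimension) is purely illustrative and is not necessarily the optimal procedure the abstract refers to; the choice of the number of blocks k is an assumption for demonstration.

```python
import numpy as np

def median_of_means(samples, k):
    """Median-of-means estimate of E X from a 1-D sample.

    Split the sample into k blocks of equal size (discarding any
    remainder), average each block, and return the median of the
    block means. Taking the median makes the estimate robust to a
    few blocks being corrupted by heavy-tailed outliers, which is
    what ruins the plain empirical mean at small confidence levels.
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples) // k  # block size
    block_means = [samples[i * n:(i + 1) * n].mean() for i in range(k)]
    return float(np.median(block_means))

# Example: 12 points, 3 blocks of 4.
samples = np.arange(12.0)          # true mean of this sample is 5.5
print(median_of_means(samples, 3)) # median of block means 1.5, 5.5, 9.5
```

Choosing k of order log(1/\delta) is the usual way the confidence parameter enters this construction; with that choice the estimate deviates from the true mean with sub-Gaussian-like probability even when X only has a finite variance.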
If time permits, I will describe some facets of a more general question: somewhat informally put, how much information on X can be derived from a typical sample X_1,…,X_N?