Small N Large P Inference on the Mean

Stamm, Aymerich
Thursday 22nd October 2009
Secchi, P.
Advisor II:
Vantini, S.
Fuhrman, M.
The goal of this thesis is to show how to make inference on the mean µ in the context of small N, large p Gaussian data, i.e. with a sample of N independent random vectors identically distributed as N_p(µ, Σ), where Σ is a positive definite matrix and the dimension p of the vectors is much greater than the sample size N. This setting also includes functional data, for which the dimension is typically infinite. A key statistic will be defined for small N, large p data; it reduces to Hotelling's T² statistic in the case p < N. Its asymptotic distribution will be given for p going to infinity and involves an unknown coefficient. It will then be shown how this coefficient can be estimated and, in the end, tests, confidence regions and confidence intervals for the mean vector will be presented. The functional case is more difficult to handle. A model to analyze functional data will be built within the Hilbert space L². A functional statistic for inference on the mean function will be defined, but its exact distribution will not be derived; it will only be characterized through an equality in distribution with another random variable. Nevertheless, Bonferroni intervals will be defined in the functional context as well. Finally, a concrete example will be presented to illustrate the results obtained for small N, large p data. The data come from the AneuRisk project, which aims to study cerebral aneurysms by means of a statistical approach. In particular, doctors think that the probabilities of onset and/or rupture of an aneurysm depend almost exclusively on the geometry of the artery. Thus, as an example, our statistic will be used to test whether the expected radius of the artery is constant.
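As a point of reference for the p < N case mentioned in the abstract, the following is a minimal sketch of the classical one-sample Hotelling's T² statistic, T² = N (x̄ − µ0)ᵀ S⁻¹ (x̄ − µ0), with S the sample covariance matrix. It is not the generalized statistic developed in the thesis. The function names (`mean_vector`, `sample_covariance`, `solve`, `hotelling_t2`) and the toy data are illustrative assumptions. The sketch refuses p >= N, where S has rank at most N − 1 and cannot be inverted, which is the situation that motivates a different statistic.

```python
from typing import List

Matrix = List[List[float]]


def mean_vector(X: Matrix) -> List[float]:
    """Column-wise sample mean of an N x p data matrix."""
    n, p = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]


def sample_covariance(X: Matrix) -> Matrix:
    """Unbiased sample covariance matrix S (divisor N - 1)."""
    n, p = len(X), len(X[0])
    m = mean_vector(X)
    return [
        [sum((row[i] - m[i]) * (row[j] - m[j]) for row in X) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def hotelling_t2(X: Matrix, mu0: List[float]) -> float:
    """Classical Hotelling T^2 for H0: mu = mu0; requires p < N."""
    n, p = len(X), len(X[0])
    if p >= n:
        # S has rank at most N - 1, so it is singular when p >= N.
        raise ValueError("classical T^2 needs p < N")
    m = mean_vector(X)
    d = [m[j] - mu0[j] for j in range(p)]
    z = solve(sample_covariance(X), d)  # z = S^{-1} d
    return n * sum(d[j] * z[j] for j in range(p))


# Toy data (hypothetical): N = 4 observations in dimension p = 2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
t2 = hotelling_t2(X, [0.0, 0.0])
```

Under the Gaussian model and the null hypothesis, (N − p) / (p (N − 1)) · T² follows an F distribution with (p, N − p) degrees of freedom, so a test in this classical setting compares the rescaled value with an F quantile.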