Small N Large P Inference on the Mean
Thursday 22nd October 2009
The goal of this thesis is to show how inference can be made on the mean µ in the context of small N, large p Gaussian data, i.e., with a sample of N independent random vectors identically distributed as N_p(µ, Σ), where Σ is a positive definite matrix and the dimension p of the vectors is much greater than the sample size N. This setting also includes functional data, for which the dimension is typically infinite. A key statistic is defined for small N, large p data; it reduces to Hotelling's T² statistic in the case p < N. Its asymptotic distribution is given for p going to infinity and involves an unknown coefficient. We then show how this coefficient can be estimated and, finally, present tests, confidence regions, and confidence intervals for the mean vector. The functional case is more difficult. A model for analyzing functional data is built within the Hilbert space L². A functional statistic for inference on the mean function is defined, but its exact distribution is not derived; it is only characterized through an equality in distribution with another random variable. Nevertheless, Bonferroni intervals are defined in the functional context as well. Finally, a concrete example illustrates the results obtained for small N, large p data. The data come from the AneuRisk project, which studies cerebral aneurysms by means of a statistical approach. In particular, doctors believe that the probabilities of onset and/or rupture of an aneurysm depend almost exclusively on the geometry of the artery. Thus, as an example, our statistic is used to test whether the expected radius of the artery is constant.
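For orientation, the classical statistic recovered in the case p < N can be sketched as follows. This is a minimal illustration of the standard one-sample Hotelling T² only, not the generalized small N, large p statistic of the thesis, whose definition is not given in this abstract. The function names `hotelling_t2` and `f_scaled` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def hotelling_t2(X, mu0):
    """Classical one-sample Hotelling T^2 = N (xbar - mu0)' S^{-1} (xbar - mu0).

    X   : array of shape (N, p), one observation per row.
    mu0 : hypothesized mean vector of length p.
    Only valid when N > p, so that the sample covariance S can be inverted.
    """
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, p = X.shape
    if n <= p:
        # With N <= p the sample covariance is singular: this is exactly
        # the regime where the classical statistic breaks down.
        raise ValueError("classical T^2 requires N > p")
    diff = X.mean(axis=0) - mu0
    # Unbiased sample covariance (divides by N - 1); atleast_2d handles p = 1.
    S = np.atleast_2d(np.cov(X, rowvar=False))
    return float(n * diff @ np.linalg.solve(S, diff))


def f_scaled(t2, n, p):
    """Rescale T^2 so that, under the null hypothesis and Gaussian data,
    the result follows an F distribution with (p, N - p) degrees of freedom."""
    return (n - p) / (p * (n - 1)) * t2
```

With p = 1 the statistic equals the square of the usual one-sample t statistic, which gives a quick sanity check of the implementation.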