Functional Sparse K-Means Clustering

Floriello, Davide
Wednesday 20th July 2011
Advisor: Secchi, P.
In clustering problems, which are typically unsupervised, it is likely that only a limited number of variables account for the differences between the groups. This has motivated new statistical methods, termed sparse, that select the relevant features and classify the data at the same time. This work extends to the functional setting a recent result from [23], which defines a method of this type for vectorial K-means. When the data are functions, we propose a method that selects the Borel subsets of the domain where the clusters differ the most and classifies the data through a functional K-means. This is achieved by the constrained maximization of a functional, where the optimization runs over the set of possible clusterings and over a set of admissible functions responsible for feature selection. We prove existence and uniqueness of the solution to this problem and, under a slight strengthening of the hypotheses, convergence in L2 and [μ]-a.e. of the solution function to an object known from the problem. We then derive an inequality bounding the committed error and deduce a numerical algorithm. The method is tested first on simulated cases and then on real datasets. The first real case is the Growth dataset, which contains the growth curves of 93 children; the analyses are conducted on both aligned and misaligned curves, with the aim of obtaining a better clustering than standard methods, and some features already noted by evolutionists are observed. Finally, after a further extension to vectors of functions, the method is applied to a study, including a supervised one, of the geometry of the internal carotid artery of 65 patients.
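To make the alternating structure described in the abstract concrete, the following is a minimal sketch of a discretized analogue. It is not the algorithm of this work: it assumes curves sampled on a common grid, squared differences summed pointwise, and a weight vector constrained by an L2 norm of 1 and an L1 bound `s`, as in the usual vector formulation of sparse K-means. All function names, the bound `s`, the farthest-point initialisation and the simulated data are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def between_ss(X, labels, k):
    """Between-cluster sum of squares at each grid point."""
    grand = X.mean(axis=0)
    b = np.zeros(X.shape[1])
    for j in range(k):
        members = X[labels == j]
        if len(members):
            b += len(members) * (members.mean(axis=0) - grand) ** 2
    return b

def update_weights(b, s):
    """Maximise sum(w * b) subject to ||w||_2 = 1, ||w||_1 <= s, w >= 0.

    Assumes s >= 1 and b not identically zero. The solution is a
    soft-thresholded, renormalised copy of b; the threshold is found
    by bisection.
    """
    w = b / np.linalg.norm(b)
    if w.sum() <= s:
        return w
    lo, hi = 0.0, float(b.max())
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        w = np.maximum(b - mid, 0.0)
        w /= np.linalg.norm(w)
        if w.sum() > s:
            lo = mid
        else:
            hi = mid
    w = np.maximum(b - hi, 0.0)
    return w / np.linalg.norm(w)

def sparse_kmeans(X, k, s, n_outer=10):
    """Alternate between clustering on weighted curves and reweighting."""
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))
    for _ in range(n_outer):
        labels = kmeans(X * np.sqrt(w), k)
        w = update_weights(between_ss(X, labels, k), s)
    return labels, w

# Simulated curves (illustrative): two groups that differ only on
# grid points 20..29 of a 50-point grid.
rng = np.random.default_rng(0)
n, p = 40, 50
truth = np.repeat([0, 1], n // 2)
X = rng.normal(0.0, 0.3, size=(n, p))
X[truth == 1, 20:30] += 3.0
labels, w = sparse_kmeans(X, k=2, s=3.0)
```

In this toy setting the weight vector `w` plays the role of the feature-selecting function: grid points where the groups do not differ are pushed towards zero weight, so the clustering is driven by the part of the domain where the curves separate. A smaller `s` gives a sparser selection.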