Minority Class Feature Selection through Semi-Supervised Deep Sparse Autoencoders


Wednesday 16th October 2019
Massi, M.C.; Ieva, F.; Gasperoni, F.; Paganoni, A.M.
Class imbalance is a common issue in many application domains of learning algorithms. In these same domains, it is often far more important to correctly classify and profile minority class examples than majority class ones. To address classification problems in imbalanced settings, and to improve accuracy specifically on the minority class, we propose a feature selection algorithm that applies a Deep Sparse AutoEncoder (DSAE) as a semi-supervised outlier detection method, treating minority class examples as outliers of the normal population of majority class observations. We train a DSAE only on normal observations and use it to reconstruct both inliers and outliers. By analyzing the Reconstruction Error (RE) on both classes, we determine the features in which the minority class has a significantly different distribution of values from the majority class, thereby identifying the most relevant features for discriminating between the two classes. We demonstrate that our algorithm improves minority class classification accuracy (evaluated with specificity and AUROC) on several high-dimensional datasets of varying sample size, outperforming other benchmark feature selection methods.
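The feature-ranking idea in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not taken from the abstract: a single-hidden-layer ReLU autoencoder with an L1 penalty on hidden activations stands in for the deep sparse autoencoder; per-feature RE is the squared residual on standardized inputs; the "significantly different distribution" check is a one-sided Welch t test (normal approximation) with a Bonferroni threshold; and the function names (`train_sparse_autoencoder`, `one_sided_welch`, `select_minority_features`) and the synthetic data are hypothetical. It illustrates the idea rather than reproducing the authors' implementation.

```python
# Hypothetical sketch: a shallow sparse autoencoder replaces the paper's DSAE,
# and a one-sided Welch t test on per-feature RE replaces its significance check.
import math
import numpy as np


def train_sparse_autoencoder(X, n_hidden=3, l1=1e-3, lr=0.01, epochs=2000, seed=0):
    """Fit a one-hidden-layer ReLU autoencoder with an L1 penalty on hidden
    activations by full-batch gradient descent; return a reconstruction function."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.maximum(X @ W1 + b1, 0.0)          # sparse ReLU code
        X_hat = H @ W2 + b2                       # linear decoder
        dX_hat = 2.0 * (X_hat - X) / n            # grad of per-sample squared error, averaged
        dW2, db2 = H.T @ dX_hat, dX_hat.sum(axis=0)
        dH = dX_hat @ W2.T + l1 * np.sign(H) / n  # add L1 activity penalty
        dH[H <= 0.0] = 0.0                        # ReLU gradient
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return lambda Z: np.maximum(Z @ W1 + b1, 0.0) @ W2 + b2


def one_sided_welch(a, b):
    """Per-column Welch t statistic for mean(a) > mean(b) and a one-sided
    p-value from a normal approximation (reasonable for moderate sample sizes)."""
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    p = np.array([0.5 * math.erfc(ti / math.sqrt(2.0)) for ti in t])
    return t, p


def select_minority_features(X_major, X_minor, alpha=0.01, seed=0):
    """Train on majority (normal) data only, then flag features whose per-feature
    reconstruction error is significantly larger on the minority class."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_major))
    cut = int(0.7 * len(X_major))
    train, held_out = X_major[idx[:cut]], X_major[idx[cut:]]
    mu, sd = train.mean(axis=0), train.std(axis=0) + 1e-8  # majority-only scaling
    reconstruct = train_sparse_autoencoder((train - mu) / sd, seed=seed)

    def feature_re(Z):
        Zs = (Z - mu) / sd
        return (Zs - reconstruct(Zs)) ** 2            # squared error per feature

    t, p = one_sided_welch(feature_re(X_minor), feature_re(held_out))
    selected = np.flatnonzero(p < alpha / X_major.shape[1])  # Bonferroni threshold
    return selected, t, p


# Synthetic demo: majority features share one latent factor; the minority
# class departs from that structure only in features 2 and 7.
rng = np.random.default_rng(42)
z = rng.normal(size=(600, 1))
X_major = z + 0.3 * rng.normal(size=(600, 10))
z_min = rng.normal(size=(60, 1))
X_minor = z_min + 0.3 * rng.normal(size=(60, 10))
X_minor[:, [2, 7]] += 4.0
selected, t, p = select_minority_features(X_major, X_minor)
print("selected features:", selected)
print("top-2 by t statistic:", np.argsort(-t)[:2])
```

Part of the majority class is held out so that the "normal" RE baseline is not measured on the rows the autoencoder was fitted to, which would make it optimistically small. The selected column indices would then be used to subset the inputs of a downstream classifier.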