Data Mining Application to Healthcare Fraud Detection: A Two-Step Unsupervised Clustering Model for Outlier Detection with Administrative Databases

Keywords

Health Analytics
Code:
49/2018
Title:
Data Mining Application to Healthcare Fraud Detection: A Two-Step Unsupervised Clustering Model for Outlier Detection with Administrative Databases
Date:
Tuesday 25th September 2018
Author(s):
Massi, M.C.; Ieva, F.; Lettieri, E.
Abstract:
As the recipient of large public and private investments, the healthcare sector is an attractive target for fraudsters. The large amount of data now available makes it possible to tackle this issue with data mining techniques, which can control processes more efficiently, in terms of cost and time, than manual audits. This research aims to develop a novel data mining model for fraud detection among hospitals. In particular, it focuses on DRG (Diagnosis-Related Group) upcoding, i.e. the tendency to record, in the Hospital Discharge Charts (HDC) held in administrative databases, codes for the services provided and for the inpatient's health status such that the hospitalization falls within a more remunerative DRG class. The proposed model consists of two steps. The first step clusters providers according to their characteristics and their behavior in the treatment of a specific disease, in order to spot outliers within these groups of peers. The second step performs a cross-validation, which lets controllers verify whether any hospital on the list of suspects identified in the first step may be justified in its outlierness by its particular characteristics or by the treatment of a more complex patient base. The proposed model was tested on a database of HDC collected by Regione Lombardia (Italy) over a three-year period (2013-2015), focusing on the treatment of heart failure.
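The two-step logic described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the peer grouping here is a tiny one-dimensional k-means on bed count, the outlier rule is a robust z-score (median and scaled MAD), and the second step is a simple check on a patient-complexity index. The field names (beds, severe_share, complexity), the threshold of 3, and all data values are hypothetical and chosen only for illustration.

```python
from statistics import median


def kmeans_1d(values, k, iters=50):
    """Tiny deterministic 1-D k-means; returns one cluster label per value."""
    ordered = sorted(values)
    # Spread the initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def robust_z(value, peers):
    """Distance from the peer median in units of scaled MAD (0 if MAD is 0)."""
    med = median(peers)
    mad = median(abs(p - med) for p in peers)
    return 0.0 if mad == 0 else (value - med) / (1.4826 * mad)


def screen(providers, k=2, threshold=3.0):
    """Step 1: flag high outliers within peer groups. Step 2: check complexity."""
    labels = kmeans_1d([p["beds"] for p in providers], k)
    results = {}
    for group in set(labels):
        peers = [p for p, lab in zip(providers, labels) if lab == group]
        shares = [p["severe_share"] for p in peers]
        mixes = [p["complexity"] for p in peers]
        for p in peers:
            # Step 1: unusually high share of discharges in the better-paid DRG.
            if robust_z(p["severe_share"], shares) > threshold:
                # Step 2: is the patient base also unusually complex?
                justified = robust_z(p["complexity"], mixes) > threshold
                results[p["id"]] = "possibly justified" if justified else "suspect"
    return results


# Hypothetical providers: (id, beds, share of discharges in the
# more remunerative DRG, patient-complexity index).
ROWS = [
    ("H1", 90, 0.20, 1.0), ("H2", 100, 0.22, 1.1), ("H3", 110, 0.21, 1.0),
    ("H4", 120, 0.45, 1.0), ("H5", 95, 0.19, 0.9),
    ("H6", 550, 0.30, 1.4), ("H7", 600, 0.32, 1.5), ("H8", 620, 0.31, 1.4),
    ("H9", 580, 0.29, 1.5), ("H10", 640, 0.55, 2.4),
]
HOSPITALS = [
    {"id": i, "beds": b, "severe_share": s, "complexity": c} for i, b, s, c in ROWS
]

if __name__ == "__main__":
    print(screen(HOSPITALS))
```

With this made-up data, H4 is flagged as a suspect, while H10 is flagged but marked as possibly justified because its patient base is also markedly more complex than that of its peers. A real application would cluster on several provider characteristics and would need thresholds validated by domain experts.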
This report, or a modified version of it, has also been submitted to, or published in:
BMC Medical Informatics and Decision Making, 20 (1): 1-11