Journal article details
Journal of Big Data
Explainable machine learning models for Medicare fraud detection
Research
John T. Hancock [1], Richard A. Bauder [1], Taghi M. Khoshgoftaar [1], Huanjing Wang [2]
[1] College of Engineering and Computer Science, Florida Atlantic University, Boca Raton, USA; [2] Ogden College of Science and Engineering, Western Kentucky University, Bowling Green, USA
Keywords: Big Data; Class imbalance; Explainable machine learning models; Ensemble supervised feature selection; Medicare fraud detection
DOI: 10.1186/s40537-023-00821-5
Received: 2023-06-17; Accepted: 2023-08-30; Published: 2023
Source: Springer
【 Abstract 】

As a means of building explainable machine learning models for Big Data, we apply a novel ensemble supervised feature selection technique. The technique is applied to publicly available insurance claims data from the United States public health insurance program, Medicare. We approach Medicare insurance fraud detection as a supervised machine learning task of anomaly detection through the classification of highly imbalanced Big Data. Our objectives for feature selection are to increase efficiency in model training and to develop more explainable machine learning models for fraud detection. Using two Big Data datasets derived from two different sources of insurance claims data, we demonstrate how our feature selection technique reduces the dimensionality of the datasets by approximately 87.5% without compromising performance. Moreover, the reduction in dimensionality results in machine learning models that are easier to explain and less prone to overfitting. Therefore, our primary contribution, the exposition of our novel feature selection technique, leads to a further contribution in the application domain of automated Medicare insurance fraud detection. We utilize our feature selection technique to provide an explanation of our fraud detection models in terms of the definitions of the selected features. The ensemble supervised feature selection technique we present is flexible in that any collection of machine learning algorithms that maintain a list of feature importance values may be used. Therefore, researchers may easily employ variations of the technique we present.
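The abstract describes the ensemble technique only at a high level: several supervised learners that expose feature-importance values are combined, and the feature set is reduced by roughly 87.5%. The following is a minimal Python sketch of one plausible reading of that idea, in which per-learner importance rankings are averaged and the top-ranked features are kept; the learners, aggregation rule, cutoff, and synthetic data used here are illustrative assumptions, not the paper's actual method or datasets.

# Illustrative sketch of ensemble supervised feature selection via rank aggregation.
# Each learner that exposes feature_importances_ ranks the features; ranks are
# averaged across learners and the k best (lowest mean rank) features are kept.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier)

def ensemble_feature_selection(X, y, learners, k):
    """Return the indices of the k features with the lowest mean rank
    across all learners' feature-importance orderings."""
    n_features = X.shape[1]
    rank_sum = np.zeros(n_features)
    for learner in learners:
        learner.fit(X, y)
        order = np.argsort(-learner.feature_importances_)  # most important first
        ranks = np.empty_like(order)
        ranks[order] = np.arange(n_features)               # rank 0 = most important
        rank_sum += ranks
    mean_rank = rank_sum / len(learners)
    return np.argsort(mean_rank)[:k]

# Hypothetical usage on synthetic, highly imbalanced data (the paper uses
# Medicare claims data); keeping 5 of 40 features mirrors the ~87.5% reduction.
X, y = make_classification(n_samples=2000, n_features=40,
                           weights=[0.99, 0.01], random_state=0)
learners = [RandomForestClassifier(random_state=0),
            ExtraTreesClassifier(random_state=0),
            GradientBoostingClassifier(random_state=0)]
selected = ensemble_feature_selection(X, y, learners, k=5)
print("Selected feature indices:", selected)

Averaging ranks rather than raw importance scores keeps learners whose importance values are on different scales comparable, which is one reason a rank-based aggregation is a natural fit for the "any collection of algorithms with feature importance values" flexibility the abstract claims.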

【 License 】

CC BY   
© Springer Nature Switzerland AG 2023

【 Preview 】
Attachment list
Files Size Format View
RO202311104998621ZK.pdf 1112KB PDF download
