Journal Article Details
Kuwait Journal of Science
Oversampling based on generative adversarial networks to overcome imbalance data in predicting fraud insurance claim
article
Ranu Agastya Nugraha [1]; Hilman Ferdinandus Pardede [1]; Agus Subekti [2]
[1] Faculty of Information Technology, Graduate School of Computer Science, Nusa Mandiri University; National Research and Innovation Agency
Keywords: Fraud insurance detection; generative adversarial networks; imbalance data; oversampling; tabular GAN
DOI: 10.48129/kjs.splml.19119
Subject classification: Social sciences, humanities and arts (general)
Source: Kuwait University * Academic Publication Council
【 Abstract 】

Health insurance fraud causes not only cost overruns but also a long-term decline in the quality of health services. Machine learning is increasingly used to predict such fraud. However, data imbalance remains a problem: classifiers trained on imbalanced data tend to be biased towards the majority class. Recently, many efforts have applied deep learning to data augmentation, among them Generative Adversarial Networks (GAN). Studies show that GAN can generate artificial data very similar to real data. Unlike other deep learning structures, GAN trains two networks, a generator and a discriminator, adversarially. As a result, the generator never sees the distribution of the real data, which makes it possible to learn a better generative model for producing artificial data. In this paper, we propose using GAN as an oversampling method to generate data for the minority class. Since data for detecting health insurance fraud are tabular, we adopt the Conditional Tabular GAN (CTGAN) architecture, in which the generator is conditioned so that it can handle tabular input and receive additional information in order to produce samples for a specified class. The new balanced data are used to train 17 classification algorithms. Our experiments show that the proposed method achieves better performance on several evaluation metrics (accuracy, precision, F1-score, and ROC) than other reference methods for handling imbalanced data: random oversampling (ROS), random undersampling (RUS), Synthetic Minority Oversampling Technique (SMOTE), Borderline-SMOTE (B-SMO), and adaptive synthetic sampling (ADASYN).
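
The oversampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses random oversampling (ROS), one of the baselines named above, as a stand-in generator. The function names, the `synthesize` plug-in point, and the toy claim data are all assumptions made for illustration; in the proposed method, a trained CTGAN generator conditioned on the fraud class would presumably fill the same slot.

```python
import random
from collections import Counter


def ros_synthesize(minority_rows, n, rng):
    """Random oversampling (ROS): draw n existing minority rows with replacement."""
    return [list(rng.choice(minority_rows)) for _ in range(n)]


def oversample_minority(rows, labels, minority_label,
                        synthesize=ros_synthesize, seed=0):
    """Return (rows, labels) with the minority class topped up to the majority count.

    `synthesize` is a hypothetical plug-in point: any callable taking
    (minority_rows, n, rng) and returning n new rows can be used here.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_count = max(counts.values())
    minority_rows = [r for r, y in zip(rows, labels) if y == minority_label]
    n_needed = majority_count - len(minority_rows)
    new_rows = synthesize(minority_rows, n_needed, rng) if n_needed > 0 else []
    # Original rows are kept unchanged; generated rows are appended at the end.
    return rows + new_rows, labels + [minority_label] * len(new_rows)


# Toy data: 6 legitimate claims (label 0) and 2 fraudulent ones (label 1).
# The columns (claim amount, number of visits) are made up for illustration.
X = [[120.0, 1], [80.0, 2], [95.0, 1], [110.0, 3], [70.0, 1], [130.0, 2],
     [900.0, 9], [1200.0, 12]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = oversample_minority(X, y, minority_label=1)
print(Counter(y_bal))
```

After the call, both classes contain 6 rows, and the balanced set can be passed to any classifier. Keeping the generator behind a single callable makes it straightforward to compare ROS, SMOTE-style interpolation, or a GAN-based sampler on the same data.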

【 License 】

Unknown   
