Data-Driven Approaches to Fraud Detection in Health Insurance Claims: A Systematic Review of Medical and Pharmaceutical Services
DOI:
https://doi.org/10.61978/medicor.v4i2.1352
Keywords:
health insurance fraud detection, machine learning approaches, anomaly detection, healthcare claims analytics, medical and pharmaceutical services, systematic literature review
Abstract
Fraud in health insurance claims continues to impose significant financial and operational burdens on healthcare systems, especially as the volume and complexity of claims increase. Conventional rule-based detection mechanisms, although widely used, have limited adaptability to evolving fraud patterns and high-dimensional data environments. This limitation has driven a shift toward data-driven analytical approaches capable of identifying suspicious patterns more effectively. This systematic review synthesizes peer-reviewed, open-access studies published between 2020 and 2025 that applied rule-based, supervised, unsupervised, or hybrid methods for fraud detection in health insurance claims. A comprehensive search across major databases yielded fourteen eligible studies representing diverse systems, datasets, and methodological designs. The findings indicate a clear transition from traditional rule-based systems to machine learning approaches, particularly in addressing challenges such as label scarcity, class imbalance, and complex fraud patterns. Most studies focused on integrated medical claims, where pharmaceutical fraud was embedded rather than analyzed independently, highlighting a gap in service-specific research. Significant heterogeneity was observed in fraud definitions, preprocessing techniques, labeling strategies, and evaluation metrics, limiting cross-study comparability and emphasizing the need for greater methodological transparency. Across the literature, data-driven approaches are consistently positioned as decision-support tools rather than definitive solutions, reinforcing their role in complementing expert judgment and regulatory oversight. Overall, effective implementation requires context-aware design, reliable labeling, and rigorous real-world validation. Future research should prioritize domain-specific analyses, particularly in pharmaceutical fraud, and improve transparency to support scalable and responsible deployment.
License
Copyright (c) 2026 Medicor: Journal of Health Informatics and Health Policy

This work is licensed under a Creative Commons Attribution 4.0 International License.



