Machine Learning Approach for Iron Deficiency Anemia Detection in Pregnant Women Using XGBoost and CTGAN
DOI: https://doi.org/10.61978/medicor.v4i1.1097
Keywords: Iron Deficiency Anemia (IDA), Machine Learning, XGBoost, CTGAN, Pregnant Women, Artificial Intelligence, Maternal Health
Abstract
Iron deficiency anemia (IDA) remains one of the most significant challenges in maternal health, affecting nearly 40% of pregnant women worldwide according to the World Health Organization (2023). Despite advances in obstetric screening, conventional diagnostic methods such as complete blood count (CBC) tests often fail to detect early or latent stages of anemia because of the physiological changes associated with pregnancy. This study introduces a machine learning framework that integrates Extreme Gradient Boosting (XGBoost) and Conditional Tabular Generative Adversarial Networks (CTGAN) for the early detection of IDA in pregnant women. The approach addresses the class imbalance inherent in clinical datasets and incorporates trimester-specific hematological adaptations. Using 3,944 anonymized clinical records from ASA Hospital Sarajevo (January–July 2025), we evaluated model performance across hematological features commonly used in obstetric care. The optimized model achieved a precision of 100%, recall of 65.2%, specificity of 100%, and an AUC-ROC of 0.8686. In a comparative analysis, conventional CBC screening reached only 40.5% sensitivity, against 65.2% for the model. These findings indicate that AI-enhanced diagnostics can support early detection of IDA in pregnant women, reduce missed diagnoses, and strengthen clinical decision-making. Further multi-center validation and the integration of additional biomarkers are recommended to confirm generalizability.
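The reported precision, recall, and specificity all derive from the same confusion matrix, and a short sketch may make their relationship easier to follow. The function name `screening_metrics` and the counts below are hypothetical and chosen only for illustration; they are not the study's data. The sketch shows that when there are no false positives, precision and specificity both reach 100% regardless of how many true cases are missed, while recall (sensitivity) depends only on the split between detected and missed cases.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute precision, recall (sensitivity) and specificity from confusion-matrix counts."""
    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"precision": precision, "recall": recall, "specificity": specificity}


# Hypothetical counts (an assumption for illustration, NOT taken from the study):
# 15 anemic cases detected, 8 missed, 100 healthy cases correctly cleared, 0 false alarms.
metrics = screening_metrics(tp=15, fp=0, tn=100, fn=8)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these illustrative counts the script prints a precision of 100.0%, a recall of 65.2%, and a specificity of 100.0%. A model with this profile raises no false alarms but still misses about a third of true cases, which is why recall is the metric to compare against the sensitivity of conventional CBC screening.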
References
Abdul-Jabbar, S. S., Farhan, A. K., & Kandhro, A. H. (2025). Developing a hybrid machine learning algorithm for anemia diagnosis. Journal of Artificial Intelligence and Metaheuristics, 9(1), 20–33. https://www.americaspg.com/articleinfo/28/show/3516 DOI: https://doi.org/10.54216/JAIM.090103
Al-Shehri, H., Nasser, A., & Alqahtani, M. (2024). Prevalence and determinants of anemia among pregnant women in the Middle East: A meta-analysis. BMC Pregnancy and Childbirth, 24(3), 421. https://doi.org/10.1186/s12884-024-06712-2
Bothwell, T. H. (2022). Iron requirements in pregnancy and strategies to prevent deficiency. American Journal of Clinical Nutrition, 116(4), 985–992.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785
Damkliang, K., Aunphoklang, S., Wongseree, W., Wansri, V., Wangkulangkul, P., Wiangnon, S., & Wansri, P. (2025). An AI-based decision support framework for clinical screening of iron deficiency anemia and thalassemia. IEEE Access, 13, 133937–133957. https://doi.org/10.1109/ACCESS.2024.3514789 DOI: https://doi.org/10.1109/ACCESS.2025.3592652
Darwish, H., Barakat, A., & Sabri, N. (2023). Hematological reference intervals during pregnancy: A systematic review and meta-analysis. Obstetrics & Gynecology Science, 66(1), 10–22.
Elmaleeh, M. A. A. (2024). The identification and categorization of anemia through artificial neural networks: A comparative analysis of three models. Electrical and Electronics Engineering: An International Journal, 13(1/2), 1–13. https://doi.org/10.5121/eeiej.2024.13201
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874. https://doi.org/10.1016/j.patrec.2005.10.010
Hassan, S., Ali, M. A., & Yousef, W. (2023). Machine learning applications in obstetrics: Current trends and future directions. Journal of Medical Systems, 47(5), 42–53.
Kreuter, D., Remmele, J., Giese, T., Schemmer, P., & Mehrabi, A. (2025). Artificial intelligence for pre-anaemic iron deficiency detection using rich complete blood count data. medRxiv. https://doi.org/10.1101/2025.06.18.25329494
Kumar, R., Singh, N., & Arora, M. (2023). Synthetic data generation for healthcare AI: Addressing class imbalance through generative models. Journal of Biomedical Informatics, 145, 104392. https://doi.org/10.1016/j.jbi.2023.104392
Li, Y., Sun, F., & Huang, W. (2024). Enhancing diagnostic precision in obstetric anemia through gradient boosting and feature interpretability. Computers in Biology And.
McLean, E., Cogswell, M., Egli, I., Wojdyla, D., & Benoist, B. (2015). Worldwide prevalence of anemia according to WHO estimates. Public Health Nutrition, 18(3), 444–454.
Miller, T., Howe, P., & Lee, C. (2024). Explainable artificial intelligence in obstetric diagnostics: Balancing transparency and accuracy. AI in Medicine, 152, 102512. https://doi.org/10.1016/j.artmed.2024.102512
Rahman, M. F., Zaman, T., & Alam, N. (2023). Deep learning models for anemia classification in pregnancy using electronic health records. Frontiers in Artificial Intelligence, 6, 118429. https://doi.org/10.3389/frai.2023.118429
Rasmussen, K. M., & Stoltzfus, R. J. (2020). Deficiencies in iron and folate during pregnancy: Maternal and fetal health consequences. Annual Review of Nutrition, 40, 109–136.
Saputra, D. C. E., Sunat, K., & Ratnaingsih, T. (2023). A new artificial intelligence approach using extreme learning machine as the potentially effective model to predict and analyze the diagnosis of anemia. Healthcare, 11(5). https://doi.org/10.3390/healthcare11050697
Shill, K. B. (2021). Iron deficiency anemia: Clinical implications and diagnostic challenges in pregnancy. Clinical Hematology International, 3(4), 176–185.
Singh, A., Verma, P., & Kaur, R. (2024). Advancements in data augmentation techniques for clinical prediction models. Artificial Intelligence in Medicine, 145, 102692.
Smith, L. R., Alonzo, J., & Patel, R. (2023). Maternal anemia and its global health burden: A systematic review and policy analysis. Lancet Global Health, 11(9), 1302–1314. https://doi.org/10.1016/S2214-109X(23)00212-8
WHO. (2011). Haemoglobin concentrations for the diagnosis of anaemia and assessment of severity (Vitamin and Mineral Nutrition Information System). https://apps.who.int/iris/handle/10665/85839
WHO. (2023). Global report on anemia 2023. WHO Press. https://www.who.int/publications/i/item/9789240078754
Xu, L., Skoularidou, M., Cuesta-Infante, A., & Veeramachaneni, K. (2019). Modeling tabular data using conditional GAN. In Advances in neural information processing systems 32 (pp. 7335–7345). https://proceedings.neurips.cc/paper/2019/hash/254ed7d2de3b23ab10936522dd547b78-Abstract.html
Yadav, S., & Shukla, S. (2021). Handling imbalanced datasets in classification: A review. IEEE Transactions on Knowledge and Data Engineering, 33(4), 1658–1676.
Zhang, Z. (2016). Missing data imputation: Focusing on single imputation. Annals of Translational Medicine, 4(1). https://doi.org/10.3978/j.issn.2305-5839.2015.12.38 DOI: https://doi.org/10.21037/atm.2016.03.36
Zhou, P., Liu, T., & Ren, X. (2024). Hybrid ensemble models combining tree-based and neural architectures for medical diagnosis. Expert Systems with Applications, 247, 123859. https://doi.org/10.1016/j.eswa.2024.123859



