Early Prediction of At-Risk Students Using Minimal Data: A Machine Learning Framework for Higher Education
DOI:
https://doi.org/10.61978/digitus.v3i2.953
Keywords:
Early Warning Systems, Academic Risk Prediction, Learning Analytics, Machine Learning, CatBoost, LMS Data, Student Retention
Abstract
Early identification of academically at-risk students is essential for timely intervention and improved retention in higher education. This study investigates how effectively pre-admission and early-semester LMS data can predict student risk using machine learning models. The objective is to assess whether limited, readily available data from the first four weeks of instruction can reliably support early warning systems. A supervised learning framework was applied to the Open University Learning Analytics Dataset (OULAD), with features derived from student demographics and early LMS activity logs. The models evaluated were Logistic Regression, XGBoost, and CatBoost, with time-based validation and SMOTE employed to address class imbalance. Model performance was measured using ROC AUC, F1 score, and recall. The CatBoost model achieved the best performance, with an F1 score of 0.770 and a ROC AUC of 0.750, significantly outperforming the baseline models. Quiz submission behavior, login frequency, and pre-admission qualification level emerged as the most predictive features. Results also revealed a steady week-by-week improvement in model accuracy, confirming the increasing value of LMS engagement data over time. These findings affirm that early-stage student data can be used effectively to predict academic risk, enabling institutions to act before major assessments are conducted. The study emphasizes the need for institutional readiness, ethical implementation, and inclusive practices in deploying predictive tools. Future research should expand the feature space and test cross-institutional generalizability to further refine early warning systems.
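The core idea of the abstract, restricting features to the first four weeks of LMS activity and scoring predictions with recall, F1, and ROC AUC, can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the tuple layout of `click_log`, the 28-day cutoff constant, the choice to discard pre-course activity, and every function name are assumptions made for illustration, and the model training itself (CatBoost, SMOTE, time-based validation) is deliberately left out.

```python
from collections import defaultdict

# Assumption: days are counted from the course start (day 0), so the
# "first four weeks" correspond to days 0..27.
EARLY_WINDOW_DAYS = 28


def early_features(click_log, cutoff_day=EARLY_WINDOW_DAYS):
    """Aggregate per-student activity using only events inside the early window.

    click_log: iterable of (student_id, day, activity_type, clicks) tuples,
    a hypothetical simplification of an LMS activity table.
    """
    raw = defaultdict(lambda: {"days": set(), "total": 0, "quiz": 0})
    for student_id, day, activity, clicks in click_log:
        if not 0 <= day < cutoff_day:
            continue  # skip pre-course and later events (no future information)
        entry = raw[student_id]
        entry["days"].add(day)
        entry["total"] += clicks
        if activity == "quiz":
            entry["quiz"] += clicks
    return {
        sid: {
            "active_days": len(e["days"]),
            "total_clicks": e["total"],
            "quiz_clicks": e["quiz"],
        }
        for sid, e in raw.items()
    }


def recall_f1(y_true, y_pred):
    """Recall and F1 for binary labels, where 1 marks an at-risk student."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a fuller setup, the dictionaries returned by `early_features` would be joined with pre-admission attributes and passed to a classifier. Recomputing them with `cutoff_day` set to 7, 14, 21, and 28 is one way to examine the week-by-week trend the abstract reports.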


