Enhancing Software Quality Through Automated Code Review Tools: An Empirical Synthesis Across CI/CD Pipelines
DOI: https://doi.org/10.61978/digitus.v3i4.956
Keywords: Automated Code Review, Software Quality, Static Analysis, CI/CD, Developer Productivity, Copilot Autofix, SonarQube
Abstract
Automated Code Review Tools (ACRT) have become increasingly integral to modern software development workflows, particularly in continuous integration and deployment (CI/CD) environments. This study evaluates how effectively ACRT improve software quality, accelerate vulnerability remediation, and enhance developer productivity. Combining empirical analysis, industry case studies, and academic benchmarks, we examine how tools such as SonarQube, CodeQL, Copilot Autofix, and secret scanners affect key quality metrics, including defect density, Mean Time to Repair (MTTR), and pull request (PR) throughput. A quasi-experimental design using Interrupted Time Series (ITS) and Regression Discontinuity Design (RDD) was applied to measure longitudinal outcomes across six open-source and enterprise projects. The results indicate that defect density decreased by 15–30% after ACRT adoption, accompanied by notable improvements in security MTTR. For example, Copilot Autofix reduced cross-site scripting (XSS) remediation time from 180 minutes to 22 minutes (a reduction of roughly 88%), underscoring its potential for accelerating vulnerability management. PR throughput also increased by up to 40%. However, this efficiency gain coincided with a 20–30% decline in human code review interactions, highlighting a trade-off between the benefits of automation and the reduced depth of manual oversight. We conclude that ACRT, when integrated thoughtfully into development pipelines, can deliver measurable improvements in software quality and responsiveness. Sustaining these benefits, though, requires careful tuning, contextual alerting, and a hybrid review strategy that keeps humans involved to preserve long-term maintainability.
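To make the metrics and the ITS design named in the abstract more concrete, the following Python is a minimal illustrative sketch, not the authors' analysis code. It assumes defect density is measured as defects per thousand lines of code (KLOC), that MTTR is the mean of detection-to-fix intervals, and that ITS is fitted as a standard segmented regression, y_t = b0 + b1·t + b2·D_t + b3·(t − t0)·D_t, where D_t marks post-adoption periods. All function names are hypothetical, the fit uses plain ordinary least squares (no autocorrelation correction), RDD is not covered, and the series in the usage block is synthetic rather than data from the study.

```python
from datetime import datetime, timedelta


def defect_density(defects: int, kloc: float) -> float:
    """Defects per thousand lines of code (assumed KLOC-based definition)."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    return defects / kloc


def mean_time_to_repair(intervals: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean of (fixed - detected) over resolved alerts."""
    if not intervals:
        raise ValueError("no resolved alerts")
    total = sum((fixed - detected for detected, fixed in intervals), timedelta())
    return total / len(intervals)


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def its_segmented_regression(y: list[float], t0: int) -> dict[str, float]:
    """OLS fit of y_t = b0 + b1*t + b2*D_t + b3*(t - t0)*D_t, with D_t = 1 for t >= t0."""
    if t0 < 2 or len(y) - t0 < 2:
        raise ValueError("need at least two pre- and two post-adoption points")
    # Design matrix rows: intercept, time, level-change dummy, slope-change term.
    rows = [[1.0, float(t), float(t >= t0), float(max(t - t0, 0))] for t in range(len(y))]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    b0, b1, b2, b3 = _solve(xtx, xty)
    return {"baseline_level": b0, "pre_trend": b1, "level_change": b2, "slope_change": b3}


if __name__ == "__main__":
    # Synthetic monthly defect densities with adoption at month 6 (NOT study data).
    series = [4.0, 4.1, 3.9, 4.2, 4.0, 4.1, 3.4, 3.3, 3.1, 3.2, 3.0, 2.9]
    print(its_segmented_regression(series, t0=6))
    # The XSS example from the abstract: 180 -> 22 minutes.
    print(round(percent_change(180, 22), 1))  # -87.8
```

In this parameterization, `level_change` estimates the immediate shift at adoption and `slope_change` the change in trend afterwards; a real analysis would typically also account for autocorrelation and seasonality in the residuals.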