Enhancing Software Quality Through Automated Code Review Tools: An Empirical Synthesis Across CI/CD Pipelines

Authors

  • Budi Gunawan, Universitas Jayabaya
  • Anwar T. Sitorus, STMIK Mercusuar

DOI:

https://doi.org/10.61978/digitus.v3i4.956

Keywords:

Automated Code Review, Software Quality, Static Analysis, CI/CD, Developer Productivity, Copilot Autofix, SonarQube

Abstract

Automated Code Review Tools (ACRTs) have become increasingly integral to modern software development workflows, particularly within continuous integration and continuous deployment (CI/CD) environments. This study evaluates the effectiveness of ACRTs in improving software quality, accelerating vulnerability remediation, and enhancing developer productivity. Combining empirical analysis, industry case studies, and academic benchmarks, we examine how tools such as SonarQube, CodeQL, Copilot Autofix, and secret scanners affect key quality metrics, including defect density, Mean Time to Repair (MTTR), and pull request (PR) throughput. A quasi-experimental design was employed, using Interrupted Time Series (ITS) analysis and Regression Discontinuity Design (RDD) to measure longitudinal outcomes across six open-source and enterprise projects. Results indicate that defect density decreased by 15–30% following ACRT adoption, accompanied by notable improvements in security MTTR; for example, Copilot Autofix reduced XSS remediation times from 180 minutes to 22 minutes, underscoring its potential for accelerating vulnerability management. PR throughput also increased by up to 40%. However, this efficiency gain coincided with a 20–30% decline in human code review interactions, highlighting a trade-off between automation benefits and the depth of manual oversight. We conclude that ACRTs, when integrated thoughtfully into development pipelines, can deliver measurable improvements in software quality and responsiveness, but that sustained benefits require careful tuning, contextual alerting, and a hybrid review strategy that maintains human involvement to preserve long-term maintainability.
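
To illustrate the kind of longitudinal analysis the abstract describes, the sketch below shows a minimal segmented regression of the form commonly used for Interrupted Time Series analysis, applied to hypothetical monthly defect-density observations around an ACRT adoption point. The data, column names, and model specification are illustrative assumptions, not the paper's actual dataset or estimation procedure.

```python
# Minimal ITS sketch: segmented regression of defect density around an
# ACRT adoption date. All data and variable names below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical monthly observations for one project (defects per KLOC),
# with the ACRT introduced at month 12.
df = pd.DataFrame({
    "month": range(24),
    "defect_density": [
        5.1, 5.0, 5.2, 4.9, 5.0, 5.1, 4.8, 5.0, 4.9, 5.1, 5.0, 4.9,  # pre-adoption
        4.3, 4.1, 4.0, 3.9, 3.8, 3.9, 3.7, 3.6, 3.7, 3.5, 3.6, 3.4,  # post-adoption
    ],
})
adoption_month = 12
df["post"] = (df["month"] >= adoption_month).astype(int)          # immediate level shift
df["time_since"] = (df["month"] - adoption_month).clip(lower=0)   # post-adoption trend change

# Segmented regression: baseline trend, level change at adoption, and slope change after.
model = smf.ols("defect_density ~ month + post + time_since", data=df).fit()
print(model.summary())
```

In a specification like this, the coefficient on post estimates the immediate drop in defect density at adoption, while the coefficient on time_since estimates any change in trend afterwards.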



Published

2025-10-06

How to Cite

Gunawan, B., & Sitorus, A. T. (2025). Enhancing Software Quality Through Automated Code Review Tools: An Empirical Synthesis Across CI/CD Pipelines. Digitus: Journal of Computer Science Applications, 3(4), 214–225. https://doi.org/10.61978/digitus.v3i4.956