Evaluating Deep Learning Models for Humanitarian Sentiment Classification in Crisis Tweets: A Benchmark Study

Authors

  • Edi Junaedi, Universitas Cendekia Abditama

DOI:

https://doi.org/10.61978/digitus.v3i4.975

Keywords:

Humanitarian NLP, Crisis Response, Tweet Classification, Sentiment Analysis, Transformer Models, Benchmark Datasets, Bias Mitigation

Abstract

Social media platforms have become essential channels for real-time crisis communication, offering valuable insight into public sentiment and humanitarian needs during emergencies. This study benchmarks state-of-the-art deep learning models for classifying sentiment and humanitarian relevance in crisis-related tweets. Using three publicly available datasets (CrisisMMD, HumAID, and CrisisBench), we evaluate three architectures: IDBO-CNN-BiLSTM, BERTweet, and CrisisTransformers. The models were assessed using cross-validation and standard performance metrics (accuracy, F1 score, precision, and recall). Results indicate that CrisisTransformers outperform both traditional CNN-LSTM hybrids and general-purpose transformers, achieving an accuracy of 0.861 and an F1 score of 0.847. Domain-specific pretraining significantly enhances contextual understanding, particularly for multilingual and ambiguous tweets. Although transformer models offer superior classification performance, their computational cost poses challenges for real-time deployment. In addition, operational risks such as data bias and misinformation call for careful management through structured human oversight and the integration of explainable AI mechanisms. This research provides a comparison of NLP models for crisis applications and recommends strategies for effective deployment, including bias mitigation and fairness-aware learning. The findings contribute to building ethical and efficient NLP systems for humanitarian response.
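
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed from predicted and true labels. It is illustrative only and is not the study's evaluation code: the function name `classification_metrics`, the label names, and the choice of macro averaging are assumptions, since the abstract does not state which averaging scheme was used.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (an unweighted mean over classes) is an assumption
    made for this sketch; the abstract does not specify the scheme.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class receives a false positive
            fn[truth] += 1   # true class receives a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for four tweets (not taken from the study's datasets).
y_true = ["informative", "informative", "not_informative", "not_informative"]
y_pred = ["informative", "not_informative", "not_informative", "not_informative"]
print(classification_metrics(y_true, y_pred))
```

On this toy input, three of the four predictions are correct, so accuracy is 0.75, while the macro F1 is about 0.733 because the missed "informative" tweet lowers that class's recall to 0.5.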

Published

2025-10-06

How to Cite

Junaedi, E. (2025). Evaluating Deep Learning Models for Humanitarian Sentiment Classification in Crisis Tweets: A Benchmark Study. Digitus: Journal of Computer Science Applications, 3(4), 237–248. https://doi.org/10.61978/digitus.v3i4.975