Evaluating Deep Learning Models for Humanitarian Sentiment Classification in Crisis Tweets: A Benchmark Study
DOI:
https://doi.org/10.61978/digitus.v3i4.975
Keywords:
Humanitarian NLP, Crisis Response, Tweet Classification, Sentiment Analysis, Transformer Models, Benchmark Datasets, Bias Mitigation
Abstract
Social media platforms have emerged as essential channels for real-time crisis communication, offering valuable insights into public sentiment and humanitarian needs during emergencies. This study benchmarks the performance of state-of-the-art deep learning models for classifying sentiment and humanitarian relevance in crisis-related tweets. Using three publicly available datasets (CrisisMMD, HumAID, and CrisisBench), we evaluate three architectures: IDBO-CNN-BiLSTM, BERTweet, and CrisisTransformers. The models were assessed using cross-validation and standard performance metrics (accuracy, F1 score, precision, and recall). Results indicate that CrisisTransformers outperform both traditional CNN-LSTM hybrids and general-purpose transformers, achieving an accuracy of 0.861 and an F1 score of 0.847. Domain-specific pretraining significantly enhances contextual understanding, particularly for multilingual and ambiguous tweets. While transformer models offer superior classification capabilities, their computational complexity poses challenges for real-time deployment. Additionally, operational risks such as data bias and misinformation necessitate careful management through structured human oversight and the integration of explainable AI mechanisms. This research provides a robust comparison of NLP models for crisis applications and recommends strategies for effective deployment, including bias mitigation and fairness-aware learning. The findings contribute to building ethical and efficient NLP systems for humanitarian response.
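As an illustration of the metrics named in the abstract, the following minimal Python sketch computes accuracy together with precision, recall, and F1 score for a multi-class tweet classifier. The abstract does not say how per-class scores were averaged, so macro averaging is an assumption here; the function name `classification_metrics` and the labels `urgent`, `info`, and `other` are hypothetical and are not taken from the study or its datasets.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (an unweighted mean over classes) is an assumption;
    the study may have used a different averaging scheme.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical gold labels and predictions for six tweets.
    gold = ["urgent", "urgent", "info", "info", "other", "other"]
    pred = ["urgent", "info", "info", "info", "other", "urgent"]
    for name, value in classification_metrics(gold, pred).items():
        print(f"{name}: {value:.3f}")
```

Under cross-validation, a function like this would typically be applied to each held-out fold and the per-fold scores averaged.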


