Scientific Journal of King Faisal University: Basic and Applied Sciences
Using Machine Learning to Analyze Emotions in Arabic and Dialectical Texts
Dina Abdelnaser Hamed, Ben Bella Said Tawfik and Mohamed Abdullah Makhlouf
Abstract
Social media has become an integral part of contemporary life. People can easily express their emotions and share moments on social media by writing a few words. Organizations treat Twitter as a rich data source for studying emotions, but while many efforts have focused on sentiment analysis of text, emotion classification has received less attention. Emotion analysis usually provides a more in-depth assessment of the author's feelings. In this research, we propose an emotion classification architecture for dialectal Arabic text that classifies expressions into four emotions (anger, joy, fear, and sadness). Building on recent advances in natural language processing (NLP), we investigated the Bidirectional Encoder Representations from Transformers (BERT) model. Our proposed ensemble model uses a majority voting technique to merge the three best-performing versions of the pre-trained BERT models, which are considered state-of-the-art in the classification field. We compared the results of our model with eight other machine learning classifiers and ten versions of the BERT model. The proposed ensemble approach achieved an accuracy of around 84%, whereas the highest accuracy among the other investigated models was 76%. The experiments were conducted on the Arabic tweets dataset for the EI-OC task provided by SemEval, which contains 5600 tweets.
KEYWORDS
text classification, fine-tuning, voting technique, naive bayes, augmentation, transformers
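The majority voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `majority_vote`, the sample predictions, and the tie-breaking rule (falling back to the first model listed when no label wins outright) are all assumptions made for illustration, since the abstract does not say how ties are resolved. The sketch assumes each fine-tuned BERT model has already produced one predicted label per tweet.

```python
from collections import Counter

# The four emotion classes named in the abstract.
EMOTIONS = ("anger", "joy", "fear", "sadness")


def majority_vote(predictions_per_model):
    """Combine per-model label predictions by majority (hard) voting.

    predictions_per_model: list of label lists, one list per model,
    all of the same length (one label per tweet).
    Tie-breaking is an assumption: if no single label has the most
    votes, the label from the first model in the list is used.
    """
    if not predictions_per_model:
        raise ValueError("at least one model's predictions are required")
    n_items = len(predictions_per_model[0])
    if any(len(p) != n_items for p in predictions_per_model):
        raise ValueError("all models must predict the same number of items")

    voted = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top_label, top_count = counts.most_common(1)[0]
        # More than one label shares the top count: no clear winner.
        if list(counts.values()).count(top_count) > 1:
            top_label = votes[0]
        voted.append(top_label)
    return voted


# Hypothetical predictions from three models for four tweets.
model_a = ["joy", "anger", "fear", "sadness"]
model_b = ["joy", "fear", "fear", "anger"]
model_c = ["anger", "fear", "sadness", "joy"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

With three voters and four classes, a tie occurs only when all three models disagree, so ordering the models from strongest to weakest makes the fallback equivalent to trusting the best single model in that case.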