Document Type: Research Paper

Authors

1 Department of Computer Science, Faculty of Science, University of Soran, Soran, Erbil, Kurdistan, Iraq

2 Department of Computer Science, Faculty of New Sciences and Technologies, University of Tehran (Visitor at Soran University)

10.37652/juaps.2022.176501

Abstract

Sentiment Analysis (SA), a type of opinion mining and a more general task than polarity detection, is widely used to analyze users' online reviews and comments. It has been implemented with various techniques, among which Artificial Neural Networks (ANNs) are the most popular. This paper addresses the development of an SA system for the Central Kurdish language (CKB) using deep learning. The efficiency and strength of an SA system depend on a robust language model, and creating and training such a model requires a large text corpus; to this end, we built a CKB corpus of 300 million tokens. To train the SA model, we collected 14,881 Facebook comments and labeled them manually. We combined Word2Vec for the language model with Long Short-Term Memory (LSTM) for the classifier to create an SA model on the CKB SA dataset. These deep learning-based techniques are among the best-known methods in the field and have achieved high performance in SA for various languages. The proposed method reaches 71.35% accuracy on 3-class SA, which is superior to the best previously reported result for CKB.
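To make the described pipeline concrete, the following is a minimal sketch of how pre-trained word vectors can feed an LSTM that outputs a distribution over three sentiment classes. It is an illustration under stated assumptions, not the authors' implementation: the class names, the vector and hidden sizes, the placeholder tokens, and the random (untrained) weights are all hypothetical, and the word vectors are random stand-ins for Word2Vec embeddings.

```python
import math
import random

# Assumed label names; the abstract only states that there are three classes.
CLASSES = ["negative", "neutral", "positive"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def init_params(embed_dim, hidden, n_classes, rng):
    """Random, untrained weights; a real model would learn these from data."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for gate in ("i", "f", "o", "g"):  # input, forget, output, candidate
        params["W" + gate] = mat(hidden, embed_dim + hidden)
        params["b" + gate] = [0.0] * hidden
    params["Wy"] = mat(n_classes, hidden)
    params["by"] = [0.0] * n_classes
    return params

def lstm_step(x, h, c, p):
    """One LSTM time step on input vector x with previous state (h, c)."""
    z = x + h  # list concatenation: [x; h_prev]
    i = [sigmoid(a) for a in affine(p["Wi"], p["bi"], z)]
    f = [sigmoid(a) for a in affine(p["Wf"], p["bf"], z)]
    o = [sigmoid(a) for a in affine(p["Wo"], p["bo"], z)]
    g = [math.tanh(a) for a in affine(p["Wg"], p["bg"], z)]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def classify(tokens, embeddings, p, embed_dim, hidden):
    """Run the LSTM over the token vectors and classify the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    unk = [0.0] * embed_dim  # zero vector for out-of-vocabulary tokens
    for tok in tokens:
        h, c = lstm_step(embeddings.get(tok, unk), h, c, p)
    return softmax(affine(p["Wy"], p["by"], h))

rng = random.Random(0)
EMBED_DIM, HIDDEN = 4, 3  # toy sizes chosen only for illustration
# Random stand-ins for Word2Vec vectors; the tokens are placeholders.
embeddings = {tok: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
              for tok in ("tok_a", "tok_b", "tok_c")}
params = init_params(EMBED_DIM, HIDDEN, len(CLASSES), rng)
probs = classify(["tok_a", "tok_unseen", "tok_c"], embeddings, params, EMBED_DIM, HIDDEN)
predicted = CLASSES[probs.index(max(probs))]
print(dict(zip(CLASSES, probs)), predicted)
```

Because the weights are untrained, the printed prediction carries no meaning; the sketch only shows the data flow from token to embedding to LSTM state to class probabilities. In practice the embeddings would come from a Word2Vec model trained on the large corpus, and the LSTM and output layer would be trained on the labeled comments.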
