[1] D. Wang, X. Wang, and S. Lv, "An overview of end-to-end automatic speech recognition," Symmetry, vol. 11, no. 8, p. 1018, 2019.
[2] Y. Wang, A. Mohamed, D. Le, C. Liu, A. Xiao, J. Mahadeokar, H. Huang, A. Tjandra, X. Zhang, F. Zhang, and C. Fuegen, "Transformer-based acoustic modeling for hybrid speech recognition," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6874-6878.
[3] J. S. Chung, A. Nagrani, and A. Zisserman, "VoxCeleb2: Deep speaker recognition," in Proc. INTERSPEECH 2018, 2018, doi: 10.21437/Interspeech.2018-1929.
[4] R. A. Khalil, E. Jones, M. I. Babar, T. Jan, M. H. Zafar, and T. Alhussain, "Speech emotion recognition using deep learning techniques: A review," IEEE Access, vol. 7, pp. 117327-117345, 2019.
[5] M. Johnson et al., "A systematic review of speech recognition technology in health care," BMC Medical Informatics and Decision Making, vol. 14, no. 1, 2014, doi: 10.1186/1472-6947-14-94.
[6] T. Jauhiainen, M. Lui, M. Zampieri, T. Baldwin, and K. Lindén, "Automatic language identification in texts: A survey," Journal of Artificial Intelligence Research, vol. 65, pp. 675-782, 2019.
[7] A. Tursunov, J. Y. Choeh, and S. Kwon, "Age and gender recognition using a convolutional neural network with a specially designed multi-attention module through speech spectrograms," Sensors, vol. 21, no. 17, p. 5892, 2021.
[8] A. M. Deshmukh, "Comparison of hidden Markov model and recurrent neural network in automatic speech recognition," European Journal of Engineering and Technology Research, vol. 5, no. 8, pp. 958-965, 2020.
[9] D. Amodei et al., "Deep Speech 2: End-to-end speech recognition in English and Mandarin," in Proc. 33rd International Conference on Machine Learning (ICML), 2016.
[10] Y. Xie, L. Le, Y. Zhou, and V. V. Raghavan, "Deep learning for natural language processing," Handbook of Statistics, vol. 38, pp. 317-328, Jan. 2018, doi: 10.1016/bs.host.2018.05.001.
[11] S. M. Omer, J. A. Qadir, and Z. K. Abdul, “Uttered Kurdish digit recognition system,” Journal of University of Raparin, vol. 6, no. 2, 2019, doi: 10.26750/vol(6).no(2).paper5.
[12] J. A. Qadir, A. K. Al-Talabani, and H. A. Aziz, “Isolated Spoken Word Recognition Using One-Dimensional Convolutional Neural Network,” International Journal of Fuzzy Logic and Intelligent Systems, vol. 20, no. 4, 2020, doi: 10.5391/IJFIS.2020.20.4.272.
[13] Z. K. Abdul, “Kurdish Spoken Letter Recognition based on k-NN and SVM Model,” Journal of University of Raparin, vol. 7, no. 4, 2020, doi: 10.26750/vol(7).no(4).paper1.
[14] H. Veisi, H. Hosseini, M. Mohammadamini, W. Fathy, and A. Mahmudi, "Jira: A Kurdish speech recognition system designing and building speech corpus and pronunciation lexicon," arXiv preprint arXiv:2102.07412, 2021.
[15] A. Hannun et al., "Deep Speech: Scaling up end-to-end speech recognition," arXiv preprint arXiv:1412.5567, 2014.
[16] W. Song and J. Cai, "End-to-end deep neural network for automatic speech recognition," Stanford University CS224N course project report, 2015.
[17] … P. L., "Speech recognition using deep learning," in 2019 34th International Conference …, 2019. [Online]. Available: ieeexplore.ieee.org
[18] K. Noda, Y. Yamaguchi, K. Nakadai, H. G. Okuno, and T. Ogata, “Audio-visual speech recognition using deep learning,” Applied Intelligence, vol. 42, no. 4, 2015, doi: 10.1007/s10489-014-0629-7.
[19] U. A. Kimanuka and O. Buyuk, "Turkish speech recognition based on deep neural networks," Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 22, special issue, 2018, doi: 10.19113/sdufbed.12798.
[20] H. Veisi and A. Haji Mani, “Persian speech recognition using deep learning,” International Journal of Speech Technology, vol. 23, no. 4, 2020, doi: 10.1007/s10772-020-09768-x.
[21] H. A. Alsayadi, A. A. Abdelhamid, I. Hegazy, and Z. T. Fayed, "Arabic speech recognition using end-to-end deep learning," IET Signal Processing, vol. 15, no. 8, 2021, doi: 10.1049/sil2.12057.
[22] L. Muda, M. Begam, and I. Elamvazuthi, "Voice recognition algorithms using Mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques," Journal of Computing, vol. 2, no. 3, Mar. 2010.
[23] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in Proc. 23rd International Conference on Machine Learning (ICML), 2006, pp. 369-376, doi: 10.1145/1143844.1143891.
[24] S. de la Fuente Garcia, C. W. Ritchie, and S. Luz, "Artificial intelligence, speech, and language processing approaches to monitoring Alzheimer's disease: A systematic review," Journal of Alzheimer's Disease, vol. 78, no. 4, pp. 1547-1574, 2020.
[25] D. Yu and L. Deng, Automatic Speech Recognition, vol. 1. Berlin: Springer, 2016.
[26] N. Markovnikov, I. Kipyatkova, and E. Lyakso, "End-to-end speech recognition in Russian," in Proc. International Conference on Speech and Computer, Cham: Springer, Sep. 2018, pp. 377-386.
[27] F. S. Cabral, H. Fukai, and S. Tamura, "Feature extraction methods proposed for speech recognition are effective on road condition monitoring using smartphone inertial sensors," Sensors, vol. 19, no. 16, p. 3481, 2019.
[28] Y. Ren, Y. Ruan, X. Tan, T. Qin, S. Zhao, Z. Zhao, and T.-Y. Liu, "FastSpeech: Fast, robust and controllable text to speech," in Advances in Neural Information Processing Systems, vol. 32, 2019.
[29] F. S. Cabral, H. Fukai, and S. Tamura, “Feature extraction methods proposed for speech recognition are effective on road condition monitoring using smartphone inertial sensors,” Sensors (Switzerland), vol. 19, no. 16, 2019, doi: 10.3390/s19163481.
[30] H. Veisi, M. MohammadAmini, and H. Hosseini, "Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus," Digital Scholarship in the Humanities, 2019, doi: 10.1093/llc/fqy074.