Scientific Journal of King Faisal University: Basic and Applied Sciences
Scientific Journal of King Faisal University: Humanities and Management
Developing a Stress Prediction Tool for Arabic Speech Recognition Tasks
(Eiman Alsharhan and Salah Alnajem)
Abstract
Natural language processing applications for Arabic must account for the linguistic characteristics of speech and represent them in script in order to reduce computational complexity and, in turn, the word error rate (WER). Suprasegmental features are fundamental properties of speech that can enhance the performance of many speech processing applications. The present study considered stress, a prosodic feature that marks the prominence of syllables in speech, and developed a tool that generates phonetic transcriptions and predicts the stress position. The generated transcription was used to create the phonetic dictionary needed to build an automatic speech recognition (ASR) system. The tool had to be accurate, linguistically motivated, and useful in practice; therefore, the effectiveness of the generated stress-marked phonetic dictionary was tested by comparing the performance of a standard fixed-dictionary system with that of a system using the automatically generated dictionary. The study found a 5.6% reduction in WER when using a dictionary with stress markers attached to each phone in the stressed syllable and a 3.5% reduction in WER when using a dictionary with stress markers assigned only to stressed vowels. These results encourage future studies to employ prosodic features when developing speech processing applications.
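As an illustration of the two dictionary variants compared in the abstract, the following is a minimal sketch and not the authors' tool. It assumes the word has already been syllabified and the stressed syllable already predicted. The phone symbols, the vowel inventory, the `'` stress marker, and the function names `mark_stress` and `dictionary_line` are all hypothetical choices made for this example.

```python
# Hypothetical vowel inventory and stress marker; neither is taken from the paper.
VOWELS = {"a", "i", "u", "a:", "i:", "u:"}
STRESS = "'"


def mark_stress(syllables, stressed_index, scheme):
    """Return a flat phone list with stress markers applied.

    syllables: list of syllables, each a list of phone strings.
    stressed_index: index of the stressed syllable (assumed to be known).
    scheme: "all_phones" marks every phone in the stressed syllable;
            "vowel_only" marks only the vowels of that syllable.
    """
    if scheme not in ("all_phones", "vowel_only"):
        raise ValueError("unknown scheme: " + scheme)
    phones = []
    for i, syllable in enumerate(syllables):
        for phone in syllable:
            in_stressed = i == stressed_index
            if in_stressed and (scheme == "all_phones" or phone in VOWELS):
                phones.append(phone + STRESS)
            else:
                phones.append(phone)
    return phones


def dictionary_line(word, phones):
    """Format one pronunciation-dictionary entry as 'word phone phone ...'."""
    return word + " " + " ".join(phones)


# Example word "kataba", syllabified ka.ta.ba, with stress assumed on syllable 0.
syllables = [["k", "a"], ["t", "a"], ["b", "a"]]
print(dictionary_line("kataba", mark_stress(syllables, 0, "all_phones")))
print(dictionary_line("kataba", mark_stress(syllables, 0, "vowel_only")))
```

In the first scheme every phone of the stressed syllable becomes a distinct stressed unit, so the acoustic model sees more phone types than in the vowel-only scheme; the abstract reports the larger WER reduction for the first scheme.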
KEYWORDS
Suprasegmental features, stress, phonetic transcription, automatic speech recognition