Scientific Journal of King Faisal University: Basic and Applied Sciences

Automated Oropharyngeal Dysphagia Assessment with Mask R-CNN and Kinematic Measures

(Zirsha Riaz, Aniqa Dilawari, Sajid Iqbal and Ahmed A. Alyahya)

Abstract

Oropharyngeal dysphagia (OD) is characterised by difficulty swallowing liquids or food. It significantly affects an individual’s quality of life and can lead to serious health issues such as poor nutrition, dehydration and pneumonia. Diagnosis typically relies on a videofluoroscopic swallowing study (VFSS), a method that, while effective, is expensive, time-consuming and dependent on expert interpretation. Recent advancements in artificial intelligence (AI) offer a promising way to make dysphagia diagnosis more efficient and accurate. In this paper, we propose an AI-based system for diagnosing OD. The system processes multi-frame image data from VFSS videos using a mask region-based convolutional neural network (Mask R-CNN) for object detection and segmentation. The network is built on a feature pyramid network with a ResNet101 backbone. From the segmented frames, the system calculates five kinematic measures (ring measurement, hyoid displacement, bolus clearance ratio, pharyngeal constriction ratio and peak oesophageal sphincter) to assess the presence or absence of the swallowing disorder. The system was evaluated in real time on 250 patients (150 males and 100 females), classifying each as either with or without dysphagia, and achieved an accuracy of 96.8%. The system is expected to assist clinicians in assessing dysphagia.

Keywords
Artificial intelligence, fluoroscopic data, multi-frame image, oropharyngeal dysphagia, pyramid network, swallowing disorder
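
The abstract names the kinematic measures but does not give their formulas, so the following is only a minimal sketch under stated assumptions. It assumes a ratio measure (such as the pharyngeal constriction ratio) is a ratio of segmented pixel areas between two frames, and that hyoid displacement is the distance between mask centroids in two frames. All function names, the toy masks and the `mm_per_pixel` scale are hypothetical and are not taken from the paper. The last helper only restates the reported figures, assuming the accuracy was computed over all 250 patients.

```python
from math import hypot


def mask_area(mask):
    """Count foreground pixels in a binary mask given as rows of 0/1."""
    return sum(sum(row) for row in mask)


def area_ratio(numerator_mask, denominator_mask):
    """Ratio of two segmented areas (assumed form of a ratio measure)."""
    denominator = mask_area(denominator_mask)
    if denominator == 0:
        raise ValueError("reference mask is empty")
    return mask_area(numerator_mask) / denominator


def centroid(mask):
    """Return the (x, y) centroid of the foreground pixels of a mask."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        raise ValueError("mask is empty")
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def displacement(mask_rest, mask_peak, mm_per_pixel=1.0):
    """Centroid distance between two frames, scaled to millimetres.

    mm_per_pixel is a hypothetical calibration factor.
    """
    x0, y0 = centroid(mask_rest)
    x1, y1 = centroid(mask_peak)
    return hypot(x1 - x0, y1 - y0) * mm_per_pixel


def correct_count(accuracy_percent, n_patients):
    """Number of correct classifications implied by an accuracy figure."""
    return round(accuracy_percent / 100 * n_patients)


# Toy masks (assumed data): pharyngeal area at rest vs. at maximum constriction.
pharynx_rest = [[1, 1], [1, 1]]
pharynx_constricted = [[1, 0], [0, 0]]
print(area_ratio(pharynx_constricted, pharynx_rest))  # 0.25

# Toy hyoid masks: one pixel at (0, 0), then one pixel at (x=4, y=3).
hyoid_rest = [[1, 0, 0, 0, 0]] + [[0] * 5 for _ in range(3)]
hyoid_peak = [[0] * 5 for _ in range(3)] + [[0, 0, 0, 0, 1]]
print(displacement(hyoid_rest, hyoid_peak, mm_per_pixel=0.5))  # 2.5

# Reported figures: 96.8% accuracy on 250 patients.
print(correct_count(96.8, 250))  # 242
```

Under that assumption, the reported 96.8% accuracy corresponds to 242 of the 250 patients being classified correctly.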
