• Original research article
  • April 8, 2026
  • Open access

What hinders practicing English in virtual reality: a data-driven analysis of performance and errors among engineering students

Abstract

The study aims to identify the factors that hinder engineering students with intermediate English proficiency (A2–B1) from completing a learning dialogue in English as a foreign language within a virtual reality (VR) environment, using an analysis of error patterns based on scenario performance statistics. It examines user results for the scenario “Basic information about the company” in two modes (demo testing and final exam) and compares performance distributions using descriptive statistics and visual analytics (histograms, kernel density estimation, and box plots). Within the error analysis, two key error types were distinguished and quantified across individual dialogue steps: response selection errors (choosing an incorrect reply option) and speech recognition errors (the system’s failure to accept or recognise a correctly spoken answer). The novelty of the study lies in proposing and applying a Recognition Dominance Index (RDI), derived from real VR dialogue interaction logs collected from engineering students at RUDN during an English language training course, to estimate the contribution of the technical factor (automatic speech recognition, ASR) to overall failure and to separate it from learning-related difficulties. The results show that overall task difficulty remains comparable between the demo and the exam (median performance around 65–67%); however, exam scores are more concentrated in the 60–75% range, while demo outcomes display higher variability and include low-score outliers. Speech recognition proved to be the main bottleneck of the scenario: on average, recognition errors occurred more frequently than selection errors (approximately 41.8% versus 30.2%), and the RDI indicated a predominantly recognition-driven nature of failure (approximately 76% on average).
At the level of individual utterances, the highest non-recognition rates were observed in the closing expressions of gratitude and in questions about the company’s specialisation, whereas the highest incorrect-selection rates were associated with steps that required strict adherence to academic etiquette and precise wording. The findings suggest that successful completion of the VR dialogue is hindered less by users’ lack of content knowledge and more by speech recognition limitations and response design features, which highlights the need to improve the ASR component and scenario annotation when assessing communicative skills.
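The abstract does not give the formula behind the Recognition Dominance Index. A minimal sketch of one plausible formulation, assuming RDI is the share of recognition errors among all observed errors (this definition, the function name, and the example counts are illustrative assumptions, not the authors’ published method):

```python
def rdi(recognition_errors: int, selection_errors: int) -> float:
    """Assumed Recognition Dominance Index: the share of recognition
    errors among all errors. Values near 1.0 would indicate that
    failure is predominantly recognition-driven."""
    total = recognition_errors + selection_errors
    if total == 0:
        return 0.0  # no errors observed: dominance is undefined, report 0
    return recognition_errors / total

# Hypothetical per-step error counts aggregated over a session log.
recognition, selection = 418, 302
index = rdi(recognition, selection)
print(f"RDI = {index:.2f}")
```

Note that with the averaged rates reported in the abstract (41.8% vs 30.2%), this simple ratio yields roughly 0.58 rather than the reported ~0.76, so the authors’ actual definition likely aggregates or weights errors differently (for example, per failed dialogue step rather than per attempt).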

References

  1. Baker R. S., Inventado P. S. Educational Data Mining and Learning Analytics // Learning Analytics: From Research to Practice / ed. by J. A. Larusson, B. White. Springer, 2014. https://doi.org/10.1007/978-1-4614-3305-7_4
  2. Bohus D. Error awareness and recovery in conversational spoken language interfaces: Doctoral dissertation. Pittsburgh: Carnegie Mellon University, 2007.
  3. Bohus D., Rudnicky A. Sorry and I didn’t catch that! – an investigation of non-understanding errors and recovery strategies // Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue. Lisbon, 2005.
  4. Broussard K. M. Errors in Language Learning and Use: Exploring Error Analysis by Carl James // TESOL Quarterly. 1999. Vol. 33. No. 1. https://doi.org/10.2307/3588202
  5. Chen C., Yuan Y. Effectiveness of Virtual Reality on Chinese as a second language vocabulary learning: perceptions from international students // Computer Assisted Language Learning. 2023. Vol. 38 (3). https://doi.org/10.1080/09588221.2023.2192770
  6. De Araujo A., Papadopoulos P. M., Mckenney S., De Jong T. A learning analytics‐based collaborative conversational agent to foster productive dialogue in inquiry learning // Journal of Computer Assisted Learning. 2024. Vol. 40 (6). https://doi.org/10.1111/jcal.13007
  7. De Vries B. P., Cucchiarini C., Bodnar S., Strik H., Van Hout R. Spoken grammar practice and feedback in an ASR-based CALL system // Computer Assisted Language Learning. 2014. Vol. 28 (6). https://doi.org/10.1080/09588221.2014.889713
  8. Field A. Discovering Statistics Using IBM SPSS Statistics. 5th ed. Sage Publications, 2018.
  9. Frigge M., Hoaglin D. C., Iglewicz B. Some implementations of the Boxplot // The American Statistician. 1989. Vol. 43 (1).
  10. Graesser A., Jordan P., Vanlehn K., Rosé C., Harter D. Intelligent tutoring systems with conversational dialogue // AI Magazine. 2001. Vol. 22 (4). https://doi.org/10.1609/aimag.v22i4.1591
  11. Heift T., Schulze M. Errors and Intelligence in Computer-Assisted Language Learning: Parsers and Pedagogues. Routledge, 2007.
  12. James C. Errors in language learning and use: Exploring error analysis. Routledge, 2013.
  13. Jurafsky D., Martin J. H. Speech and Language Processing. 3rd ed. Pearson, 2020.
  14. Kang B. O., Jeon H., Lee Y. K. AI‐based language tutoring systems with end‐to‐end automatic speech recognition and proficiency evaluation // ETRI Journal. 2024. Vol. 46 (1). https://doi.org/10.4218/etrij.2023-0322
  15. Knill K., Gales M., Kyriakopoulos K., Malinin A., Ragni A., Wang Y., Caines A. Impact of ASR Performance on Free Speaking Language Assessment // Interspeech 2018 (Hyderabad, India, 2-6 September 2018). 2018. https://doi.org/10.21437/interspeech.2018-1312
  16. Lee A. Assessing Speaking Skills in Virtual Reality: Impacts and Implications // English Teaching. 2025. Vol. 80 (2).
  17. Lev-Ari S. Comprehending non-native speakers: theory and evidence for adjustment in manner of processing // Frontiers in Psychology. 2015. Vol. 5. https://doi.org/10.3389/fpsyg.2014.01546
  18. Lev-Ari S., Keysar B. Less-Detailed Representation of Non-Native Language: Why Non-Native Speakers’ Stories Seem More Vague // Discourse Processes. 2012. Vol. 49 (7). https://doi.org/10.1080/0163853x.2012.698493
  19. Li K.-C., Chang M., Wu K.-H. Developing a Task-Based Dialogue System for English Language Learning // Education Sciences. 2020. Vol. 10 (11). https://doi.org/10.3390/educsci10110306
  20. Palmas F., Cichor J., Plecher D. A., Klinker G. Acceptance and Effectiveness of a Virtual Reality Public Speaking Training // IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (Beijing, China). 2019. https://doi.org/10.1109/ismar.2019.00034
  21. Radzikowski K., Wang L., Yoshie O., Nowak R. Accent modification for speech recognition of non-native speakers using neural style transfer // EURASIP Journal on Audio, Speech, and Music Processing. 2021. Vol. 2021 (1). https://doi.org/10.1186/s13636-021-00199-3
  22. Sadigzade Z. Immersive and Gamified Approaches: VR/AR in Language Learning // Porta Universorum. 2025. Vol. 1 (6). https://doi.org/10.69760/portuni.0106002
  23. Scott D. W. Multivariate density estimation: Theory, practice, and visualization. Wiley, 2015.
  24. Silverman B. W. Density estimation for statistics and data analysis. Chapman and Hall, 1986.
  25. Thi-Nhu Ngo T., Hao-Jan Chen H., Kuo-Wei Lai K. The effectiveness of automatic speech recognition in ESL/EFL pronunciation: A meta-analysis // ReCALL. 2023. Vol. 36 (1). https://doi.org/10.1017/s0958344023000113
  26. Tobin J., Nelson P., Macdonald B., Heywood R., Cave R., Seaver K., Desjardins A., Jiang P.-P., Green J. R. Automatic Speech Recognition of Conversational Speech in Individuals with Disordered Speech // Journal of Speech, Language, and Hearing Research: JSLHR. 2024. Vol. 67 (11). https://doi.org/10.1044/2024_jslhr-24-00045
  27. Tukey J. W. Exploratory data analysis. Addison-Wesley, 1977.
  28. VanLehn K. The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems // Educational Psychologist. 2011. Vol. 46 (4).
  29. Wang Z., Schultz T., Waibel A. Comparison of acoustic model adaptation techniques on non-native speech // 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (Hong Kong, China): Proceedings. 2003. https://doi.org/10.1109/icassp.2003.1198837
  30. Yang X., Chen Y.-N., Hakkani-Tur D., Crook P., Li X., Gao J., Deng L. End-to-end joint learning of natural language understanding and dialogue manager // 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (New Orleans, LA, USA): Proceedings. 2017. https://doi.org/10.1109/icassp.2017.7953246

Author information

Andrey Sergeevich Korzin

Peoples’ Friendship University of Russia, Moscow

Nataliia Alexandrovna Alekseeva

Peoples’ Friendship University of Russia, Moscow

Elena Fedorovna Shaleeva

Peoples’ Friendship University of Russia, Moscow

Svetlana Vladimirovna Dmitrichenkova

PhD

Peoples’ Friendship University of Russia, Moscow

Larisa Vladimirovna Kruglova

PhD

Peoples’ Friendship University of Russia, Moscow

About this article

Publication history

  • Received: March 1, 2026.
  • Published: April 8, 2026.

Keywords

  • virtual reality (VR) language learning
  • dialogue-based training
  • learning analytics
  • speech recognition (ASR) errors
  • performance assessment

Copyright

© 2026 The Author(s)
© 2026 Gramota Publishing, LLC

User license

Creative Commons Attribution 4.0 International (CC BY 4.0)