Automated scoring in the era of artificial intelligence: An empirical study with Turkish essays

Research output: Journal contributions › Scientific review articles › Research

Authors

Automated scoring (AS) has gained significant attention as a tool for improving the efficiency and reliability of assessment. Yet its application to under-represented languages, such as Turkish, remains limited. This study addresses this gap by empirically evaluating AS for Turkish using a zero-shot approach with a rubric, powered by OpenAI's GPT-4o. A dataset of 590 essays written by learners of Turkish as a second language was scored both by professional human raters and by an artificial intelligence (AI) model accessed through a custom-built interface. The scoring rubric, grounded in the Common European Framework of Reference for Languages, assessed six dimensions of writing quality. Results revealed strong alignment between human and AI scores, with a Quadratic Weighted Kappa of 0.72, a Pearson correlation of 0.73, and an overlap measure of 83.5%. Analysis of rater effects showed minimal influence on score discrepancies, although rater factors such as experience and gender had modest effects. These findings demonstrate the potential of AI-driven scoring in Turkish and offer insights for broader implementation in under-represented languages, including the possible sources of disagreement between human and AI scores. Because the conclusions rest on a specific writing task with a single human rater, future research should explore more diverse inputs and multiple raters.
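The three agreement statistics named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It assumes integer scores on a bounded scale (`min_score` and `max_score` are hypothetical parameters). The abstract does not define the "overlap measure", so `agreement_rate` and its `tolerance` parameter are an assumption: the share of essays whose human and AI scores differ by at most `tolerance` points. This is an illustration of the standard formulas, not the authors' analysis code.

```python
import math
from collections import Counter


def quadratic_weighted_kappa(a, b, min_score, max_score):
    """Cohen's kappa with quadratic weights for two raters on an integer scale."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("the scale needs at least two score points")
    k = max_score - min_score + 1
    n = len(a)
    # Observed matrix: observed[i][j] counts essays scored i by rater A and j by rater B.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - min_score][y - min_score] += 1
    # Marginal score histograms, used for the chance-expected matrix.
    hist_a = Counter(x - min_score for x in a)
    hist_b = Counter(y - min_score for y in b)
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: larger disagreements are penalised by squared distance.
            w = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den


def pearson(a, b):
    """Pearson product-moment correlation between two score lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def agreement_rate(a, b, tolerance=0):
    """Hypothetical overlap measure: share of pairs differing by at most `tolerance`."""
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the study.
    human = [1, 2, 3, 4, 3, 2]
    ai = [1, 2, 4, 4, 3, 3]
    print(quadratic_weighted_kappa(human, ai, 1, 4))
    print(pearson(human, ai))
    print(agreement_rate(human, ai, tolerance=0), agreement_rate(human, ai, tolerance=1))
```

Quadratic weighting is why QWK is common in essay scoring: a one-point disagreement costs far less than a three-point one, whereas Pearson correlation ignores systematic offsets between raters and the agreement rate ignores the size of disagreements beyond the tolerance. Note that QWK is undefined when both raters give every essay the same score, since the denominator is then zero.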

Original language: English
Article number: 103784
Journal: System
Volume: 133
Number of pages: 12
ISSN: 0346-251X
DOIs
Publication status: Published (October 2025)

Bibliographical note

Publisher Copyright:
© 2025 The Authors

    Research areas

  • Automated scoring, Large language models, Multilevel models, Rater reliability, Turkish essays, Zero-shot with rubric
  • Educational science
