Automated scoring in the era of artificial intelligence: An empirical study with Turkish essays

Research output: Journal contributions › Scientific review articles › Research

Authors

Automated scoring (AS) has gained significant attention as a tool for improving the efficiency and reliability of assessment. Yet its application to under-represented languages such as Turkish remains limited. This study addresses this gap by empirically evaluating AS for Turkish using a zero-shot, rubric-based approach powered by OpenAI's GPT-4o. A dataset of 590 essays written by learners of Turkish as a second language was scored both by professional human raters and by an artificial intelligence (AI) model integrated through a custom-built interface. The scoring rubric, grounded in the Common European Framework of Reference for Languages (CEFR), assessed six dimensions of writing quality. Results revealed strong alignment between human and AI scores, with a Quadratic Weighted Kappa (QWK) of 0.72, a Pearson correlation of 0.73, and an overlap measure of 83.5%. Analysis of rater effects showed minimal influence on score discrepancies, although rater experience and gender had modest effects. These findings demonstrate the potential of AI-driven scoring for Turkish and offer insights for broader implementation in under-represented languages, including the possible sources of disagreement between human and AI scores. Because these conclusions are drawn from one specific writing task with a single human rater, future research should explore diverse inputs and multiple raters.
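The agreement statistics named in the abstract can be made concrete with a small sketch. The code below is an illustration, not the authors' analysis code: it assumes integer scores on a known range and implements quadratic weighted kappa (Cohen's kappa with squared-distance disagreement weights) and the Pearson correlation in plain Python. The abstract does not define its "overlap measure", so `adjacent_agreement` (the share of essays whose two scores differ by at most a tolerance) is a hypothetical stand-in chosen for illustration and may not match the paper's definition. All function names are assumptions.

```python
from math import sqrt


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Cohen's kappa with quadratic weights for integer scores in [min_score, max_score]."""
    n_cat = max_score - min_score + 1  # requires max_score > min_score
    n = len(rater_a)
    # Observed matrix: observed[i][j] counts essays scored i by rater A and j by rater B.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    # Marginal score histograms of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic penalty: distant disagreements cost more than adjacent ones.
            w = (i - j) ** 2 / (n_cat - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            weighted_observed += w * observed[i][j]
            weighted_expected += w * expected
    return 1.0 - weighted_observed / weighted_expected


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjacent_agreement(x, y, tolerance=1):
    """Hypothetical overlap measure: share of pairs differing by <= tolerance points."""
    return sum(abs(a - b) <= tolerance for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the study.
    human = [1, 2, 3, 4, 3, 2]
    ai = [1, 2, 2, 4, 3, 3]
    print(quadratic_weighted_kappa(human, ai, 1, 4))
    print(pearson_r(human, ai))
    print(adjacent_agreement(human, ai, tolerance=0))
```

QWK is used here instead of plain percent agreement because it corrects for chance agreement and penalizes a two-point disagreement four times as heavily as a one-point one, which suits ordinal rubric scores.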

Original language: English
Article number: 103784
Journal: System
Volume: 133
Number of pages: 12
ISSN: 0346-251X
Publication status: Published, October 2025

Bibliographical note

Publisher Copyright:
© 2025 The Authors

    Research areas

  • Automated scoring, Large language models, Multilevel models, Rater reliability, Turkish essays, Zero-shot with rubric
  • Educational science
