Exploring the Use of the Pronoun I in German Academic Texts with Machine Learning

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

The use of the pronoun ich (‘I’) in academic language is a source of constant debate and a frequent cause of insecurity for students. We explore manually annotated instances of I from a German learner corpus. Using machine learning techniques, we investigate to what extent it is possible to automatically distinguish between different types of I usage (author I vs. narrator I). We additionally inspect which context words are good indicators of one type or the other. The results show that automatic classification is not straightforward: it is not perfect, but it would greatly facilitate manual annotation. The distinctive context words are in line with previous research and indicate that the author I is a more homogeneous class.
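The abstract does not state which classifier or features were used, so the following is only a hypothetical sketch of the general idea: represent each instance of ich by the words in a small window around it (assumed here: ±3 tokens), train a multinomial Naive Bayes classifier with Laplace smoothing to separate the labels "author" and "narrator", and rank context words by the difference in their smoothed log-probabilities between the two classes. Naive Bayes, the window size, the function names, and the example sentences are illustrative assumptions, not the paper's method or data; the sentences are invented and do not come from the learner corpus.

```python
import math
from collections import Counter


def context_words(tokens, index, window=3):
    """Return lowercased tokens within `window` positions of tokens[index], excluding that token."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return [t.lower() for i, t in enumerate(tokens[lo:hi], start=lo) if i != index]


def first_ich_context(sentence, window=3):
    """Hypothetical helper: context of the first 'ich' in a whitespace-tokenized sentence."""
    tokens = sentence.split()
    index = [t.lower() for t in tokens].index("ich")
    return context_words(tokens, index, window)


class NaiveBayes:
    """Multinomial Naive Bayes over bags of context words (an assumed model choice)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for words, c in zip(docs, labels):
            self.counts[c].update(words)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_prob(self, word, c):
        # Smoothed log P(word | class)
        return math.log((self.counts[c][word] + self.alpha)
                        / (self.totals[c] + self.alpha * len(self.vocab)))

    def predict(self, words):
        n = sum(self.prior.values())

        def score(c):
            # Log prior plus log likelihood; words unseen in training are ignored
            return math.log(self.prior[c] / n) + sum(
                self.log_prob(w, c) for w in words if w in self.vocab)

        return max(self.classes, key=score)

    def distinctive(self, c, other, k=5):
        # Words most indicative of class c relative to `other` (ties broken alphabetically)
        return sorted(self.vocab,
                      key=lambda w: (-(self.log_prob(w, c) - self.log_prob(w, other)), w))[:k]


# Invented toy examples, tokenized with spaces
train = [
    ("In dieser Arbeit untersuche ich die Verwendung von Pronomen .", "author"),
    ("Im folgenden Kapitel zeige ich , dass die Hypothese hält .", "author"),
    ("Letztes Jahr habe ich ein Praktikum gemacht .", "narrator"),
    ("Damals war ich in der Schule .", "narrator"),
]
model = NaiveBayes().fit([first_ich_context(s) for s, _ in train],
                         [label for _, label in train])
```

Usage: `model.predict(first_ich_context("Gestern habe ich in der Mensa gegessen ."))` classifies a new instance, and `model.distinctive("author", "narrator")` lists candidate indicator words. With a toy training set this small, both outputs reflect the invented sentences only and say nothing about the corpus.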
Translated title of the contribution: Erforschung der Verwendung des Pronomen Ich in deutschen akademischen Texten mit maschinellem Lernen
Original language: English
Title of host publication: Informatik 2020 - Back to the future : 50. Jahrestagung der Gesellschaft für Informatik vom 28. September - 2. Oktober 2020, virtual
Editors: Ralf H. Reussner, Anne Koziolek, Robert Heinrich
Number of pages: 7
Place of Publication: Bonn
Publisher: Gesellschaft für Informatik e.V.
Publication date: 2020
Pages: 1327-1333
ISBN (electronic): 978-3-88579-701-2
Publication status: Published - 2020
Event: 50th Annual Conference of the German Informatics Society - INFORMATIK 2020: Back to the Future - Online, Karlsruhe, Germany
Duration: 28.09.2020 - 02.10.2020
Conference number: 50
https://informatik2020.gi.de/

Bibliographical note

Funding Information:
Melanie Andresen’s work on this paper was funded by the Landesforschungsförderung Hamburg in the context of the project hermA [Ga17] (LFF-FV 35) at Universität Hamburg.

Publisher Copyright:
© 2020 Gesellschaft für Informatik (GI). All rights reserved.

    Research areas

  • Language Studies - annotation, Academic language, German, machine learning, classification
