Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI

Publication: Contributions to collected editions › Articles in conference proceedings › Research › peer-reviewed


The factual inaccuracies ("hallucinations") of large language models have recently inspired more research on knowledge-enhanced language modeling approaches. These are often assumed to enhance the overall trustworthiness and objectivity of language models. Meanwhile, the issue of bias is usually only mentioned as a limitation of statistical representations. This dissociation of knowledge enhancement and bias is in line with previous research on AI engineers' assumptions about knowledge, which indicates that knowledge is commonly understood as objective and value-neutral by this community. We argue that claims and practices by actors of the field still reflect this underlying conception of knowledge. We contrast this assumption with literature from social and, in particular, feminist epistemology, which argues that the idea of a universal disembodied knower is blind to the reality of knowledge practices and seriously challenges claims of "objective" or "neutral" knowledge. Knowledge enhancement techniques commonly use Wikidata and Wikipedia as their sources for knowledge, due to their large scale, public accessibility, and assumed trustworthiness. In this work, they serve as a case study for the influence of the social setting and the identity of knowers on epistemic processes. Indeed, the communities behind Wikidata and Wikipedia are known to be male-dominated, and many instances of hostile behavior have been reported in the past decade. In effect, the contents of these knowledge bases are highly biased. It is therefore doubtful that these knowledge bases would contribute to bias reduction. In fact, our empirical evaluations of RoBERTa, KEPLER, and CoLAKE demonstrate that knowledge enhancement may not live up to the hopes of increased objectivity. In our study, the average probability for stereotypical associations was preserved on two out of three metrics, and performance-related gender gaps on knowledge-driven tasks were also preserved.
We build on these results and critical literature to argue that the label of "knowledge" and the commonly held beliefs about it can obscure the harm that is still done to marginalized groups. Knowledge enhancement is at risk of perpetuating epistemic injustice, and AI engineers' understanding of knowledge as objective per se conceals this injustice. Finally, to get closer to trustworthy language models, we need to rethink knowledge in AI and aim for an agenda of diversification and scrutiny from outgroup members.

Original language: English
Title: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
Number of pages: 13
Publisher: Association for Computing Machinery, Inc
Publication date: 03.06.2024
Pages: 1433-1445
ISBN (print): 9798400704505
ISBN (electronic): 979-8-4007-0450-5
Publication status: Published - 03.06.2024
Event: ACM Conference on Fairness, Accountability, and Transparency - FAccT 2024 - Rio de Janeiro, Brazil
Duration: 03.06.2024 - 06.06.2024
https://facctconference.org/2024/

Bibliographic note

Publisher Copyright:
© 2024 Owner/Author.
