Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI

Publication: Contributions in edited volumes › Papers in conference proceedings › Research › peer-reviewed


The factual inaccuracies ("hallucinations") of large language models have recently inspired more research on knowledge-enhanced language modeling approaches. These are often assumed to enhance the overall trustworthiness and objectivity of language models. Meanwhile, the issue of bias is usually only mentioned as a limitation of statistical representations. This dissociation of knowledge enhancement and bias is in line with previous research on AI engineers' assumptions about knowledge, which indicates that knowledge is commonly understood as objective and value-neutral by this community. We argue that claims and practices by actors of the field still reflect this underlying conception of knowledge. We contrast this assumption with literature from social and, in particular, feminist epistemology, which argues that the idea of a universal disembodied knower is blind to the reality of knowledge practices and seriously challenges claims of "objective" or "neutral" knowledge. Knowledge enhancement techniques commonly use Wikidata and Wikipedia as their sources for knowledge, due to their large scale, public accessibility, and assumed trustworthiness. In this work, they serve as a case study for the influence of the social setting and the identity of knowers on epistemic processes. Indeed, the communities behind Wikidata and Wikipedia are known to be male-dominated, and many instances of hostile behavior have been reported in the past decade. As a result, the contents of these knowledge bases are highly biased. It is therefore doubtful that these knowledge bases would contribute to bias reduction. In fact, our empirical evaluations of RoBERTa, KEPLER, and CoLAKE demonstrate that knowledge enhancement may not live up to the hopes of increased objectivity. In our study, the average probability for stereotypical associations was preserved on two out of three metrics, and performance-related gender gaps on knowledge-driven tasks were also preserved.
We build on these results and critical literature to argue that the label of "knowledge" and the commonly held beliefs about it can obscure the harm that is still done to marginalized groups. Knowledge enhancement is at risk of perpetuating epistemic injustice, and AI engineers' understanding of knowledge as objective per se conceals this injustice. Finally, to get closer to trustworthy language models, we need to rethink knowledge in AI and aim for an agenda of diversification and scrutiny from outgroup members.
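The stereotype-association measurement described above can be illustrated with a minimal sketch. The abstract does not specify the exact metrics used, so the following assumes a common pairwise setup (in the style of CrowS-Pairs), where a model's pseudo-log-likelihoods (PLLs) for a stereotypical and an anti-stereotypical sentence are compared; the sentence-pair scores below are hypothetical placeholders, not outputs of RoBERTa, KEPLER, or CoLAKE.

```python
# Sketch of a pairwise stereotype-preference score. The PLL values below are
# hypothetical illustrations, not scores produced by any of the evaluated models.

def bias_score(pairs):
    """Fraction of pairs where the stereotypical sentence receives the higher
    pseudo-log-likelihood; 0.5 would indicate no systematic preference."""
    preferred = sum(1 for stereo_pll, anti_pll in pairs if stereo_pll > anti_pll)
    return preferred / len(pairs)

# Each tuple: (PLL of stereotypical sentence, PLL of anti-stereotypical sentence)
hypothetical_plls = [
    (-42.1, -45.3),
    (-30.7, -29.9),
    (-55.0, -58.2),
    (-61.4, -63.0),
]

print(bias_score(hypothetical_plls))  # 3 of 4 pairs prefer the stereotype -> 0.75
```

Under this kind of metric, "bias preserved" means that the score of a knowledge-enhanced model stays close to that of its non-enhanced baseline rather than moving toward 0.5.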

Original language: English
Title: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
Number of pages: 13
Publisher: Association for Computing Machinery, Inc
Publication date: 03.06.2024
Pages: 1433-1445
ISBN (print): 9798400704505
ISBN (electronic): 979-8-4007-0450-5
Publication status: Published - 03.06.2024
Event: ACM Conference on Fairness, Accountability, and Transparency - FAccT 2024 - Rio de Janeiro, Brazil
Duration: 03.06.2024 - 06.06.2024
https://facctconference.org/2024/

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.
