Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-review

Authors

The factual inaccuracies ("hallucinations") of large language models have recently inspired more research on knowledge-enhanced language modeling approaches. These are often assumed to enhance the overall trustworthiness and objectivity of language models. Meanwhile, the issue of bias is usually only mentioned as a limitation of statistical representations. This dissociation of knowledge enhancement and bias is in line with previous research on AI engineers' assumptions about knowledge, which indicates that knowledge is commonly understood as objective and value-neutral by this community. We argue that claims and practices by actors in the field still reflect this underlying conception of knowledge. We contrast this assumption with literature from social and, in particular, feminist epistemology, which argues that the idea of a universal disembodied knower is blind to the reality of knowledge practices and seriously challenges claims of "objective" or "neutral" knowledge. Knowledge enhancement techniques commonly use Wikidata and Wikipedia as their sources of knowledge, due to their large scale, public accessibility, and assumed trustworthiness. In this work, they serve as a case study for the influence of the social setting and the identity of knowers on epistemic processes. Indeed, the communities behind Wikidata and Wikipedia are known to be male-dominated, and many instances of hostile behavior have been reported in the past decade. In effect, the contents of these knowledge bases are highly biased. It is therefore doubtful that these knowledge bases would contribute to bias reduction. In fact, our empirical evaluations of RoBERTa, KEPLER, and CoLAKE demonstrate that knowledge enhancement may not live up to the hopes of increased objectivity. In our study, the average probability for stereotypical associations was preserved on two out of three metrics, and performance-related gender gaps on knowledge-driven tasks were also preserved.
We build on these results and critical literature to argue that the label of "knowledge" and the commonly held beliefs about it can obscure the harm that is still done to marginalized groups. Knowledge enhancement is at risk of perpetuating epistemic injustice, and AI engineers' understanding of knowledge as objective per se conceals this injustice. Finally, to get closer to trustworthy language models, we need to rethink knowledge in AI and aim for an agenda of diversification and scrutiny from outgroup members.

Original language: English
Title of host publication: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
Number of pages: 13
Publisher: Association for Computing Machinery, Inc
Publication date: 03.06.2024
Pages: 1433-1445
ISBN (print): 9798400704505
ISBN (electronic): 979-8-4007-0450-5
DOIs
Publication status: Published - 03.06.2024
Event: ACM Conference on Fairness, Accountability, and Transparency - FAccT 2024 - Rio de Janeiro, Brazil
Duration: 03.06.2024 - 06.06.2024
https://facctconference.org/2024/

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Research areas

  • bias, epistemology, fairness, feminism, knowledge enhancement, knowledge graphs, language models, natural language processing, representation
  • Informatics
