Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

The factual inaccuracies ("hallucinations") of large language models have recently inspired more research on knowledge-enhanced language modeling approaches. These are often assumed to enhance the overall trustworthiness and objectivity of language models. Meanwhile, the issue of bias is usually only mentioned as a limitation of statistical representations. This dissociation of knowledge-enhancement and bias is in line with previous research on AI engineers' assumptions about knowledge, which indicates that knowledge is commonly understood as objective and value-neutral by this community. We argue that claims and practices by actors of the field still reflect this underlying conception of knowledge. We contrast this assumption with literature from social and, in particular, feminist epistemology, which argues that the idea of a universal disembodied knower is blind to the reality of knowledge practices and seriously challenges claims of "objective" or "neutral" knowledge. Knowledge enhancement techniques commonly use Wikidata and Wikipedia as their sources for knowledge, due to their large scale, public accessibility, and assumed trustworthiness. In this work, they serve as a case study for the influence of the social setting and the identity of knowers on epistemic processes. Indeed, the communities behind Wikidata and Wikipedia are known to be male-dominated and many instances of hostile behavior have been reported in the past decade. In effect, the contents of these knowledge bases are highly biased. It is therefore doubtful that these knowledge bases would contribute to bias reduction. In fact, our empirical evaluations of RoBERTa, KEPLER, and CoLAKE demonstrate that knowledge enhancement may not live up to the hopes of increased objectivity. In our study, the average probability for stereotypical associations was preserved on two out of three metrics and performance-related gender gaps on knowledge-driven tasks were also preserved.
We build on these results and critical literature to argue that the label of "knowledge" and the commonly held beliefs about it can obscure the harm that is still done to marginalized groups. Knowledge enhancement is at risk of perpetuating epistemic injustice, and AI engineers' understanding of knowledge as objective per se conceals this injustice. Finally, to get closer to trustworthy language models, we need to rethink knowledge in AI and aim for an agenda of diversification and scrutiny from outgroup members.

Original language: English
Title of host publication: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
Number of pages: 13
Publisher: Association for Computing Machinery, Inc
Publication date: 03.06.2024
Pages: 1433-1445
ISBN (print): 9798400704505
ISBN (electronic): 979-8-4007-0450-5
Publication status: Published - 03.06.2024
Event: ACM Conference on Fairness, Accountability, and Transparency - FAccT 2024 - Rio de Janeiro, Brazil
Duration: 03.06.2024 - 06.06.2024
https://facctconference.org/2024/

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Research areas

  • bias, epistemology, fairness, feminism, knowledge enhancement, knowledge graphs, language models, natural language processing, representation
  • Informatics
