Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI. / Kraft, Angelie; Soulier, Eloïse.
2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024. Association for Computing Machinery, Inc, 2024. p. 1433-1445 (2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024).


Harvard

Kraft, A & Soulier, E 2024, Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI. in 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024. 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024, Association for Computing Machinery, Inc, pp. 1433-1445, ACM Conference on Fairness, Accountability, and Transparency - FAccT 2024, Rio de Janeiro, Brazil, 03.06.24. https://doi.org/10.1145/3630106.3658981

APA

Kraft, A., & Soulier, E. (2024). Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI. In 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024 (pp. 1433-1445). (2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024). Association for Computing Machinery, Inc. https://doi.org/10.1145/3630106.3658981

Vancouver

Kraft A, Soulier E. Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI. In 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024. Association for Computing Machinery, Inc. 2024. p. 1433-1445. (2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024). doi: 10.1145/3630106.3658981

Bibtex

@inbook{c47f262e4a5147ff9eb2c0aef1bca5d3,
title = "Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI",
abstract = "The factual inaccuracies ({"}hallucinations{"}) of large language models have recently inspired more research on knowledge-enhanced language modeling approaches. These are often assumed to enhance the overall trustworthiness and objectivity of language models. Meanwhile, the issue of bias is usually only mentioned as a limitation of statistical representations. This dissociation of knowledge enhancement and bias is in line with previous research on AI engineers' assumptions about knowledge, which indicates that knowledge is commonly understood as objective and value-neutral by this community. We argue that claims and practices by actors of the field still reflect this underlying conception of knowledge. We contrast this assumption with literature from social and, in particular, feminist epistemology, which argues that the idea of a universal disembodied knower is blind to the reality of knowledge practices and seriously challenges claims of {"}objective{"} or {"}neutral{"} knowledge. Knowledge enhancement techniques commonly use Wikidata and Wikipedia as their sources for knowledge, due to their large scale, public accessibility, and assumed trustworthiness. In this work, they serve as a case study for the influence of the social setting and the identity of knowers on epistemic processes. Indeed, the communities behind Wikidata and Wikipedia are known to be male-dominated, and many instances of hostile behavior have been reported in the past decade. In effect, the contents of these knowledge bases are highly biased. It is therefore doubtful that these knowledge bases would contribute to bias reduction. In fact, our empirical evaluations of RoBERTa, KEPLER, and CoLAKE demonstrate that knowledge enhancement may not live up to the hopes of increased objectivity. In our study, the average probability for stereotypical associations was preserved on two out of three metrics, and performance-related gender gaps on knowledge-driven tasks were also preserved. We build on these results and critical literature to argue that the label of {"}knowledge{"} and the commonly held beliefs about it can obscure the harm that is still done to marginalized groups. Knowledge enhancement is at risk of perpetuating epistemic injustice, and AI engineers' understanding of knowledge as objective per se conceals this injustice. Finally, to get closer to trustworthy language models, we need to rethink knowledge in AI and aim for an agenda of diversification and scrutiny from outgroup members.",
keywords = "bias, epistemology, fairness, feminism, knowledge enhancement, knowledge graphs, language models, natural language processing, representation, Informatics",
author = "Angelie Kraft and Elo{\"i}se Soulier",
note = "Publisher Copyright: {\textcopyright} 2024 Owner/Author.; ACM Conference on Fairness, Accountability, and Transparency - FAccT 2024, FAccT 2024 ; Conference date: 03-06-2024 Through 06-06-2024",
year = "2024",
month = jun,
day = "3",
doi = "10.1145/3630106.3658981",
language = "English",
isbn = "9798400704505",
series = "2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024",
publisher = "Association for Computing Machinery, Inc",
pages = "1433--1445",
booktitle = "2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024",
address = "United States",
url = "https://facctconference.org/2024/",
}

RIS

TY - CHAP

T1 - Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI

T2 - ACM Conference on Fairness, Accountability, and Transparency - FAccT 2024

AU - Kraft, Angelie

AU - Soulier, Eloïse

N1 - Publisher Copyright: © 2024 Owner/Author.

PY - 2024/6/3

Y1 - 2024/6/3

N2 - The factual inaccuracies ("hallucinations") of large language models have recently inspired more research on knowledge-enhanced language modeling approaches. These are often assumed to enhance the overall trustworthiness and objectivity of language models. Meanwhile, the issue of bias is usually only mentioned as a limitation of statistical representations. This dissociation of knowledge enhancement and bias is in line with previous research on AI engineers' assumptions about knowledge, which indicates that knowledge is commonly understood as objective and value-neutral by this community. We argue that claims and practices by actors of the field still reflect this underlying conception of knowledge. We contrast this assumption with literature from social and, in particular, feminist epistemology, which argues that the idea of a universal disembodied knower is blind to the reality of knowledge practices and seriously challenges claims of "objective" or "neutral" knowledge. Knowledge enhancement techniques commonly use Wikidata and Wikipedia as their sources for knowledge, due to their large scale, public accessibility, and assumed trustworthiness. In this work, they serve as a case study for the influence of the social setting and the identity of knowers on epistemic processes. Indeed, the communities behind Wikidata and Wikipedia are known to be male-dominated, and many instances of hostile behavior have been reported in the past decade. In effect, the contents of these knowledge bases are highly biased. It is therefore doubtful that these knowledge bases would contribute to bias reduction. In fact, our empirical evaluations of RoBERTa, KEPLER, and CoLAKE demonstrate that knowledge enhancement may not live up to the hopes of increased objectivity. In our study, the average probability for stereotypical associations was preserved on two out of three metrics, and performance-related gender gaps on knowledge-driven tasks were also preserved. We build on these results and critical literature to argue that the label of "knowledge" and the commonly held beliefs about it can obscure the harm that is still done to marginalized groups. Knowledge enhancement is at risk of perpetuating epistemic injustice, and AI engineers' understanding of knowledge as objective per se conceals this injustice. Finally, to get closer to trustworthy language models, we need to rethink knowledge in AI and aim for an agenda of diversification and scrutiny from outgroup members.

AB - The factual inaccuracies ("hallucinations") of large language models have recently inspired more research on knowledge-enhanced language modeling approaches. These are often assumed to enhance the overall trustworthiness and objectivity of language models. Meanwhile, the issue of bias is usually only mentioned as a limitation of statistical representations. This dissociation of knowledge enhancement and bias is in line with previous research on AI engineers' assumptions about knowledge, which indicates that knowledge is commonly understood as objective and value-neutral by this community. We argue that claims and practices by actors of the field still reflect this underlying conception of knowledge. We contrast this assumption with literature from social and, in particular, feminist epistemology, which argues that the idea of a universal disembodied knower is blind to the reality of knowledge practices and seriously challenges claims of "objective" or "neutral" knowledge. Knowledge enhancement techniques commonly use Wikidata and Wikipedia as their sources for knowledge, due to their large scale, public accessibility, and assumed trustworthiness. In this work, they serve as a case study for the influence of the social setting and the identity of knowers on epistemic processes. Indeed, the communities behind Wikidata and Wikipedia are known to be male-dominated, and many instances of hostile behavior have been reported in the past decade. In effect, the contents of these knowledge bases are highly biased. It is therefore doubtful that these knowledge bases would contribute to bias reduction. In fact, our empirical evaluations of RoBERTa, KEPLER, and CoLAKE demonstrate that knowledge enhancement may not live up to the hopes of increased objectivity. In our study, the average probability for stereotypical associations was preserved on two out of three metrics, and performance-related gender gaps on knowledge-driven tasks were also preserved. We build on these results and critical literature to argue that the label of "knowledge" and the commonly held beliefs about it can obscure the harm that is still done to marginalized groups. Knowledge enhancement is at risk of perpetuating epistemic injustice, and AI engineers' understanding of knowledge as objective per se conceals this injustice. Finally, to get closer to trustworthy language models, we need to rethink knowledge in AI and aim for an agenda of diversification and scrutiny from outgroup members.

KW - bias

KW - epistemology

KW - fairness

KW - feminism

KW - knowledge enhancement

KW - knowledge graphs

KW - language models

KW - natural language processing

KW - representation

KW - Informatics

UR - http://www.scopus.com/inward/record.url?scp=85196640886&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/4139faa8-8124-30a9-9153-a28a00bcf95b/

U2 - 10.1145/3630106.3658981

DO - 10.1145/3630106.3658981

M3 - Article in conference proceedings

AN - SCOPUS:85196640886

SN - 9798400704505

T3 - 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024

SP - 1433

EP - 1445

BT - 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024

PB - Association for Computing Machinery, Inc

Y2 - 3 June 2024 through 6 June 2024

ER -
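
Illustrative sketch

The abstract's empirical claim rests on probing masked language models for stereotypical associations. The following is a minimal sketch of that kind of probe, not the authors' evaluation code: it uses the Hugging Face transformers library to compare the probability a model assigns to gendered fillers in a masked sentence. The model choice (roberta-base), the template, and the candidate words are illustrative assumptions; the paper's actual metrics and benchmarks, and the knowledge-enhanced checkpoints KEPLER and CoLAKE, are not reproduced here.

# Minimal sketch (assumptions: model, template, and probe words are illustrative,
# not the paper's benchmark) of probing a masked LM for stereotypical associations.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
model.eval()

def fill_probability(template: str, candidate: str) -> float:
    """Probability of `candidate`'s first subword at the <mask> position."""
    inputs = tokenizer(template, return_tensors="pt")
    # Locate the mask token; nonzero() yields (batch, position) index pairs.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        probs = model(**inputs).logits[0, mask_pos].softmax(dim=-1)
    # Prepend a space so RoBERTa's BPE tokenizes the word as it appears mid-sentence.
    candidate_ids = tokenizer(" " + candidate, add_special_tokens=False).input_ids
    return probs[candidate_ids[0]].item()

# A higher probability for "man" than "woman" here would be one instance of the
# stereotypical associations the paper quantifies in aggregate.
template = f"The {tokenizer.mask_token} fixed the car."
for word in ("man", "woman"):
    print(word, fill_probability(template, word))

Swapping roberta-base for a knowledge-enhanced checkpoint would mirror the paper's comparison in spirit; the finding reported in the abstract is that such knowledge enhancement largely preserved the stereotypical association probabilities.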
