Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI

Angelie Kraft; Eloïse Soulier

doi:10.1145/3630106.3658981

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Angelie Kraft
Eloïse Soulier

Professorship for Information Systems, in particular Artificial Intelligence and Explainability

The factual inaccuracies ("hallucinations") of large language models have recently inspired more research on knowledge-enhanced language modeling approaches. These are often assumed to enhance the overall trustworthiness and objectivity of language models. Meanwhile, the issue of bias is usually only mentioned as a limitation of statistical representations. This dissociation of knowledge enhancement and bias is in line with previous research on AI engineers' assumptions about knowledge, which indicates that knowledge is commonly understood as objective and value-neutral by this community. We argue that claims and practices by actors of the field still reflect this underlying conception of knowledge. We contrast this assumption with literature from social and, in particular, feminist epistemology, which argues that the idea of a universal disembodied knower is blind to the reality of knowledge practices and seriously challenges claims of "objective" or "neutral" knowledge. Knowledge enhancement techniques commonly use Wikidata and Wikipedia as their sources for knowledge, due to their large scale, public accessibility, and assumed trustworthiness. In this work, they serve as a case study for the influence of the social setting and the identity of knowers on epistemic processes. Indeed, the communities behind Wikidata and Wikipedia are known to be male-dominated and many instances of hostile behavior have been reported in the past decade. In effect, the contents of these knowledge bases are highly biased. It is therefore doubtful that these knowledge bases would contribute to bias reduction. In fact, our empirical evaluations of RoBERTa, KEPLER, and CoLAKE demonstrate that knowledge enhancement may not live up to the hopes of increased objectivity. In our study, the average probability for stereotypical associations was preserved on two out of three metrics, and performance-related gender gaps on knowledge-driven tasks were also preserved.
We build on these results and critical literature to argue that the label of "knowledge" and the commonly held beliefs about it can obscure the harm that is still done to marginalized groups. Knowledge enhancement is at risk of perpetuating epistemic injustice, and AI engineers' understanding of knowledge as objective per se conceals this injustice. Finally, to get closer to trustworthy language models, we need to rethink knowledge in AI and aim for an agenda of diversification and scrutiny from outgroup members.

Original language	English
Title of host publication	2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
Number of pages	13
Publisher	Association for Computing Machinery, Inc
Publication date	03.06.2024
Pages	1433-1445
ISBN (print)	9798400704505
ISBN (electronic)	979-8-4007-0450-5
DOIs	https://doi.org/10.1145/3630106.3658981
Publication status	Published - 03.06.2024
Event	ACM Conference on Fairness, Accountability, and Transparency - FAccT 2024, Rio de Janeiro, Brazil; Duration: 03.06.2024 → 06.06.2024; https://facctconference.org/2024/

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Research areas

bias, epistemology, fairness, feminism, knowledge enhancement, knowledge graphs, language models, natural language processing, representation
Informatics

Sustainable Development Goals

SDG 5 - Gender Equality

Other publications by the same author(s)

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R. & Panchenko, A., 2026, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Ichise, R. (ed.). Springer Science and Business Media Deutschland, p. 95-110 16 p. (Lecture Notes in Computer Science; vol. 15836 LNCS).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Junior fellows and distinguished dissertation of the GI and AI for crisis

Usbeck, R., Kraft, A. & Westphal, P., 01.02.2025, In: IT - Information Technology. 67, 1, p. 1-2 2 p.

Research output: Journal contributions › Other (editorial matter etc.) › Research

TextGraphs 2024 Shared Task on Text-Graph Representations for Knowledge Graph Question Answering

Sakhovskiy, A., Salnikov, M., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Ustalov, D., Tutubalina, E., Usbeck, R. & Panchenko, A., 01.08.2024, Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing, 62nd Annual Meeting of the Association for Computational Linguistics. Ustalov, D., Gao, Y., Panchenko, A., Tutubalina, E., Nikishina, I., Ramesh, A., Sakhovskiy, A., Usbeck, R., Penn, G. & Valentino, M. (eds.). Kerrville: Association for Computational Linguistics (ACL), p. 116-125 10 p.

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Community and Training in NFDI4DS

Lorenz, A. L., Christoforaki, M., Hennig, C., Kraft, A., Maltzan, S. V. & Schimmler, S., 2023, INFORMATIK 2023: Designing Futures: Zukünfte gestalten, Proceedings; 26.–29. September 2023, Berlin. Klein, M., Krupka, D., Winter, C. & Wohlgemuth, V. (eds.). Bonn: Gesellschaft für Informatik e.V., p. 905-908 4 p. (Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI); vol. P-337).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

NFDI4DS Infrastructure and Services

Schimmler, S., Wentzel, B., Bleier, A., Dietze, S., Karmakar, S., Mutschke, P., Kraft, A., Taffa, T. A., Usbeck, R., Boukhers, Z., Auer, S., Castro, L. J., Ackermann, M. R., Neumuth, T., Schneider, D., Abedjan, Z., Latif, A., Limani, F., Ahmad, R. A., Rehm, G., Khorasani, S. A. & Lieber, M., 04.12.2023, INFORMATIK 2023 - Designing Futures: Zukünfte gestalten, Proceedings. Klein, M., Krupka, D., Winter, C. & Wohlgemuth, V. (eds.). Gesellschaft für Informatik e.V., p. 921-926 6 p. (Lecture Notes in Informatics (LNI), Proceedings; vol. P-337).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
