Analyzing Discourses in Portuguese Word Embeddings: A Case of Gender Bias Outside the English-Speaking World

Publication: Contributions to journals › Journal articles › Research › Peer-reviewed

Standard

Analyzing Discourses in Portuguese Word Embeddings: A Case of Gender Bias Outside the English-Speaking World. / de Souza Taso, Fernanda Tiemi; Dos Reis, Valéria Quadros; Martinez, Fábio Viduani.
In: Journal on Interactive Systems, Vol. 16, No. 1, 01.01.2025, pp. 532-543.

Bibtex

@article{2be9dc45f7fe4bc680ef2fc95aa22471,
title = "Analyzing Discourses in Portuguese Word Embeddings: A Case of Gender Bias Outside the English-Speaking World",
abstract = "In this paper we meticulously examined a Word Embedding model in Portuguese, endeavoring to identify gender biases through diverse analytical perspectives, employing SC-WEAT and RIPA metrics that are widely used in the English realm. Our inquiry focused on three primary dimensions: (1) the frequency-based association of words with feminine and masculine terms; (2) the identification of disparities between grammatical classes pertaining to gender sets; and (3) the categorisation and grouping of feminine and masculine words, including their distinctive attributes. In regard to frequency groups, our investigation revealed a pervasive negative association of words with feminine terms in most subsets, indicative of a pronounced inclination of the model{\textquoteright}s vocabulary towards the masculine references. Notably, among the 100 most frequent words, 89 exhibited a stronger association with masculine terms. In the scrutiny of grammatical classes, our analysis demonstrated a predominant association of adjectives with feminine references, underscoring the imperative for supplementary description when referring to women. Furthermore, a conspicuous prevalence of participle verbs associated with feminine terms was observed, a phenomenon distinct from their male counterparts and one that requires further expert attention to be properly explained. The categorisation process underscored the existence of gender bias, as exemplified by the association of words with masculine terms within the domains of sport, finance, and science, while words related to feelings, home furniture, and entertainment were associated with feminine terms. These findings assume significance in fostering a discourse on gender analysis within non-English models, such as Portuguese models, thereby encouraging the Brazilian community to actively investigate biases in NLP models.",
keywords = "Algorithmic Sexism, Computational Linguistics, Ethics in AI, Natural Language Processing, Non-English NLP, Business informatics",
author = "{de Souza Taso}, {Fernanda Tiemi} and {Dos Reis}, {Val{\'e}ria Quadros} and Martinez, {F{\'a}bio Viduani}",
note = "Publisher Copyright: {\textcopyright} This work is licensed under a Creative Commons Attribution 4.0 International License.",
year = "2025",
month = jan,
day = "1",
doi = "10.5753/jis.2025.5958",
language = "English",
volume = "16",
pages = "532--543",
journal = "Journal on Interactive Systems",
issn = "2763-7719",
publisher = "Sociedade Brasileira de Computa{\c c}{\~a}o (SBC)",
number = "1",

}

RIS

TY - JOUR

T1 - Analyzing Discourses in Portuguese Word Embeddings

T2 - A Case of Gender Bias Outside the English-Speaking World

AU - de Souza Taso, Fernanda Tiemi

AU - Dos Reis, Valéria Quadros

AU - Martinez, Fábio Viduani

N1 - Publisher Copyright: © This work is licensed under a Creative Commons Attribution 4.0 International License.

PY - 2025/1/1

Y1 - 2025/1/1

N2 - In this paper we meticulously examined a Word Embedding model in Portuguese, endeavoring to identify gender biases through diverse analytical perspectives, employing SC-WEAT and RIPA metrics that are widely used in the English realm. Our inquiry focused on three primary dimensions: (1) the frequency-based association of words with feminine and masculine terms; (2) the identification of disparities between grammatical classes pertaining to gender sets; and (3) the categorisation and grouping of feminine and masculine words, including their distinctive attributes. In regard to frequency groups, our investigation revealed a pervasive negative association of words with feminine terms in most subsets, indicative of a pronounced inclination of the model’s vocabulary towards the masculine references. Notably, among the 100 most frequent words, 89 exhibited a stronger association with masculine terms. In the scrutiny of grammatical classes, our analysis demonstrated a predominant association of adjectives with feminine references, underscoring the imperative for supplementary description when referring to women. Furthermore, a conspicuous prevalence of participle verbs associated with feminine terms was observed, a phenomenon distinct from their male counterparts and one that requires further expert attention to be properly explained. The categorisation process underscored the existence of gender bias, as exemplified by the association of words with masculine terms within the domains of sport, finance, and science, while words related to feelings, home furniture, and entertainment were associated with feminine terms. These findings assume significance in fostering a discourse on gender analysis within non-English models, such as Portuguese models, thereby encouraging the Brazilian community to actively investigate biases in NLP models.

AB - In this paper we meticulously examined a Word Embedding model in Portuguese, endeavoring to identify gender biases through diverse analytical perspectives, employing SC-WEAT and RIPA metrics that are widely used in the English realm. Our inquiry focused on three primary dimensions: (1) the frequency-based association of words with feminine and masculine terms; (2) the identification of disparities between grammatical classes pertaining to gender sets; and (3) the categorisation and grouping of feminine and masculine words, including their distinctive attributes. In regard to frequency groups, our investigation revealed a pervasive negative association of words with feminine terms in most subsets, indicative of a pronounced inclination of the model’s vocabulary towards the masculine references. Notably, among the 100 most frequent words, 89 exhibited a stronger association with masculine terms. In the scrutiny of grammatical classes, our analysis demonstrated a predominant association of adjectives with feminine references, underscoring the imperative for supplementary description when referring to women. Furthermore, a conspicuous prevalence of participle verbs associated with feminine terms was observed, a phenomenon distinct from their male counterparts and one that requires further expert attention to be properly explained. The categorisation process underscored the existence of gender bias, as exemplified by the association of words with masculine terms within the domains of sport, finance, and science, while words related to feelings, home furniture, and entertainment were associated with feminine terms. These findings assume significance in fostering a discourse on gender analysis within non-English models, such as Portuguese models, thereby encouraging the Brazilian community to actively investigate biases in NLP models.

KW - Algorithmic Sexism

KW - Computational Linguistics

KW - Ethics in AI

KW - Natural Language Processing

KW - Non-English NLP

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=105011476330&partnerID=8YFLogxK

U2 - 10.5753/jis.2025.5958

DO - 10.5753/jis.2025.5958

M3 - Journal articles

AN - SCOPUS:105011476330

VL - 16

SP - 532

EP - 543

JO - Journal on Interactive Systems

JF - Journal on Interactive Systems

SN - 2763-7719

IS - 1

ER -
