The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing

Debayan Banerjee; Pranav Nair; Ricardo Usbeck; Chris Biemann

doi:10.18653/v1/2023.findings-acl.774

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

In this work, we analyse the role of output vocabulary for text-to-text (T2T) models on the task of SPARQL semantic parsing. We perform experiments within the context of knowledge graph question answering (KGQA), where the task is to convert questions in natural language to the SPARQL query language. We observe that the query vocabulary is distinct from human vocabulary. Language Models (LMs) are predominantly trained for human language tasks, and hence, if the query vocabulary is replaced with a vocabulary more attuned to the LM tokenizer, the performance of models may improve. We carry out carefully selected vocabulary substitutions on the queries and find absolute gains of around 17% on the GrailQA dataset.
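The idea of a vocabulary substitution can be illustrated with a minimal sketch. This record does not list the substitutions the authors actually used, so the mapping below (`SUBSTITUTIONS`) and the helper names `substitute` and `restore` are hypothetical, chosen only to show the general mechanism: query tokens are swapped for ordinary words before training, and the model output is mapped back to executable SPARQL afterwards.

```python
from typing import Dict

# Hypothetical mapping from SPARQL tokens to ordinary words.
# These pairs are illustrative assumptions, not the paper's vocabulary.
SUBSTITUTIONS: Dict[str, str] = {
    "SELECT": "choose",
    "DISTINCT": "unique",
    "WHERE": "given",
    "{": "open",
    "}": "close",
    "?x": "thing",
}


def substitute(query: str, mapping: Dict[str, str]) -> str:
    """Replace each whitespace-separated query token that has a substitute."""
    return " ".join(mapping.get(tok, tok) for tok in query.split())


def restore(encoded: str, mapping: Dict[str, str]) -> str:
    """Invert the substitution so the output is a SPARQL query again."""
    # The mapping must be one-to-one, otherwise it cannot be inverted.
    assert len(set(mapping.values())) == len(mapping)
    inverse = {word: tok for tok, word in mapping.items()}
    return " ".join(inverse.get(tok, tok) for tok in encoded.split())


if __name__ == "__main__":
    query = "SELECT DISTINCT ?x WHERE { ?x ns:type.object.name ?y }"
    encoded = substitute(query, SUBSTITUTIONS)
    print(encoded)                          # target sequence for the T2T model
    print(restore(encoded, SUBSTITUTIONS))  # back to the original query
```

This sketch splits on whitespace and assumes that none of the substitute words already occurs in the query; a real pipeline would need a proper SPARQL tokenizer and a collision-free replacement vocabulary.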

Original language	English
Title of host publication	Findings of the Association for Computational Linguistics: ACL 2023 : July 9-14, 2023
Editors	Anna Rogers, Jordan L. Boyd-Graber, Naoaki Okazaki
Number of pages	10
Place of Publication	Stroudsburg
Publisher	Association for Computational Linguistics (ACL)
Publication date	01.07.2023
Pages	12219-12228
ISBN (electronic)	978-1-959429-62-3
DOIs	https://doi.org/10.18653/v1/2023.findings-acl.774 https://doi.org/10.48550/arXiv.2305.15108
Publication status	Published - 01.07.2023
Externally published	Yes
Event	61st Annual Meeting of the Association for Computational Linguistics, Toronto, Canada; duration: 09.07.2023 → 14.07.2023; conference number: 61; https://2023.aclweb.org

Bibliographical note

Publisher Copyright:
© 2023 Association for Computational Linguistics.

Research areas

Business informatics
Informatics

DOI

Final published version: https://doi.org/10.18653/v1/2023.findings-acl.774
Other version: https://doi.org/10.48550/arXiv.2305.15108