The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing

Debayan Banerjee; Pranav Nair; Ricardo Usbeck; Chris Biemann

doi:10.18653/v1/2023.findings-acl.774


Publication: Contributions to collected editions › Papers in conference proceedings › Research › Peer-reviewed

In this work, we analyse the role of output vocabulary for text-to-text (T2T) models on the task of SPARQL semantic parsing. We perform experiments in the context of knowledge graph question answering (KGQA), where the task is to convert questions in natural language into the SPARQL query language. We observe that the query vocabulary is distinct from human vocabulary. Language models (LMs) are predominantly trained for human language tasks; hence, if the query vocabulary is replaced with a vocabulary more attuned to the LM tokenizer, the performance of models may improve. We carry out carefully selected vocabulary substitutions on the queries and find absolute gains in the range of 17% on the GrailQA dataset.
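To make the idea of substituting the query vocabulary more concrete, here is a minimal Python sketch of a reversible, token-level substitution: SPARQL keywords and symbols are rewritten into plain words before training, and model output is mapped back to executable SPARQL afterwards. It assumes whitespace-tokenised queries; the mapping table, the `var_` prefix, and the `encode`/`decode` helpers are hypothetical illustrations, not the substitutions evaluated in the paper.

```python
# Hypothetical sketch of reversible vocabulary substitution for SPARQL.
# Assumption: queries are whitespace-tokenised, so braces and dots are
# separate tokens. The mapping below is invented for illustration only.

TOKEN_MAP = {
    "SELECT": "select",
    "DISTINCT": "distinct",
    "WHERE": "where",
    "{": "open_brace",
    "}": "close_brace",
    ".": "dot",
}
INVERSE_MAP = {v: k for k, v in TOKEN_MAP.items()}
VAR_PREFIX = "var_"  # hypothetical stand-in for the SPARQL '?' variable sigil


def encode(query: str) -> str:
    """Rewrite a SPARQL query into a word-like target vocabulary."""
    out = []
    for tok in query.split():
        if tok.startswith("?"):
            # Variables: replace the '?' sigil with a word prefix.
            out.append(VAR_PREFIX + tok[1:])
        else:
            # Keywords/symbols are substituted; IRIs and other tokens pass through.
            out.append(TOKEN_MAP.get(tok, tok))
    return " ".join(out)


def decode(text: str) -> str:
    """Map substituted model output back to SPARQL."""
    out = []
    for tok in text.split():
        if tok.startswith(VAR_PREFIX):
            out.append("?" + tok[len(VAR_PREFIX):])
        else:
            out.append(INVERSE_MAP.get(tok, tok))
    return " ".join(out)


if __name__ == "__main__":
    q = "SELECT DISTINCT ?x WHERE { ?x :type :Person . }"
    print(encode(q))
    print(decode(encode(q)) == q)
```

Because only whole tokens are substituted, dots and other characters inside IRIs are left untouched, and the round trip `decode(encode(q))` recovers a single-spaced query exactly. A real system would also need to guard against collisions between the substitute words and tokens already present in the queries.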

Original language	English
Title of host publication	Findings of the Association for Computational Linguistics: ACL 2023 : July 9-14, 2023
Editors	Anna Rogers, Jordan L. Boyd-Graber, Naoaki Okazaki
Number of pages	10
Place of publication	Stroudsburg
Publisher	Association for Computational Linguistics (ACL)
Publication date	01.07.2023
Pages	12219-12228
ISBN (electronic)	978-1-959429-62-3
DOIs	https://doi.org/10.18653/v1/2023.findings-acl.774 https://doi.org/10.48550/arXiv.2305.15108
Publication status	Published - 01.07.2023
Externally published	Yes
Event	The 61st Annual Meeting of the Association for Computational Linguistics - Toronto, Canada. Duration: 09.07.2023 → 14.07.2023. Conference number: 61. https://2023.aclweb.org/ https://dblp.org/streams/conf/acl#2023 http://www.wikidata.org/entity/Q119855443

Bibliographic note

Publisher Copyright:
© 2023 Association for Computational Linguistics.

Subject areas

Business informatics
Computer science

Other publications by the same author(s)

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R. & Panchenko, A., 2026, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Ichise, R. (ed.). Springer Science and Business Media Deutschland, pp. 95-110, 16 pp. (Lecture Notes in Computer Science; vol. 15836 LNCS).

Publication: Contributions to collected editions › Papers in conference proceedings › Research › Peer-reviewed

Analyzing the Influence of Knowledge Graph Information on Relation Extraction

Möller, C. & Usbeck, R., 2025, The Semantic Web: 22nd European Semantic Web Conference, ESWC 2025 Portoroz, Slovenia, June 1–5, 2025 Proceedings, Part I. Curry, E., Acosta, M., Poveda-Villalón, M., van Erp, M., Ojo, A., Hose, K., Shimizu, C. & Lisena, P. (eds.). Cham: Springer Nature Switzerland AG, vol. 1, pp. 460-480, 21 pp. (Lecture Notes in Computer Science; vol. 15718).

Publication: Contributions to collected editions › Papers in conference proceedings › Research › Peer-reviewed

ASK-DBLP: Answering Questions over DBLP

Taffa, T., Neises, P., Ollinger, S., Westphal, P., Ackermann, M. R., Banerjee, D. & Usbeck, R., 02.11.2025, ISWC-C 2025, Industry, Doctoral Consortium, Posters and Demos at ISWC 2025: Joint Proceedings of Industry, Doctoral Consortium, Posters and Demos of the 24th International Semantic Web Conference (ISWC-C 2025), ISWC 2025 Companion Volume. Celino, I., Hassanzadeh, O., Bernstein, A., Noy, N., Cheng, G., Wang, S., Ferrada, S., Soulard, T., Kozaki, K., Takeda, H. & Gentile, A. L. (eds.). Aachen: Sun Site Central Europe (RWTH Aachen University), pp. 435-440, 6 pp., D13 (CEUR Workshop Proceedings; vol. 4085).

Publication: Contributions to collected editions › Papers in conference proceedings › Research › Peer-reviewed

Automating SPARQL Query Translations between DBpedia and Wikidata

Bartels, M. C., Banerjee, D. & Usbeck, R., 14.07.2025, Linking Meaning: Semantic Technologies Shaping the Future of AI: Cover 74617 Proceedings of the 21st International Conference on Semantic Systems, 3-5 September 2025, Vienna, Austria. Spahiu, B., Vahdati, S., Salatino, A., Pellegrini, T. & Havur, G. (eds.). IOS Press BV, pp. 176-193, 18 pp. (Studies on the Semantic Web; vol. 62).

Publication: Contributions to collected editions › Papers in conference proceedings › Research

Best Practices in AI and Data Science Models Evaluation

Banerjee, D., Taffa, T. A. & Usbeck, R., 2025, INFORMATIK 2025 : The Wide Open - Offenheit von Source bis Science, 16-19 September 2025, Potsdam. Lucke, U., Stieglitz, S., Uebernickel, F., Lamprecht, A.-L. & Klein, M. (eds.). Bonn: Gesellschaft für Informatik e.V., pp. 1211-1219, 9 pp. (Lecture Notes in Informatics; vol. P366).

Publication: Contributions to collected editions › Papers in conference proceedings › Research › Peer-reviewed

DOI

https://doi.org/10.18653/v1/2023.findings-acl.774
Final published version
https://doi.org/10.48550/arXiv.2305.15108
Other versions