Modern Baselines for SPARQL Semantic Parsing

Publication: Contributions in edited volumes / Articles in conference proceedings / Research, peer-reviewed

Authors

In this work, we focus on the task of generating SPARQL queries from natural language questions, which can then be executed on Knowledge Graphs (KGs). We assume that gold entities and relations have been provided, and the remaining task is to arrange them in the correct order, together with SPARQL vocabulary and input tokens, to produce the correct SPARQL query. Pre-trained Language Models (PLMs) have not been explored in depth on this task so far, so we experiment with BART, T5 and PGNs (Pointer Generator Networks) with BERT embeddings, looking for new baselines in the PLM era for this task, on the DBpedia and Wikidata KGs. We show that T5 requires special input tokenisation, but produces state-of-the-art performance on the LC-QuAD 1.0 and LC-QuAD 2.0 datasets, outperforming task-specific models from previous work. Moreover, these methods enable semantic parsing for questions where part of the input must be copied into the output query, a new paradigm in KG semantic parsing.
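To make the setup concrete, below is a minimal sketch, assuming a Hugging Face T5 checkpoint that has already been fine-tuned on LC-QuAD-style (question, query) pairs, of how a question plus its gold entities and relations could be fed to the model, and of the kind of symbol substitution the "special input tokenisation" implies: T5's SentencePiece vocabulary cannot represent some SPARQL symbols (e.g. curly braces), so they are mapped to plain-text placeholders before training and mapped back after decoding. The checkpoint name, prompt format, placeholder spellings, and KG identifiers are illustrative assumptions, not the paper's exact choices.

```python
# Minimal sketch, not the authors' released code: question-to-SPARQL
# generation with T5, assuming a checkpoint fine-tuned on (question, query)
# pairs. Checkpoint name, prompt format, and substitutions are assumptions.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# T5's SentencePiece vocabulary lacks symbols such as '{' and '}', so the
# target query is rewritten with placeholder tokens for training, and the
# mapping is inverted after decoding. Placeholder spellings are hypothetical.
SPARQL_SUBS = {"{": " obr ", "}": " cbr "}

def to_t5_form(query: str) -> str:
    for symbol, placeholder in SPARQL_SUBS.items():
        query = query.replace(symbol, placeholder)
    return query

def from_t5_form(query: str) -> str:
    for symbol, placeholder in SPARQL_SUBS.items():
        query = query.replace(placeholder, symbol)
    return query

# The input concatenates the question with the gold entity and relation IDs,
# since the task assumes these are given and only the query structure remains
# to be predicted. The IDs below are illustrative.
source = ("parse to sparql: Who is the author of Le Petit Prince? "
          "entities: wd:Q25338 relations: wdt:P50")
input_ids = tokenizer(source, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_length=128)
prediction = from_t5_form(
    tokenizer.decode(output_ids[0], skip_special_tokens=True))
print(prediction)  # e.g. SELECT ?a WHERE { wd:Q25338 wdt:P50 ?a }
```

Keeping the substitution reversible is the key design point: the model only ever sees tokens its vocabulary can represent, while the executable SPARQL syntax is recovered exactly at decoding time.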

Original language: English
Title: SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors: Enrique Amigo, Pablo Castells, Julio Gonzalo
Number of pages: 6
Publisher: Association for Computing Machinery, Inc
Publication date: 06.07.2022
Pages: 2260-2265
ISBN (electronic): 978-1-4503-8732-3
Publication status: Published - 06.07.2022
Published externally: Yes
Event: 45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR 2022 - Online + Círculo de Bellas Artes (Circle of Fine Arts), Madrid, Spain
Duration: 11.07.2022 - 15.07.2022
Conference number: 45
https://sigir.org/sigir2022/

Bibliographic note

Funding Information:
This research was partially funded by the German Federal Ministry of Education and Research (BMBF) as part of the INSTANT project, ID 02L18A111.

Publisher Copyright:
© 2022 ACM.
