Modern Baselines for SPARQL Semantic Parsing

Research output: Contributions to collected editions/worksArticle in conference proceedingsResearchpeer-review

Authors

In this work, we focus on the task of generating SPARQL queries from natural language questions, which can then be executed on Knowledge Graphs (KGs). We assume that gold entity and relations have been provided, and the remaining task is to arrange them in the right order along with SPARQL vocabulary, and input tokens to produce the correct SPARQL query. Pre-trained Language Models (PLMs) have not been explored in depth on this task so far, so we experiment with BART, T5 and PGNs (Pointer Generator Networks) with BERT embeddings, looking for new baselines in the PLM era for this task, on DBpedia and Wikidata KGs. We show that T5 requires special input tokenisation, but produces state of the art performance on LC-QuAD 1.0 and LC-QuAD 2.0 datasets, and outperforms task-specific models from previous works. Moreover, the methods enable semantic parsing for questions where a part of the input needs to be copied to the output query, thus enabling a new paradigm in KG semantic parsing.

Original languageEnglish
Title of host publicationSIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
EditorsEnrique Amigo, Pablo Castells, Julio Gonzalo
Number of pages6
PublisherAssociation for Computing Machinery, Inc
Publication date06.07.2022
Pages2260-2265
ISBN (electronic)978-1-4503-8732-3
DOIs
Publication statusPublished - 06.07.2022
Externally publishedYes
Event45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR 2022 - Online + Círculo de Bellas Artes (Circle of Beaux Arts), Madrid, Spain
Duration: 11.07.202215.07.2022
Conference number: 45
https://sigir.org/sigir2022/

Bibliographical note

Publisher Copyright:
© 2022 ACM.

DOI