Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis

Aleksandr Perevalov; Xi Yan; Liubov Kovriguina; Longquan Jiang; Andreas Both; Ricardo Usbeck

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis. / Perevalov, Aleksandr; Yan, Xi; Kovriguina, Liubov et al.
2022 Language Resources and Evaluation Conference, LREC 2022. ed. / Nicoletta Calzolari; Frederic Bechet; Philippe Blache; Khalid Choukri; Christopher Cieri; Thierry Declerck; Sara Goggi; Hitoshi Isahara; Bente Maegaard; Joseph Mariani; Helene Mazo; Jan Odijk; Stelios Piperidis. European Language Resources Association (ELRA), 2022. p. 2998-3007 (2022 Language Resources and Evaluation Conference, LREC 2022).

Harvard

Perevalov, A, Yan, X, Kovriguina, L, Jiang, L, Both, A & Usbeck, R 2022, Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis. in N Calzolari, F Bechet, P Blache, K Choukri, C Cieri, T Declerck, S Goggi, H Isahara, B Maegaard, J Mariani, H Mazo, J Odijk & S Piperidis (eds), 2022 Language Resources and Evaluation Conference, LREC 2022. 2022 Language Resources and Evaluation Conference, LREC 2022, European Language Resources Association (ELRA), pp. 2998-3007, 13th International Conference on Language Resources and Evaluation Conference - LREC 2022, Marseille, France, 20.06.22. <https://aclanthology.org/2022.lrec-1.321>

APA

Perevalov, A., Yan, X., Kovriguina, L., Jiang, L., Both, A., & Usbeck, R. (2022). Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis. In N. Calzolari, F. Bechet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), 2022 Language Resources and Evaluation Conference, LREC 2022 (pp. 2998-3007). (2022 Language Resources and Evaluation Conference, LREC 2022). European Language Resources Association (ELRA). https://aclanthology.org/2022.lrec-1.321

Vancouver

Perevalov A, Yan X, Kovriguina L, Jiang L, Both A, Usbeck R. Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis. In Calzolari N, Bechet F, Blache P, Choukri K, Cieri C, Declerck T, Goggi S, Isahara H, Maegaard B, Mariani J, Mazo H, Odijk J, Piperidis S, editors, 2022 Language Resources and Evaluation Conference, LREC 2022. European Language Resources Association (ELRA). 2022. p. 2998-3007. (2022 Language Resources and Evaluation Conference, LREC 2022).

Bibtex

@inproceedings{699c26425fad4ff9a856f7948345e73d,

title = "Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis",

abstract = "Data-driven systems need to be evaluated to establish trust in the scientific approach and its applicability. In particular, this is true for Knowledge Graph (KG) Question Answering (QA), where complex data structures are made accessible via natural-language interfaces. Evaluating the capabilities of these systems has been a driver for the community for more than ten years while establishing different KGQA benchmark datasets. However, comparing different approaches is cumbersome. The lack of existing and curated leaderboards leads to a missing global view over the research field and could inject mistrust into the results. In particular, the latest and most-used datasets in the KGQA community, LC-QuAD and QALD, miss providing central and up-to-date points of trust. In this paper, we survey and analyze a wide range of evaluation results with significant coverage of 100 publications and 98 systems from the last decade. We provide a new central and open leaderboard for any KGQA benchmark dataset as a focal point for the community - https://kgqa.github.io/leaderboard/. Our analysis highlights existing problems during the evaluation of KGQA systems. Thus, we will point to possible improvements for future evaluations.",

keywords = "Evaluation Methodology, Knowledge Graph, Leaderboard, Question Answering, Replication Crisis, Business informatics",

author = "Aleksandr Perevalov and Xi Yan and Liubov Kovriguina and Longquan Jiang and Andreas Both and Ricardo Usbeck",

note = "Publisher Copyright: {\textcopyright} European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.; 13th International Conference on Language Resources and Evaluation Conference - LREC 2022 : Identify, Describe and Share your LRs!, LREC 2022 ; Conference date: 20-06-2022 through 25-06-2022",

year = "2022",

language = "English",

series = "2022 Language Resources and Evaluation Conference, LREC 2022",

publisher = "European Language Resources Association (ELRA)",

pages = "2998--3007",

editor = "Nicoletta Calzolari and Frederic Bechet and Philippe Blache and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Helene Mazo and Jan Odijk and Stelios Piperidis",

booktitle = "2022 Language Resources and Evaluation Conference, LREC 2022",

address = "Luxembourg",

url = "https://lrec2022.lrec-conf.org/en/",

}

RIS

TY - CHAP

T1 - Knowledge Graph Question Answering Leaderboard

T2 - 13th International Conference on Language Resources and Evaluation Conference - LREC 2022

AU - Perevalov, Aleksandr

AU - Yan, Xi

AU - Kovriguina, Liubov

AU - Jiang, Longquan

AU - Both, Andreas

AU - Usbeck, Ricardo

N1 - Conference code: 13

PY - 2022

Y1 - 2022

N2 - Data-driven systems need to be evaluated to establish trust in the scientific approach and its applicability. In particular, this is true for Knowledge Graph (KG) Question Answering (QA), where complex data structures are made accessible via natural-language interfaces. Evaluating the capabilities of these systems has been a driver for the community for more than ten years while establishing different KGQA benchmark datasets. However, comparing different approaches is cumbersome. The lack of existing and curated leaderboards leads to a missing global view over the research field and could inject mistrust into the results. In particular, the latest and most-used datasets in the KGQA community, LC-QuAD and QALD, miss providing central and up-to-date points of trust. In this paper, we survey and analyze a wide range of evaluation results with significant coverage of 100 publications and 98 systems from the last decade. We provide a new central and open leaderboard for any KGQA benchmark dataset as a focal point for the community - https://kgqa.github.io/leaderboard/. Our analysis highlights existing problems during the evaluation of KGQA systems. Thus, we will point to possible improvements for future evaluations.

AB - Data-driven systems need to be evaluated to establish trust in the scientific approach and its applicability. In particular, this is true for Knowledge Graph (KG) Question Answering (QA), where complex data structures are made accessible via natural-language interfaces. Evaluating the capabilities of these systems has been a driver for the community for more than ten years while establishing different KGQA benchmark datasets. However, comparing different approaches is cumbersome. The lack of existing and curated leaderboards leads to a missing global view over the research field and could inject mistrust into the results. In particular, the latest and most-used datasets in the KGQA community, LC-QuAD and QALD, miss providing central and up-to-date points of trust. In this paper, we survey and analyze a wide range of evaluation results with significant coverage of 100 publications and 98 systems from the last decade. We provide a new central and open leaderboard for any KGQA benchmark dataset as a focal point for the community - https://kgqa.github.io/leaderboard/. Our analysis highlights existing problems during the evaluation of KGQA systems. Thus, we will point to possible improvements for future evaluations.

KW - Evaluation Methodology

KW - Knowledge Graph

KW - Leaderboard

KW - Question Answering

KW - Replication Crisis

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85144360908&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/41eb2e20-4b9b-304e-bf1e-0bfeb3b9aa51/

M3 - Article in conference proceedings

AN - SCOPUS:85144360908

T3 - 2022 Language Resources and Evaluation Conference, LREC 2022

SP - 2998

EP - 3007

BT - 2022 Language Resources and Evaluation Conference, LREC 2022

A2 - Calzolari, Nicoletta

A2 - Bechet, Frederic

A2 - Blache, Philippe

A2 - Choukri, Khalid

A2 - Cieri, Christopher

A2 - Declerck, Thierry

A2 - Goggi, Sara

A2 - Isahara, Hitoshi

A2 - Maegaard, Bente

A2 - Mariani, Joseph

A2 - Mazo, Helene

A2 - Odijk, Jan

A2 - Piperidis, Stelios

PB - European Language Resources Association (ELRA)

Y2 - 20 June 2022 through 25 June 2022

ER -
