HySQA: Hybrid Scholarly Question Answering
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Purpose:
The heterogeneity of scholarly information in knowledge graphs (KGs) and unstructured textual sources poses challenges in building robust Scholarly Question Answering (SQA) systems. Existing datasets and models typically address a narrow spectrum, focusing exclusively on KGs or unstructured sources and limiting evaluation to simple factoid questions. This gap leaves current systems unable to answer complex, hybrid scholarly questions that require integrating evidence from multiple heterogeneous data sources.
Methodology:
We introduce HySQA (Hybrid Scholarly Question Answering), a large-scale benchmark dataset of hybrid questions over scholarly KGs (SKGs) and Wikipedia text. HySQA contains complex questions whose answers require traversing facts across structured and unstructured sources. We also develop a baseline model that adaptively decomposes each question into sub-questions, identifies the answer source of each sub-question, retrieves relevant information from SKGs and Wikipedia, and generates an answer using a hybrid augmented answer generation framework.
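The decompose, route, retrieve, and generate steps described above can be illustrated with a minimal Python sketch. This is not the authors' baseline: the decomposition is a fixed hand-written plan rather than an adaptive model, the final step simply returns the last hop's answer instead of calling an answer generator, and every name and data item here (`SubQuestion`, `TOY_KG`, `TOY_TEXT`, "Paper X", "Institute Z", "City Q") is an invented placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional

PREV = "#prev"  # placeholder meaning "the answer to the previous sub-question"


@dataclass
class SubQuestion:
    source: str         # "kg" or "text": where the answer is expected to live
    relation: str       # KG relation, or cue phrase to look for in a passage
    entity: str = PREV  # entity to query; PREV chains to the previous answer


# Toy stand-ins for a scholarly KG and a Wikipedia passage store (invented data).
TOY_KG = {
    ("Paper X", "author"): "Author A",
    ("Author A", "affiliation"): "Institute Z",
}
TOY_TEXT = {
    "Institute Z": "Institute Z is a research institute located in City Q.",
}


def decompose(question: str) -> List[SubQuestion]:
    """Hypothetical decomposer: returns a fixed plan for the one demo question.

    A real system would choose the sub-questions and their sources per question.
    """
    return [
        SubQuestion("kg", "author", entity="Paper X"),
        SubQuestion("kg", "affiliation"),
        SubQuestion("text", "located in "),
    ]


def retrieve_kg(entity: str, relation: str) -> Optional[str]:
    """Look up a single (entity, relation) fact in the toy KG."""
    return TOY_KG.get((entity, relation))


def retrieve_text(entity: str, cue: str) -> Optional[str]:
    """Fetch the toy passage for an entity and return the text after the cue."""
    passage = TOY_TEXT.get(entity, "")
    if cue not in passage:
        return None
    return passage.split(cue, 1)[1].rstrip(".")


def answer_question(question: str) -> Optional[str]:
    """Answer each sub-question in order, feeding each answer into the next hop."""
    answer: Optional[str] = None
    for sq in decompose(question):
        entity = answer if sq.entity == PREV else sq.entity
        if entity is None:
            return None
        if sq.source == "kg":
            answer = retrieve_kg(entity, sq.relation)
        else:
            answer = retrieve_text(entity, sq.relation)
        if answer is None:
            return None  # a failed hop breaks the chain
    return answer


question = "In which city is the institution of the author of Paper X located?"
print(answer_question(question))  # City Q
```

The first two hops are resolved against the structured store and the last against text, which is the property that makes a question hybrid in the sense used here.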
Findings:
The experimental results show that integrating static and adaptive decomposition methods is more effective than static decomposition alone.
Value:
HySQA provides the community with a benchmark resource for evaluating advances in scholarly QA research.
| Original language | English | 
|---|---|
| Title of host publication | Linking Meaning: Semantic Technologies Shaping the Future of AI : Proceedings of the 21st International Conference on Semantic Systems, 3-5 September 2025, Vienna, Austria | 
| Editors | Blerina Spahiu, Sahar Vahdati, Angelo Salatino, Tassilo Pellegrini, Giray Havur | 
| Number of pages | 17 | 
| Place of Publication | Amsterdam | 
| Publisher | IOS Press BV | 
| Publication date | 26.08.2025 | 
| Pages | 247-263 | 
| ISBN (electronic) | 978-1-64368-616-5 | 
| DOIs | |
| Publication status | Published - 26.08.2025 | 
| Event | 21st International Conference on Semantic Systems: Linking Meaning: Semantic Technologies Shaping the Future of AI, Vienna, Austria. Duration: 03.09.2025 → 05.09.2025. Conference number: 21 |
- Business informatics - Scholarly hybrid questions, Scholarly Question Answering, Hybrid Question Answering, Complex Question Answering
 
