Holistic and scalable ranking of RDF data

Ngonga Ngomo; Michael Hoffmann; Ricardo Usbeck; Kunal Jha

doi:10.1109/BigData.2017.8257990

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Holistic and scalable ranking of RDF data. / Ngomo, Ngonga; Hoffmann, Michael; Usbeck, Ricardo et al.
Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017. ed. / Jian-Yun Nie; Zoran Obradovic; Toyotaro Suzumura; Rumi Ghosh; Raghunath Nambiar; Chonggang Wang; Hui Zang; Ricardo Baeza-Yates; Xiaohua Hu; Jeremy Kepner; Alfredo Cuzzocrea; Jian Tang; Masashi Toyoda. Institute of Electrical and Electronics Engineers Inc., 2017. p. 746-755 (Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017; Vol. 2018-January).

Harvard

Ngomo, N, Hoffmann, M, Usbeck, R & Jha, K 2017, Holistic and scalable ranking of RDF data. in J-Y Nie, Z Obradovic, T Suzumura, R Ghosh, R Nambiar, C Wang, H Zang, R Baeza-Yates, X Hu, J Kepner, A Cuzzocrea, J Tang & M Toyoda (eds), Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017. Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017, vol. 2018-January, Institute of Electrical and Electronics Engineers Inc., pp. 746-755, 5th IEEE International Conference on Big Data, Big Data 2017, Boston, United States, 11.12.17. https://doi.org/10.1109/BigData.2017.8257990

APA

Ngomo, N., Hoffmann, M., Usbeck, R., & Jha, K. (2017). Holistic and scalable ranking of RDF data. In J.-Y. Nie, Z. Obradovic, T. Suzumura, R. Ghosh, R. Nambiar, C. Wang, H. Zang, R. Baeza-Yates, X. Hu, J. Kepner, A. Cuzzocrea, J. Tang, & M. Toyoda (Eds.), Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017 (pp. 746-755). (Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017; Vol. 2018-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2017.8257990

Vancouver

Ngomo N, Hoffmann M, Usbeck R, Jha K. Holistic and scalable ranking of RDF data. In Nie JY, Obradovic Z, Suzumura T, Ghosh R, Nambiar R, Wang C, Zang H, Baeza-Yates R, Hu X, Kepner J, Cuzzocrea A, Tang J, Toyoda M, editors, Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 746-755. (Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017). doi: 10.1109/BigData.2017.8257990

Bibtex

@inbook{a78020f114c8473da3a5f0d30e12dba7,

title = "Holistic and scalable ranking of RDF data",

abstract = "The volume and number of data sources published using Semantic Web standards such as RDF grows continuously. The largest of these data sources now contain billions of facts and are updated periodically. A large number of applications driven by such data sources requires the ranking of entities and facts contained in such knowledge graphs. Hence, there is a need for time-efficient approaches that can compute ranks for entities and facts simultaneously. In this paper, we present the first holistic ranking approach for RDF data. Our approach, dubbed HARE, allows the simultaneous computation of ranks for RDF triples, resources, properties and literals. To this end, HARE relies on the representation of RDF graphs as bi-partite graphs. It then employs a time-efficient extension of the random walk paradigm to bi-partite graphs. We show that by virtue of this extension, the worst-case complexity of HARE is $O(n^5)$ while that of PageRank is $O(n^6)$. In addition, we evaluate the practical efficiency of our approach by comparing it with PageRank on 6 real and 6 synthetic datasets with sizes up to $10^8$ triples. Our results show that HARE is up to 2 orders of magnitude faster than PageRank. We also present a brief evaluation of HARE's ranking accuracy by comparing it with that of PageRank applied directly to RDF graphs. Our evaluation on 19 classes of DBpedia demonstrates that there is no statistical difference between HARE and PageRank. We hence conclude that our approach goes beyond the state of the art by allowing the ranking of all RDF entities and of RDF triples without being worse w.r.t. the ranking quality it achieves on resources. HARE is open-source and is available at http://github.com/dice-group/hare.",

keywords = "Data Volume, PageRank, Ranking, Scalability, Semantic Web, Informatics, Business informatics",

author = "Ngonga Ngomo and Michael Hoffmann and Ricardo Usbeck and Kunal Jha",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 5th IEEE International Conference on Big Data, Big Data 2017, IEEE ; Conference date: 11-12-2017 Through 14-12-2017",

year = "2017",

month = jul,

day = "1",

doi = "10.1109/BigData.2017.8257990",

language = "English",

isbn = "978-1-5386-2714-3",

series = "Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "746--755",

editor = "Jian-Yun Nie and Zoran Obradovic and Toyotaro Suzumura and Rumi Ghosh and Raghunath Nambiar and Chonggang Wang and Hui Zang and Ricardo Baeza-Yates and Xiaohua Hu and Jeremy Kepner and Alfredo Cuzzocrea and Jian Tang and Masashi Toyoda",

booktitle = "Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017",

address = "United States",

url = "https://cci.drexel.edu/bigdata/bigdata2017/",

}

RIS

TY - CHAP

T1 - Holistic and scalable ranking of RDF data

AU - Ngomo, Ngonga

AU - Hoffmann, Michael

AU - Usbeck, Ricardo

AU - Jha, Kunal

N1 - Conference code: 5

PY - 2017/7/1

Y1 - 2017/7/1

N2 - The volume and number of data sources published using Semantic Web standards such as RDF grows continuously. The largest of these data sources now contain billions of facts and are updated periodically. A large number of applications driven by such data sources requires the ranking of entities and facts contained in such knowledge graphs. Hence, there is a need for time-efficient approaches that can compute ranks for entities and facts simultaneously. In this paper, we present the first holistic ranking approach for RDF data. Our approach, dubbed HARE, allows the simultaneous computation of ranks for RDF triples, resources, properties and literals. To this end, HARE relies on the representation of RDF graphs as bi-partite graphs. It then employs a time-efficient extension of the random walk paradigm to bi-partite graphs. We show that by virtue of this extension, the worst-case complexity of HARE is O(n^5) while that of PageRank is O(n^6). In addition, we evaluate the practical efficiency of our approach by comparing it with PageRank on 6 real and 6 synthetic datasets with sizes up to 10^8 triples. Our results show that HARE is up to 2 orders of magnitude faster than PageRank. We also present a brief evaluation of HARE's ranking accuracy by comparing it with that of PageRank applied directly to RDF graphs. Our evaluation on 19 classes of DBpedia demonstrates that there is no statistical difference between HARE and PageRank. We hence conclude that our approach goes beyond the state of the art by allowing the ranking of all RDF entities and of RDF triples without being worse w.r.t. the ranking quality it achieves on resources. HARE is open-source and is available at http://github.com/dice-group/hare.

KW - Data Volume

KW - PageRank

KW - Ranking

KW - Scalability

KW - Semantic Web

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85047828136&partnerID=8YFLogxK

U2 - 10.1109/BigData.2017.8257990

DO - 10.1109/BigData.2017.8257990

M3 - Article in conference proceedings

AN - SCOPUS:85047828136

SN - 978-1-5386-2714-3

SN - 978-1-5386-2716-7

T3 - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017

SP - 746

EP - 755

BT - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017

A2 - Nie, Jian-Yun

A2 - Obradovic, Zoran

A2 - Suzumura, Toyotaro

A2 - Ghosh, Rumi

A2 - Nambiar, Raghunath

A2 - Wang, Chonggang

A2 - Zang, Hui

A2 - Baeza-Yates, Ricardo

A2 - Hu, Xiaohua

A2 - Kepner, Jeremy

A2 - Cuzzocrea, Alfredo

A2 - Tang, Jian

A2 - Toyoda, Masashi

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 5th IEEE International Conference on Big Data, Big Data 2017

Y2 - 11 December 2017 through 14 December 2017

ER -

Other publications by the same author(s)

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R. & Panchenko, A., 2026, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Ichise, R. (ed.). Springer Science and Business Media Deutschland, p. 95-110 16 p. (Lecture Notes in Computer Science; vol. 15836 LNCS).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Analyzing the Influence of Knowledge Graph Information on Relation Extraction

Möller, C. & Usbeck, R., 2025, The Semantic Web: 22nd European Semantic Web Conference, ESWC 2025 Portoroz, Slovenia, June 1–5, 2025 Proceedings, Part I. Curry, E., Acosta, M., Poveda-Villalón, M., van Erp, M., Ojo, A., Hose, K., Shimizu, C. & Lisena, P. (eds.). Cham: Springer Nature Switzerland AG, Vol. 1. p. 460-480 21 p. (Lecture Notes in Computer Science ; vol. 15718).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

ASK-DBLP: Answering Questions over DBLP

Taffa, T., Neises, P., Ollinger, S., Westphal, P., Ackermann, M. R., Banerjee, D. & Usbeck, R., 02.11.2025, ISWC-C 2025, Industry, Doctoral Consortium, Posters and Demos at ISWC 2025: Joint Proceedings of Industry, Doctoral Consortium, Posters and Demos of the 24th International Semantic Web Conference (ISWC-C 2025), ISWC 2025 Companion Volume. Celino, I., Hassanzadeh, O., Bernstein, A., Noy, N., Cheng, G., Wang, S., Ferrada, S., Soulard, T., Kozaki, K., Takeda, H. & Gentile, A. L. (eds.). Aachen: Sun Site Central Europe (RWTH Aachen University), p. 435-440 6 p. D13. (CEUR Workshop Proceedings; vol. 4085).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Automating SPARQL Query Translations between DBpedia and Wikidata

Bartels, M. C., Banerjee, D. & Usbeck, R., 14.07.2025, Linking Meaning: Semantic Technologies Shaping the Future of AI: Cover 74617 Proceedings of the 21st International Conference on Semantic Systems, 3-5 September 2025, Vienna, Austria. Spahiu, B., Vahdati, S., Salatino, A., Pellegrini, T. & Havur, G. (eds.). IOS Press BV, p. 176-193 18 p. (Studies on the Semantic Web; vol. 62).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research

Best Practices in AI and Data Science Models Evaluation

Banerjee, D., Taffa, T. A. & Usbeck, R., 2025, INFORMATIK 2025 : The Wide Open - Offenheit von Source bis Science, 16.-19.September 2025 Potsdam. Lucke, U., Stieglitz, S., Uebernickel, F., Lamprecht, A.-L. & Klein, M. (eds.). Bonn: Gesellschaft für Informatik, Bonn, p. 1211-1219 9 p. (Lecture Notes in Informatics; vol. P366).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

DOI

https://doi.org/10.1109/BigData.2017.8257990
Final published version
