Holistic and scalable ranking of RDF data
Publikation: Beiträge in Sammelwerken › Aufsätze in Konferenzbänden › Forschung › begutachtet
Standard
Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017. Hrsg. / Jian-Yun Nie; Zoran Obradovic; Toyotaro Suzumura; Rumi Ghosh; Raghunath Nambiar; Chonggang Wang; Hui Zang; Ricardo Baeza-Yates; Ricardo Baeza-Yates; Xiaohua Hu; Jeremy Kepner; Alfredo Cuzzocrea; Jian Tang; Masashi Toyoda. Institute of Electrical and Electronics Engineers Inc., 2017. S. 746-755 (Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017; Band 2018-January).
Publikation: Beiträge in Sammelwerken › Aufsätze in Konferenzbänden › Forschung › begutachtet
Harvard
APA
Vancouver
Bibtex
}
RIS
TY - CHAP
T1 - Holistic and scalable ranking of RDF data
AU - Ngomo, Ngonga
AU - Ngomo, Ngonga
AU - Hoffmann, Michael
AU - Usbeck, Ricardo
AU - Jha, Kunal
N1 - Conference code: 5
PY - 2017/7/1
Y1 - 2017/7/1
N2 - The volume and number of data sources published using Semantic Web standards such as RDF grows continuously. The largest of these data sources now contain billions of facts and are updated periodically. A large number of applications driven by such data sources requires the ranking of entities and facts contained in such knowledge graphs. Hence, there is a need for time-efficient approaches that can compute ranks for entities and facts simultaneously. In this paper, we present the first holistic ranking approach for RDF data. Our approach, dubbed HARE, allows the simultaneous computation of ranks for RDF triples, resources, properties and literals. To this end, HARE relies on the representation of RDF graphs as bi-partite graphs. It then employs a time-efficient extension of the random walk paradigm to bi-partite graphs. We show that by virtue of this extension, the worst-case complexity of HARE is O(n5) while that of PageRank is O(n6). In addition, we evaluate the practical efficiency of our approach by comparing it with PageRank on 6 real and 6 synthetic datasets with sizes up to 108 triples. Our results show that HARE is up to 2 orders of magnitude faster than PageRank. We also present a brief evaluation of HARE's ranking accuracy by comparing it with that of PageRank applied directly to RDF graphs. Our evaluation on 19 classes of DBpedia demonstrates that there is no statistical difference between HARE and PageRank. We hence conclude that our approach goes beyond the state of the art by allowing the ranking of all RDF entities and of RDF triples without being worse w.r.t. the ranking quality it achieves on resources. HARE is open-source and is available at http://github.com/dice-group/hare.
AB - The volume and number of data sources published using Semantic Web standards such as RDF grows continuously. The largest of these data sources now contain billions of facts and are updated periodically. A large number of applications driven by such data sources requires the ranking of entities and facts contained in such knowledge graphs. Hence, there is a need for time-efficient approaches that can compute ranks for entities and facts simultaneously. In this paper, we present the first holistic ranking approach for RDF data. Our approach, dubbed HARE, allows the simultaneous computation of ranks for RDF triples, resources, properties and literals. To this end, HARE relies on the representation of RDF graphs as bi-partite graphs. It then employs a time-efficient extension of the random walk paradigm to bi-partite graphs. We show that by virtue of this extension, the worst-case complexity of HARE is O(n5) while that of PageRank is O(n6). In addition, we evaluate the practical efficiency of our approach by comparing it with PageRank on 6 real and 6 synthetic datasets with sizes up to 108 triples. Our results show that HARE is up to 2 orders of magnitude faster than PageRank. We also present a brief evaluation of HARE's ranking accuracy by comparing it with that of PageRank applied directly to RDF graphs. Our evaluation on 19 classes of DBpedia demonstrates that there is no statistical difference between HARE and PageRank. We hence conclude that our approach goes beyond the state of the art by allowing the ranking of all RDF entities and of RDF triples without being worse w.r.t. the ranking quality it achieves on resources. HARE is open-source and is available at http://github.com/dice-group/hare.
KW - Data Volume
KW - PageRank
KW - Ranking
KW - Scalability
KW - Semantic Web
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=85047828136&partnerID=8YFLogxK
U2 - 10.1109/BigData.2017.8257990
DO - 10.1109/BigData.2017.8257990
M3 - Article in conference proceedings
AN - SCOPUS:85047828136
SN - 978-1-5386-2714-3
SN - 978-1-5386-2716-7
T3 - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
SP - 746
EP - 755
BT - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
A2 - Nie, Jian-Yun
A2 - Obradovic, Zoran
A2 - Suzumura, Toyotaro
A2 - Ghosh, Rumi
A2 - Nambiar, Raghunath
A2 - Wang, Chonggang
A2 - Zang, Hui
A2 - Baeza-Yates, Ricardo
A2 - Baeza-Yates, Ricardo
A2 - Hu, Xiaohua
A2 - Kepner, Jeremy
A2 - Cuzzocrea, Alfredo
A2 - Tang, Jian
A2 - Toyoda, Masashi
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Big Data, Big Data 2017
Y2 - 11 December 2017 through 14 December 2017
ER -