Holistic and scalable ranking of RDF data

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

The volume and number of data sources published using Semantic Web standards such as RDF grow continuously. The largest of these data sources now contain billions of facts and are updated periodically. A large number of applications driven by such data sources require the ranking of entities and facts contained in such knowledge graphs. Hence, there is a need for time-efficient approaches that can compute ranks for entities and facts simultaneously. In this paper, we present the first holistic ranking approach for RDF data. Our approach, dubbed HARE, allows the simultaneous computation of ranks for RDF triples, resources, properties and literals. To this end, HARE relies on the representation of RDF graphs as bipartite graphs. It then employs a time-efficient extension of the random walk paradigm to bipartite graphs. We show that by virtue of this extension, the worst-case complexity of HARE is O(n^5) while that of PageRank is O(n^6). In addition, we evaluate the practical efficiency of our approach by comparing it with PageRank on 6 real and 6 synthetic datasets with sizes up to 10^8 triples. Our results show that HARE is up to 2 orders of magnitude faster than PageRank. We also present a brief evaluation of HARE's ranking accuracy by comparing it with that of PageRank applied directly to RDF graphs. Our evaluation on 19 classes of DBpedia demonstrates that there is no statistically significant difference between HARE and PageRank. We hence conclude that our approach goes beyond the state of the art by allowing the ranking of all RDF entities and of RDF triples without degrading the ranking quality it achieves on resources. HARE is open-source and is available at http://github.com/dice-group/hare.
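The idea of ranking triples and their terms together through a random walk on a bipartite graph can be illustrated with a small sketch. The following is not the HARE implementation and does not reproduce its matrix formulation or its complexity bounds; it is a simplified power iteration written under several assumptions: the function name `bipartite_ranks`, the damping value 0.85, the fixed iteration count, and the choice to connect each triple to its subject, predicate and object with equal weight are all hypothetical.

```python
from collections import defaultdict


def bipartite_ranks(triples, damping=0.85, iterations=50):
    """Illustrative random walk on a bipartite graph of triples and terms.

    One side of the graph holds the triples, the other side holds the terms
    (subjects, predicates, objects) occurring in them. This is an assumed,
    simplified scheme, not the formulation used by HARE.
    """
    triples = list(dict.fromkeys(triples))  # drop duplicates, keep order
    if not triples:
        return {}, {}

    # Index: for every term, the positions of the triples that contain it.
    term_to_triples = defaultdict(list)
    for i, triple in enumerate(triples):
        for term in set(triple):
            term_to_triples[term].append(i)

    def spread_to_terms(triple_rank):
        # Each triple splits its rank evenly among its distinct terms.
        term_rank = {term: 0.0 for term in term_to_triples}
        for i, triple in enumerate(triples):
            terms = set(triple)
            share = triple_rank[i] / len(terms)
            for term in terms:
                term_rank[term] += share
        return term_rank

    n = len(triples)
    triple_rank = [1.0 / n] * n
    for _ in range(iterations):
        term_rank = spread_to_terms(triple_rank)
        # Each term splits its rank evenly among its triples; the remaining
        # (1 - damping) mass is spread uniformly, as in PageRank.
        updated = [(1.0 - damping) / n] * n
        for term, positions in term_to_triples.items():
            share = damping * term_rank[term] / len(positions)
            for i in positions:
                updated[i] += share
        triple_rank = updated

    # Term ranks are derived from the final triple ranks in one extra step.
    return dict(zip(triples, triple_rank)), spread_to_terms(triple_rank)


if __name__ == "__main__":
    example = [("a", "p", "b"), ("b", "p", "c"), ("a", "q", "c")]
    triple_scores, term_scores = bipartite_ranks(example)
    for term, score in sorted(term_scores.items(), key=lambda kv: -kv[1]):
        print(term, round(score, 4))
```

In this sketch the walk alternates between the two sides of the graph, so a single iteration yields scores for triples and for terms at once, which is the property the abstract describes as holistic ranking.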

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Number of pages: 10
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 01.07.2017
Pages: 746-755
ISBN (print): 978-1-5386-2714-3, 978-1-5386-2716-7
ISBN (electronic): 978-1-5386-2715-0
DOIs
Publication status: Published - 01.07.2017
Externally published: Yes
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: 11.12.2017 - 14.12.2017
Conference number: 5
https://cci.drexel.edu/bigdata/bigdata2017/

Bibliographical note

Publisher Copyright:
© 2017 IEEE.
