Holistic and scalable ranking of RDF data

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

The volume and number of data sources published using Semantic Web standards such as RDF grow continuously. The largest of these data sources now contain billions of facts and are updated periodically. A large number of applications driven by such data sources require the ranking of entities and facts contained in such knowledge graphs. Hence, there is a need for time-efficient approaches that can compute ranks for entities and facts simultaneously. In this paper, we present the first holistic ranking approach for RDF data. Our approach, dubbed HARE, allows the simultaneous computation of ranks for RDF triples, resources, properties and literals. To this end, HARE relies on the representation of RDF graphs as bi-partite graphs. It then employs a time-efficient extension of the random walk paradigm to bi-partite graphs. We show that by virtue of this extension, the worst-case complexity of HARE is O(n^5) while that of PageRank is O(n^6). In addition, we evaluate the practical efficiency of our approach by comparing it with PageRank on 6 real and 6 synthetic datasets with sizes up to 10^8 triples. Our results show that HARE is up to 2 orders of magnitude faster than PageRank. We also present a brief evaluation of HARE's ranking accuracy by comparing it with that of PageRank applied directly to RDF graphs. Our evaluation on 19 classes of DBpedia demonstrates that there is no statistically significant difference between the rankings produced by HARE and PageRank. We hence conclude that our approach goes beyond the state of the art by allowing the ranking of all RDF entities and of RDF triples without being worse w.r.t. the ranking quality it achieves on resources. HARE is open-source and is available at http://github.com/dice-group/hare.
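As an illustration only, the following minimal sketch shows one way a damped random walk can run on a bi-partite view of RDF data, where triple nodes sit on one side and their terms (subjects, predicates, objects) on the other. It is not HARE's algorithm or code; the function name `bipartite_ranks`, the damping factor of 0.85, the fixed iteration count and the sample triples are all assumptions made for this example.

```python
from collections import defaultdict


def bipartite_ranks(triples, damping=0.85, iterations=50):
    """Illustrative damped random walk on a triple/term bi-partite graph.

    Hypothetical helper for this example only; not the HARE implementation.
    Returns (triple_rank, term_rank), each a dict whose values sum to 1.
    """
    triples = list(dict.fromkeys(triples))  # drop duplicates, keep order
    if not triples:
        return {}, {}

    # Bi-partite edges: every triple node links to its distinct terms.
    term_to_triples = defaultdict(list)
    for idx, triple in enumerate(triples):
        for term in set(triple):
            term_to_triples[term].append(idx)

    def spread_to_terms(triple_rank):
        # Each triple splits its probability mass evenly over its terms.
        term_rank = {term: 0.0 for term in term_to_triples}
        for idx, triple in enumerate(triples):
            terms = set(triple)
            for term in terms:
                term_rank[term] += triple_rank[idx] / len(terms)
        return term_rank

    n = len(triples)
    triple_rank = [1.0 / n] * n
    for _ in range(iterations):
        term_rank = spread_to_terms(triple_rank)
        # Each term passes its mass back to the triples it occurs in;
        # the remaining (1 - damping) mass is spread uniformly (teleport).
        new_rank = [(1.0 - damping) / n] * n
        for term, members in term_to_triples.items():
            share = damping * term_rank[term] / len(members)
            for idx in members:
                new_rank[idx] += share
        triple_rank = new_rank

    return dict(zip(triples, triple_rank)), spread_to_terms(triple_rank)


if __name__ == "__main__":
    sample = [("a", "knows", "b"), ("b", "knows", "c"), ("a", "likes", "c")]
    triple_scores, term_scores = bipartite_ranks(sample)
    for term, score in sorted(term_scores.items(), key=lambda kv: -kv[1]):
        print(term, round(score, 3))
```

A single walk of this kind yields scores for triples and for every term at once, which is the sense in which such a ranking can be called holistic. The sketch makes no attempt to reproduce the complexity or runtime results reported in the abstract.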

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Number of pages: 10
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 01.07.2017
Pages: 746-755
ISBN (print): 978-1-5386-2714-3, 978-1-5386-2716-7
ISBN (electronic): 978-1-5386-2715-0
DOIs
Publication status: Published - 01.07.2017
Externally published: Yes
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: 11.12.2017 to 14.12.2017
Conference number: 5
https://cci.drexel.edu/bigdata/bigdata2017/

Bibliographical note

Publisher Copyright:
© 2017 IEEE.
