Holistic and scalable ranking of RDF data

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

The volume and number of data sources published using Semantic Web standards such as RDF grow continuously. The largest of these data sources now contain billions of facts and are updated periodically. Many applications driven by such data sources require the ranking of entities and facts contained in such knowledge graphs. Hence, there is a need for time-efficient approaches that can compute ranks for entities and facts simultaneously. In this paper, we present the first holistic ranking approach for RDF data. Our approach, dubbed HARE, allows the simultaneous computation of ranks for RDF triples, resources, properties and literals. To this end, HARE relies on the representation of RDF graphs as bi-partite graphs. It then employs a time-efficient extension of the random walk paradigm to bi-partite graphs. We show that by virtue of this extension, the worst-case complexity of HARE is O(n^5) while that of PageRank is O(n^6). In addition, we evaluate the practical efficiency of our approach by comparing it with PageRank on 6 real and 6 synthetic datasets with sizes up to 10^8 triples. Our results show that HARE is up to 2 orders of magnitude faster than PageRank. We also present a brief evaluation of HARE's ranking accuracy by comparing it with that of PageRank applied directly to RDF graphs. Our evaluation on 19 classes of DBpedia demonstrates that there is no statistically significant difference between HARE and PageRank. We hence conclude that our approach goes beyond the state of the art by allowing the ranking of all RDF entities and of RDF triples without degrading the ranking quality it achieves on resources. HARE is open-source and is available at http://github.com/dice-group/hare.
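The idea of a random walk over a bi-partite view of an RDF graph can be illustrated with a minimal sketch. This is not the HARE implementation; it is a generic damped random walk written under our own assumptions: one side of the graph holds triples, the other holds their distinct subjects, predicates and objects, and the walker moves uniformly at random between the two sides. The function name `bipartite_ranks` and the parameters `damping` and `iterations` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def bipartite_ranks(triples, damping=0.85, iterations=50):
    """Rank triples and entities with a damped random walk (illustrative only)."""
    if not triples:
        return [], {}

    # Bi-partite structure: each triple links to its distinct terms
    # (subject, predicate, object), called "entities" in this sketch.
    triple_entities = [sorted(set(t)) for t in triples]
    entity_triples = defaultdict(list)
    for i, ents in enumerate(triple_entities):
        for e in ents:
            entity_triples[e].append(i)

    n = len(triples)
    triple_rank = [1.0 / n] * n
    entity_rank = {}
    for _ in range(iterations):
        # Step triple -> entity: each triple splits its mass evenly.
        entity_rank = {e: 0.0 for e in entity_triples}
        for i, ents in enumerate(triple_entities):
            share = triple_rank[i] / len(ents)
            for e in ents:
                entity_rank[e] += share

        # Step entity -> triple, mixed with a uniform restart (assumed damping).
        new_rank = [(1.0 - damping) / n] * n
        for e, idxs in entity_triples.items():
            share = damping * entity_rank[e] / len(idxs)
            for i in idxs:
                new_rank[i] += share
        triple_rank = new_rank

    return triple_rank, entity_rank


if __name__ == "__main__":
    # Toy data, invented for this example.
    toy = [("a", "p", "b"), ("b", "p", "c"), ("a", "q", "c")]
    t_rank, e_rank = bipartite_ranks(toy)
    print(t_rank)
    print(e_rank)
```

Both returned score sets sum to one, so a single pass yields comparable ranks for triples and for the terms they contain, which is the sense in which such a walk ranks facts and entities together.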

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Number of pages: 10
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 01.07.2017
Pages: 746-755
ISBN (print): 978-1-5386-2714-3, 978-1-5386-2716-7
ISBN (electronic): 978-1-5386-2715-0
DOIs
Publication status: Published - 01.07.2017
Externally published: Yes
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: 11.12.2017 - 14.12.2017
Conference number: 5
https://cci.drexel.edu/bigdata/bigdata2017/

Bibliographical note

Publisher Copyright:
© 2017 IEEE.
