Web-scale extension of RDF knowledge bases from templated websites

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-review

Authors

  • Lorenz Bühmann
  • Ricardo Usbeck
  • Axel Cyrille Ngonga Ngomo
  • Muhammad Saleem
  • Andreas Both
  • Valter Crescenzi
  • Paolo Merialdo
  • Disheng Qiu

Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extracting RDF from semi-structured data rely on ad-hoc extraction methods. In this paper, we present REX, a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for websites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results reveal the weaknesses of our current implementations and indicate how they can be extended.

Original language: English
Title of host publication: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings
Editors: Tania Tudorache, Craig Knoblock, Paul Groth, Carole Goble, Chris Welty, Abraham Bernstein, Peter Mika, Denny Vrandečić, Natasha Noy, Krzysztof Janowicz
Number of pages: 16
Publisher: Springer Nature Switzerland AG
Publication date: 2014
Pages: 66-81
ISBN (print): 978-3-319-11963-2
ISBN (electronic): 978-3-319-11964-9
Publication status: Published - 2014
Externally published: Yes
Event: 13th International Semantic Web Conference, ISWC 2014 - Riva del Garda, Italy
Duration: 19.10.2014 - 23.10.2014
Conference number: 13
https://search.worldcat.org/de/title/semantic-web-iswc-2014-13th-international-semantic-web-conference-riva-del-garda-italy-october-19-23-2014-proceedings-part-i/oclc/941304230

Bibliographical note

Publisher Copyright:
© Springer International Publishing Switzerland 2014.
