Web-scale extension of RDF knowledge bases from templated websites

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-review

Authors

  • Lorenz Bühmann
  • Ricardo Usbeck
  • Axel Cyrille Ngonga Ngomo
  • Muhammad Saleem
  • Andreas Both
  • Valter Crescenzi
  • Paolo Merialdo
  • Disheng Qiu

Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for websites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.

Original language: English
Title of host publication: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings
Editors: Tania Tudorache, Craig Knoblock, Paul Groth, Carole Goble, Chris Welty, Abraham Bernstein, Peter Mika, Denny Vrandečić, Natasha Noy, Krzysztof Janowicz
Number of pages: 16
Publisher: Springer Nature Switzerland AG
Publication date: 2014
Pages: 66-81
ISBN (print): 978-3-319-11963-2
ISBN (electronic): 978-3-319-11964-9
Publication status: Published - 2014
Externally published: Yes
Event: 13th International Semantic Web Conference, ISWC 2014 - Riva del Garda, Italy
Duration: 19.10.2014 - 23.10.2014
Conference number: 13
https://search.worldcat.org/de/title/semantic-web-iswc-2014-13th-international-semantic-web-conference-riva-del-garda-italy-october-19-23-2014-proceedings-part-i/oclc/941304230

Bibliographical note

Publisher Copyright:
© Springer International Publishing Switzerland 2014.
