Web-scale extension of RDF knowledge bases from templated websites

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-review

Authors

  • Lorenz Bühmann
  • Ricardo Usbeck
  • Axel Cyrille Ngonga Ngomo
  • Muhammad Saleem
  • Andreas Both
  • Valter Crescenzi
  • Paolo Merialdo
  • Disheng Qiu

Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extracting RDF from semi-structured data rely on ad-hoc extraction methods. In this paper, we present REX, a holistic, open-source framework for extracting RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that detects wrappers for websites without any human supervision. Our framework also includes a consistency layer that checks the data extracted by the wrappers for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, they reveal the weaknesses of our current implementations and indicate how these can be extended.
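
The abstract names two ideas: inducing wrappers for templated pages without human supervision, and checking the extracted data for logical consistency. The Python sketch below illustrates both under heavily simplified assumptions; it is not the REX algorithm. It assumes (hypothetically) that pages generated from one template yield the same number of text nodes, treats text that is identical on every sample page as template text and text that varies as data, and labels each data slot with the preceding template text. The consistency layer is reduced to a toy check that a functional property has at most one value per subject. All function names, the `ex:` subject identifiers, the pages and the values are illustrative toy data; a real pipeline would map labels to RDF property IRIs.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collects the non-empty text nodes of a page in document order."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(text)


def text_nodes(html):
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def induce_wrapper(pages):
    """Hypothetical unsupervised wrapper induction over sample pages of one template.

    Text identical on every page is treated as template text; text that varies
    is treated as a data slot, labelled with the closest preceding template
    text. Returns {text-node position: label}.
    """
    node_lists = [text_nodes(page) for page in pages]
    length = len(node_lists[0])
    if any(len(nodes) != length for nodes in node_lists):
        raise ValueError("pages do not share the same template layout")
    wrapper, label = {}, None
    for i in range(length):
        if len({nodes[i] for nodes in node_lists}) == 1:
            label = node_lists[0][i]  # constant text: remember it as a label
        else:
            wrapper[i] = label  # varying text: a data slot
    return wrapper


def extract(wrapper, subject, html):
    """Applies the wrapper to one page, returning (subject, property, value) triples."""
    nodes = text_nodes(html)
    return [(subject, label, nodes[i]) for i, label in wrapper.items()]


def inconsistencies(triples, functional_properties):
    """Toy consistency check: a functional property may have one value per subject."""
    first_value, conflicts = {}, []
    for s, p, o in triples:
        if p in functional_properties:
            previous = first_value.setdefault((s, p), o)
            if previous != o:
                conflicts.append((s, p, previous, o))
    return conflicts


# Toy pages generated from the same (hypothetical) template.
page_paris = ("<dl><dt>Name</dt><dd>Paris</dd><dt>Country</dt><dd>France</dd>"
              "<dt>Population</dt><dd>2100000</dd></dl>")
page_rome = ("<dl><dt>Name</dt><dd>Rome</dd><dt>Country</dt><dd>Italy</dd>"
             "<dt>Population</dt><dd>2800000</dd></dl>")

wrapper = induce_wrapper([page_paris, page_rome])
extracted = (extract(wrapper, "ex:Paris", page_paris)
             + extract(wrapper, "ex:Rome", page_rome))
existing_kb = [("ex:Paris", "Population", "2200000")]  # toy "existing" triple
print(wrapper)
print(inconsistencies(existing_kb + extracted, {"Country", "Population"}))
```

This positional heuristic has an obvious weakness: if all sample pages happen to share a value (for example, every sampled city lies in the same country), that slot is misread as template text. Using more, and more varied, sample pages makes the induction more reliable.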

Original language: English
Title of host publication: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings
Editors: Tania Tudorache, Craig Knoblock, Paul Groth, Carole Goble, Chris Welty, Abraham Bernstein, Peter Mika, Denny Vrandečić, Natasha Noy, Krzysztof Janowicz
Number of pages: 16
Publisher: Springer Nature Switzerland AG
Publication date: 2014
Pages: 66-81
ISBN (print): 978-3-319-11963-2
ISBN (electronic): 978-3-319-11964-9
Publication status: Published - 2014
Externally published: Yes
Event: 13th International Semantic Web Conference, ISWC 2014 - Riva del Garda, Italy
Duration: 19.10.2014 to 23.10.2014
Conference number: 13
https://search.worldcat.org/de/title/semantic-web-iswc-2014-13th-international-semantic-web-conference-riva-del-garda-italy-october-19-23-2014-proceedings-part-i/oclc/941304230

Bibliographical note

Publisher Copyright:
© Springer International Publishing Switzerland 2014.
