Web-scale extension of RDF knowledge bases from templated websites

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Web-scale extension of RDF knowledge bases from templated websites. / Bühmann, Lorenz; Usbeck, Ricardo; Ngomo, Axel Cyrille Ngonga et al.
The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings. ed. / Tania Tudorache; Craig Knoblock; Paul Groth; Carole Goble; Chris Welty; Abraham Bernstein; Peter Mika; Denny Vrandečić; Natasha Noy; Krzysztof Janowicz. Springer Nature Switzerland AG, 2014. p. 66-81 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8796).

Harvard

Bühmann, L, Usbeck, R, Ngomo, ACN, Saleem, M, Both, A, Crescenzi, V, Merialdo, P & Qiu, D 2014, Web-scale extension of RDF knowledge bases from templated websites. in T Tudorache, C Knoblock, P Groth, C Goble, C Welty, A Bernstein, P Mika, D Vrandečić, N Noy & K Janowicz (eds), The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8796, Springer Nature Switzerland AG, pp. 66-81, 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, 19.10.14. https://doi.org/10.1007/978-3-319-11964-9_5

APA

Bühmann, L., Usbeck, R., Ngomo, A. C. N., Saleem, M., Both, A., Crescenzi, V., Merialdo, P., & Qiu, D. (2014). Web-scale extension of RDF knowledge bases from templated websites. In T. Tudorache, C. Knoblock, P. Groth, C. Goble, C. Welty, A. Bernstein, P. Mika, D. Vrandečić, N. Noy, & K. Janowicz (Eds.), The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings (pp. 66-81). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8796). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-319-11964-9_5

Vancouver

Bühmann L, Usbeck R, Ngomo ACN, Saleem M, Both A, Crescenzi V et al. Web-scale extension of RDF knowledge bases from templated websites. In Tudorache T, Knoblock C, Groth P, Goble C, Welty C, Bernstein A, Mika P, Vrandečić D, Noy N, Janowicz K, editors, The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings. Springer Nature Switzerland AG. 2014. p. 66-81. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-11964-9_5

Bibtex

@inbook{c3a21c2d41ac47eb9fd5c6cc34d1f035,
title = "Web-scale extension of RDF knowledge bases from templated websites",
abstract = "Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.",
keywords = "Informatics, Link Data, Extraction Rule, Unstructured data, Xpath Expression, Link Open Data Cloud, Business informatics",
author = "Lorenz B{\"u}hmann and Ricardo Usbeck and Ngomo, {Axel Cyrille Ngonga} and Muhammad Saleem and Andreas Both and Valter Crescenzi and Paolo Merialdo and Disheng Qiu",
note = "Publisher Copyright: {\textcopyright} Springer International Publishing Switzerland 2014.; 13th International Semantic Web Conference, ISWC 2014 ; Conference date: 19-10-2014 Through 23-10-2014",
year = "2014",
doi = "10.1007/978-3-319-11964-9_5",
language = "English",
isbn = "978-3-319-11963-2",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Nature Switzerland AG",
pages = "66--81",
editor = "Tania Tudorache and Craig Knoblock and Paul Groth and Carole Goble and Chris Welty and Abraham Bernstein and Peter Mika and Denny Vrande{\v c}i{\'c} and Natasha Noy and Krzysztof Janowicz",
booktitle = "The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings",
address = "Switzerland",
url = "https://search.worldcat.org/de/title/semantic-web-iswc-2014-13th-international-semantic-web-conference-riva-del-garda-italy-october-19-23-2014-proceedings-part-i/oclc/941304230",

}

RIS

TY - CHAP

T1 - Web-scale extension of RDF knowledge bases from templated websites

AU - Bühmann, Lorenz

AU - Usbeck, Ricardo

AU - Ngomo, Axel Cyrille Ngonga

AU - Saleem, Muhammad

AU - Both, Andreas

AU - Crescenzi, Valter

AU - Merialdo, Paolo

AU - Qiu, Disheng

N1 - Conference code: 13

PY - 2014

Y1 - 2014

N2 - Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.

AB - Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.

KW - Informatics

KW - Link Data

KW - Extraction Rule

KW - Unstructured data

KW - Xpath Expression

KW - Link Open Data Cloud

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=84908692879&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/0636091d-3cb4-34d5-81c7-0c47e9ee4ab9/

U2 - 10.1007/978-3-319-11964-9_5

DO - 10.1007/978-3-319-11964-9_5

M3 - Article in conference proceedings

AN - SCOPUS:84908692879

SN - 978-3-319-11963-2

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 66

EP - 81

BT - The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings

A2 - Tudorache, Tania

A2 - Knoblock, Craig

A2 - Groth, Paul

A2 - Goble, Carole

A2 - Welty, Chris

A2 - Bernstein, Abraham

A2 - Mika, Peter

A2 - Vrandečić, Denny

A2 - Noy, Natasha

A2 - Janowicz, Krzysztof

PB - Springer Nature Switzerland AG

T2 - 13th International Semantic Web Conference, ISWC 2014

Y2 - 19 October 2014 through 23 October 2014

ER -
