Web-scale extension of RDF knowledge bases from templated websites

Research output: Contributions to collected editions/works | Article in conference proceedings | Research | peer-review

Standard

Web-scale extension of RDF knowledge bases from templated websites. / Bühmann, Lorenz; Usbeck, Ricardo; Ngomo, Axel Cyrille Ngonga et al.
The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings. ed. / Tania Tudorache; Craig Knoblock; Paul Groth; Carole Goble; Chris Welty; Abraham Bernstein; Peter Mika; Denny Vrandečić; Natasha Noy; Krzysztof Janowicz. Springer Nature Switzerland AG, 2014. p. 66-81 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8796).

Harvard

Bühmann, L, Usbeck, R, Ngomo, ACN, Saleem, M, Both, A, Crescenzi, V, Merialdo, P & Qiu, D 2014, Web-scale extension of RDF knowledge bases from templated websites. in T Tudorache, C Knoblock, P Groth, C Goble, C Welty, A Bernstein, P Mika, D Vrandečić, N Noy & K Janowicz (eds), The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8796, Springer Nature Switzerland AG, pp. 66-81, 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, 19.10.14. https://doi.org/10.1007/978-3-319-11964-9_5

APA

Bühmann, L., Usbeck, R., Ngomo, A. C. N., Saleem, M., Both, A., Crescenzi, V., Merialdo, P., & Qiu, D. (2014). Web-scale extension of RDF knowledge bases from templated websites. In T. Tudorache, C. Knoblock, P. Groth, C. Goble, C. Welty, A. Bernstein, P. Mika, D. Vrandečić, N. Noy, & K. Janowicz (Eds.), The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings (pp. 66-81). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8796). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-319-11964-9_5

Vancouver

Bühmann L, Usbeck R, Ngomo ACN, Saleem M, Both A, Crescenzi V et al. Web-scale extension of RDF knowledge bases from templated websites. In Tudorache T, Knoblock C, Groth P, Goble C, Welty C, Bernstein A, Mika P, Vrandečić D, Noy N, Janowicz K, editors, The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings. Springer Nature Switzerland AG. 2014. p. 66-81. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-11964-9_5

Bibtex

@inbook{c3a21c2d41ac47eb9fd5c6cc34d1f035,
title = "Web-scale extension of RDF knowledge bases from templated websites",
abstract = "Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.",
keywords = "Informatics, Link Data, Extraction Rule, Unstructured data, Xpath Expression, Link Open Data Cloud, Business informatics",
author = "Lorenz B{\"u}hmann and Ricardo Usbeck and Ngomo, {Axel Cyrille Ngonga} and Muhammad Saleem and Andreas Both and Valter Crescenzi and Paolo Merialdo and Disheng Qiu",
note = "Publisher Copyright: {\textcopyright} Springer International Publishing Switzerland 2014.; 13th International Semantic Web Conference, ISWC 2014 ; Conference date: 19-10-2014 Through 23-10-2014",
year = "2014",
doi = "10.1007/978-3-319-11964-9_5",
language = "English",
isbn = "978-3-319-11963-2",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Nature Switzerland AG",
pages = "66--81",
editor = "Tania Tudorache and Craig Knoblock and Paul Groth and Carole Goble and Chris Welty and Abraham Bernstein and Peter Mika and Denny Vrande{\v c}i{\'c} and Natasha Noy and Krzysztof Janowicz",
booktitle = "The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings",
address = "Switzerland",
url = "https://search.worldcat.org/de/title/semantic-web-iswc-2014-13th-international-semantic-web-conference-riva-del-garda-italy-october-19-23-2014-proceedings-part-i/oclc/941304230",

}

RIS

TY - CHAP

T1 - Web-scale extension of RDF knowledge bases from templated websites

AU - Bühmann, Lorenz

AU - Usbeck, Ricardo

AU - Ngomo, Axel Cyrille Ngonga

AU - Saleem, Muhammad

AU - Both, Andreas

AU - Crescenzi, Valter

AU - Merialdo, Paolo

AU - Qiu, Disheng

N1 - Conference code: 13

PY - 2014

Y1 - 2014

N2 - Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.

AB - Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.

KW - Informatics

KW - Link Data

KW - Extraction Rule

KW - Unstructured data

KW - Xpath Expression

KW - Link Open Data Cloud

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=84908692879&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/0636091d-3cb4-34d5-81c7-0c47e9ee4ab9/

U2 - 10.1007/978-3-319-11964-9_5

DO - 10.1007/978-3-319-11964-9_5

M3 - Article in conference proceedings

AN - SCOPUS:84908692879

SN - 978-3-319-11963-2

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 66

EP - 81

BT - The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings

A2 - Tudorache, Tania

A2 - Knoblock, Craig

A2 - Groth, Paul

A2 - Goble, Carole

A2 - Welty, Chris

A2 - Bernstein, Abraham

A2 - Mika, Peter

A2 - Vrandečić, Denny

A2 - Noy, Natasha

A2 - Janowicz, Krzysztof

PB - Springer Nature Switzerland AG

T2 - 13th International Semantic Web Conference, ISWC 2014

Y2 - 19 October 2014 through 23 October 2014

ER -
