Web-scale extension of RDF knowledge bases from templated websites

Research output: Contributions to collected editions/worksArticle in conference proceedingsResearchpeer-review

Standard

Web-scale extension of RDF knowledge bases from templated websites. / Bühmann, Lorenz; Usbeck, Ricardo; Ngomo, Axel Cyrille Ngonga et al.

The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings. ed. / Tania Tudorache; Craig Knoblock; Paul Groth; Carole Goble; Chris Welty; Abraham Bernstein; Peter Mika; Denny Vrandečić; Natasha Noy; Krzysztof Janowicz. Springer Nature Switzerland AG, 2014. p. 66-81 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8796).

Research output: Contributions to collected editions/worksArticle in conference proceedingsResearchpeer-review

Harvard

Bühmann, L, Usbeck, R, Ngomo, ACN, Saleem, M, Both, A, Crescenzi, V, Merialdo, P & Qiu, D 2014, Web-scale extension of RDF knowledge bases from templated websites. in T Tudorache, C Knoblock, P Groth, C Goble, C Welty, A Bernstein, P Mika, D Vrandečić, N Noy & K Janowicz (eds), The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8796, Springer Nature Switzerland AG, pp. 66-81, 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, 19.10.14. https://doi.org/10.1007/978-3-319-11964-9_5

APA

Bühmann, L., Usbeck, R., Ngomo, A. C. N., Saleem, M., Both, A., Crescenzi, V., Merialdo, P., & Qiu, D. (2014). Web-scale extension of RDF knowledge bases from templated websites. In T. Tudorache, C. Knoblock, P. Groth, C. Goble, C. Welty, A. Bernstein, P. Mika, D. Vrandečić, N. Noy, & K. Janowicz (Eds.), The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings (pp. 66-81). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8796). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-319-11964-9_5

Vancouver

Bühmann L, Usbeck R, Ngomo ACN, Saleem M, Both A, Crescenzi V et al. Web-scale extension of RDF knowledge bases from templated websites. In Tudorache T, Knoblock C, Groth P, Goble C, Welty C, Bernstein A, Mika P, Vrandečić D, Noy N, Janowicz K, editors, The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings. Springer Nature Switzerland AG. 2014. p. 66-81. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-11964-9_5

Bibtex

@inbook{c3a21c2d41ac47eb9fd5c6cc34d1f035,
title = "Web-scale extension of RDF knowledge bases from templated websites",
abstract = "Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data.While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.",
keywords = "Informatics, Link Data, Extration Rule, Unstructured data, Xpath Expression, Link Open Data Cloud, Business informatics",
author = "Lorenz B{\"u}hmann and Ricardo Usbeck and Ngomo, {Axel Cyrille Ngonga} and Muhammad Saleem and Andreas Both and Valter Crescenzi and Paolo Merialdo and Disheng Qiu",
note = "Publisher Copyright: {\textcopyright} Springer International Publishing Switzerland 2014.; 13th International Semantic Web Conference, ISWC 2014, ISWC 2014 ; Conference date: 19-10-2014 Through 23-10-2014",
year = "2014",
doi = "10.1007/978-3-319-11964-9_5",
language = "English",
isbn = "978-3-319-11963-2",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Nature Switzerland AG",
pages = "66--81",
editor = "Tania Tudorache and Craig Knoblock and Paul Groth and Carole Goble and Chris Welty and Abraham Bernstein and Peter Mika and Denny Vrande{\v c}i{\'c} and Natasha Noy and Krzysztof Janowicz",
booktitle = "The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings",
address = "Switzerland",
url = "https://search.worldcat.org/de/title/semantic-web-iswc-2014-13th-international-semantic-web-conference-riva-del-garda-italy-october-19-23-2014-proceedings-part-i/oclc/941304230",

}

RIS

TY - CHAP

T1 - Web-scale extension of RDF knowledge bases from templated websites

AU - Bühmann, Lorenz

AU - Usbeck, Ricardo

AU - Ngomo, Axel Cyrille Ngonga

AU - Saleem, Muhammad

AU - Both, Andreas

AU - Crescenzi, Valter

AU - Merialdo, Paolo

AU - Qiu, Disheng

N1 - Conference code: 13

PY - 2014

Y1 - 2014

N2 - Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data.While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.

AB - Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data.While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.

KW - Informatics

KW - Link Data

KW - Extration Rule

KW - Unstructured data

KW - Xpath Expression

KW - Link Open Data Cloud

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=84908692879&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/0636091d-3cb4-34d5-81c7-0c47e9ee4ab9/

U2 - 10.1007/978-3-319-11964-9_5

DO - 10.1007/978-3-319-11964-9_5

M3 - Article in conference proceedings

AN - SCOPUS:84908692879

SN - 978-3-319-11963-2

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 66

EP - 81

BT - The SemanticWeb - ISWC 2014 - 13th International SemanticWeb Conference, Proceedings

A2 - Tudorache, Tania

A2 - Knoblock, Craig

A2 - Groth, Paul

A2 - Goble, Carole

A2 - Welty, Chris

A2 - Bernstein, Abraham

A2 - Mika, Peter

A2 - Vrandečić, Denny

A2 - Noy, Natasha

A2 - Janowicz, Krzysztof

PB - Springer Nature Switzerland AG

T2 - 13th International Semantic Web Conference, ISWC 2014

Y2 - 19 October 2014 through 23 October 2014

ER -