Standard
Web-scale extension of RDF knowledge bases from templated websites. / Bühmann, Lorenz; Usbeck, Ricardo; Ngomo, Axel Cyrille Ngonga et al.
The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings. eds. / Tania Tudorache; Craig Knoblock; Paul Groth; Carole Goble; Chris Welty; Abraham Bernstein; Peter Mika; Denny Vrandečić; Natasha Noy; Krzysztof Janowicz. Springer Nature Switzerland AG, 2014. pp. 66-81 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8796).
Publication: Contributions to collected editions › Articles in conference proceedings › Research › peer-reviewed
Harvard
Bühmann, L, Usbeck, R, Ngomo, ACN, Saleem, M, Both, A, Crescenzi, V, Merialdo, P & Qiu, D 2014,
Web-scale extension of RDF knowledge bases from templated websites. in T Tudorache, C Knoblock, P Groth, C Goble, C Welty, A Bernstein, P Mika, D Vrandečić, N Noy & K Janowicz (eds),
The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8796, Springer Nature Switzerland AG, pp. 66-81, 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, 19/10/14.
https://doi.org/10.1007/978-3-319-11964-9_5
APA
Bühmann, L., Usbeck, R., Ngomo, A. C. N., Saleem, M., Both, A., Crescenzi, V., Merialdo, P., & Qiu, D. (2014).
Web-scale extension of RDF knowledge bases from templated websites. In T. Tudorache, C. Knoblock, P. Groth, C. Goble, C. Welty, A. Bernstein, P. Mika, D. Vrandečić, N. Noy, & K. Janowicz (Eds.),
The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings (pp. 66-81). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8796). Springer Nature Switzerland AG.
https://doi.org/10.1007/978-3-319-11964-9_5
Vancouver
Bühmann L, Usbeck R, Ngomo ACN, Saleem M, Both A, Crescenzi V et al.
Web-scale extension of RDF knowledge bases from templated websites. In Tudorache T, Knoblock C, Groth P, Goble C, Welty C, Bernstein A, Mika P, Vrandečić D, Noy N, Janowicz K, editors, The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings. Springer Nature Switzerland AG. 2014. p. 66-81. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 8796). doi: 10.1007/978-3-319-11964-9_5
BibTeX
@inproceedings{c3a21c2d41ac47eb9fd5c6cc34d1f035,
title = "Web-scale extension of RDF knowledge bases from templated websites",
abstract = "Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.",
keywords = "Informatics, Linked Data, Extraction Rule, Unstructured data, XPath Expression, Linked Open Data Cloud, Business informatics",
author = "Lorenz B{\"u}hmann and Ricardo Usbeck and Ngomo, {Axel Cyrille Ngonga} and Muhammad Saleem and Andreas Both and Valter Crescenzi and Paolo Merialdo and Disheng Qiu",
note = "Publisher Copyright: {\textcopyright} Springer International Publishing Switzerland 2014. 13th International Semantic Web Conference, ISWC 2014; Conference date: 19-10-2014 through 23-10-2014",
year = "2014",
doi = "10.1007/978-3-319-11964-9_5",
language = "English",
isbn = "978-3-319-11963-2",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "8796",
publisher = "Springer Nature Switzerland AG",
pages = "66--81",
editor = "Tania Tudorache and Craig Knoblock and Paul Groth and Carole Goble and Chris Welty and Abraham Bernstein and Peter Mika and Denny Vrande{\v c}i{\'c} and Natasha Noy and Krzysztof Janowicz",
booktitle = "The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings",
address = "Switzerland",
url = "https://search.worldcat.org/de/title/semantic-web-iswc-2014-13th-international-semantic-web-conference-riva-del-garda-italy-october-19-23-2014-proceedings-part-i/oclc/941304230",
}
RIS
TY  - CHAP
T1  - Web-scale extension of RDF knowledge bases from templated websites
AU  - Bühmann, Lorenz
AU  - Usbeck, Ricardo
AU  - Ngomo, Axel Cyrille Ngonga
AU  - Saleem, Muhammad
AU  - Both, Andreas
AU  - Crescenzi, Valter
AU  - Merialdo, Paolo
AU  - Qiu, Disheng
N1  - Conference code: 13
PY  - 2014
Y1  - 2014
N2  - Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.
AB  - Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.
KW  - Informatics
KW  - Linked Data
KW  - Extraction Rule
KW  - Unstructured data
KW  - XPath Expression
KW  - Linked Open Data Cloud
KW  - Business informatics
UR  - http://www.scopus.com/inward/record.url?scp=84908692879&partnerID=8YFLogxK
UR  - https://www.mendeley.com/catalogue/0636091d-3cb4-34d5-81c7-0c47e9ee4ab9/
U2  - 10.1007/978-3-319-11964-9_5
DO  - 10.1007/978-3-319-11964-9_5
M3  - Article in conference proceedings
AN  - SCOPUS:84908692879
SN  - 978-3-319-11963-2
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
VL  - 8796
SP  - 66
EP  - 81
BT  - The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings
A2  - Tudorache, Tania
A2  - Knoblock, Craig
A2  - Groth, Paul
A2  - Goble, Carole
A2  - Welty, Chris
A2  - Bernstein, Abraham
A2  - Mika, Peter
A2  - Vrandečić, Denny
A2  - Noy, Natasha
A2  - Janowicz, Krzysztof
PB  - Springer Nature Switzerland AG
T2  - 13th International Semantic Web Conference, ISWC 2014
Y2  - 19 October 2014 through 23 October 2014
ER  - 