Web-scale extension of RDF knowledge bases from templated websites

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Web-scale extension of RDF knowledge bases from templated websites. / Bühmann, Lorenz; Usbeck, Ricardo; Ngomo, Axel Cyrille Ngonga et al.
The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings. ed. / Tania Tudorache; Craig Knoblock; Paul Groth; Carole Goble; Chris Welty; Abraham Bernstein; Peter Mika; Denny Vrandečić; Natasha Noy; Krzysztof Janowicz. Springer Nature Switzerland AG, 2014. p. 66-81 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8796).

Harvard

Bühmann, L, Usbeck, R, Ngomo, ACN, Saleem, M, Both, A, Crescenzi, V, Merialdo, P & Qiu, D 2014, Web-scale extension of RDF knowledge bases from templated websites. in T Tudorache, C Knoblock, P Groth, C Goble, C Welty, A Bernstein, P Mika, D Vrandečić, N Noy & K Janowicz (eds), The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8796, Springer Nature Switzerland AG, pp. 66-81, 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, 19.10.14. https://doi.org/10.1007/978-3-319-11964-9_5

APA

Bühmann, L., Usbeck, R., Ngomo, A. C. N., Saleem, M., Both, A., Crescenzi, V., Merialdo, P., & Qiu, D. (2014). Web-scale extension of RDF knowledge bases from templated websites. In T. Tudorache, C. Knoblock, P. Groth, C. Goble, C. Welty, A. Bernstein, P. Mika, D. Vrandečić, N. Noy, & K. Janowicz (Eds.), The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings (pp. 66-81). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8796). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-319-11964-9_5

Vancouver

Bühmann L, Usbeck R, Ngomo ACN, Saleem M, Both A, Crescenzi V et al. Web-scale extension of RDF knowledge bases from templated websites. In Tudorache T, Knoblock C, Groth P, Goble C, Welty C, Bernstein A, Mika P, Vrandečić D, Noy N, Janowicz K, editors, The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings. Springer Nature Switzerland AG. 2014. p. 66-81. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-11964-9_5

Bibtex

@inbook{c3a21c2d41ac47eb9fd5c6cc34d1f035,
title = "Web-scale extension of RDF knowledge bases from templated websites",
abstract = "Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.",
keywords = "Informatics, Link Data, Extraction Rule, Unstructured data, Xpath Expression, Link Open Data Cloud, Business informatics",
author = "Lorenz B{\"u}hmann and Ricardo Usbeck and Ngomo, {Axel Cyrille Ngonga} and Muhammad Saleem and Andreas Both and Valter Crescenzi and Paolo Merialdo and Disheng Qiu",
note = "Publisher Copyright: {\textcopyright} Springer International Publishing Switzerland 2014.; 13th International Semantic Web Conference, ISWC 2014 ; Conference date: 19-10-2014 Through 23-10-2014",
year = "2014",
doi = "10.1007/978-3-319-11964-9_5",
language = "English",
isbn = "978-3-319-11963-2",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Nature Switzerland AG",
pages = "66--81",
editor = "Tania Tudorache and Craig Knoblock and Paul Groth and Carole Goble and Chris Welty and Abraham Bernstein and Peter Mika and Denny Vrande{\v c}i{\'c} and Natasha Noy and Krzysztof Janowicz",
booktitle = "The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings",
address = "Switzerland",
url = "https://search.worldcat.org/de/title/semantic-web-iswc-2014-13th-international-semantic-web-conference-riva-del-garda-italy-october-19-23-2014-proceedings-part-i/oclc/941304230",

}

RIS

TY - CHAP

T1 - Web-scale extension of RDF knowledge bases from templated websites

AU - Bühmann, Lorenz

AU - Usbeck, Ricardo

AU - Ngomo, Axel Cyrille Ngonga

AU - Saleem, Muhammad

AU - Both, Andreas

AU - Crescenzi, Valter

AU - Merialdo, Paolo

AU - Qiu, Disheng

N1 - Conference code: 13

PY - 2014

Y1 - 2014

N2 - Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.

AB - Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extract RDF from semi-structured data rely on extraction methods based on ad-hoc solutions. In this paper, we present a holistic and open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for web sites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.

KW - Informatics

KW - Link Data

KW - Extraction Rule

KW - Unstructured data

KW - Xpath Expression

KW - Link Open Data Cloud

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=84908692879&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/0636091d-3cb4-34d5-81c7-0c47e9ee4ab9/

U2 - 10.1007/978-3-319-11964-9_5

DO - 10.1007/978-3-319-11964-9_5

M3 - Article in conference proceedings

AN - SCOPUS:84908692879

SN - 978-3-319-11963-2

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 66

EP - 81

BT - The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings

A2 - Tudorache, Tania

A2 - Knoblock, Craig

A2 - Groth, Paul

A2 - Goble, Carole

A2 - Welty, Chris

A2 - Bernstein, Abraham

A2 - Mika, Peter

A2 - Vrandečić, Denny

A2 - Noy, Natasha

A2 - Janowicz, Krzysztof

PB - Springer Nature Switzerland AG

T2 - 13th International Semantic Web Conference, ISWC 2014

Y2 - 19 October 2014 through 23 October 2014

ER -
