Real-time RDF extraction from unstructured data streams

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Real-time RDF extraction from unstructured data streams. / Gerber, Daniel; Hellmann, Sebastian; Bühmann, Lorenz et al.

The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings. ed. / Harith Alani; Lalana Kagal; Achille Fokoue; Paul Groth; Chris Biemann; Josiane Xavier Parreira; Lora Aroyo; Natasha Noy; Chris Welty; Krzysztof Janowicz. Springer, 2013. p. 135-150 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8218 LNCS, No. PART 1).

Harvard

Gerber, D, Hellmann, S, Bühmann, L, Soru, T, Usbeck, R & Ngonga Ngomo, AC 2013, Real-time RDF extraction from unstructured data streams. in H Alani, L Kagal, A Fokoue, P Groth, C Biemann, JX Parreira, L Aroyo, N Noy, C Welty & K Janowicz (eds), The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 1, vol. 8218 LNCS, Springer, pp. 135-150, 12th International Semantic Web Conference, ISWC 2013, Sydney, NSW, Australia, 21.10.13. https://doi.org/10.1007/978-3-642-41335-3_9

APA

Gerber, D., Hellmann, S., Bühmann, L., Soru, T., Usbeck, R., & Ngonga Ngomo, A. C. (2013). Real-time RDF extraction from unstructured data streams. In H. Alani, L. Kagal, A. Fokoue, P. Groth, C. Biemann, J. X. Parreira, L. Aroyo, N. Noy, C. Welty, & K. Janowicz (Eds.), The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings (pp. 135-150). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8218 LNCS, No. PART 1). Springer. https://doi.org/10.1007/978-3-642-41335-3_9

Vancouver

Gerber D, Hellmann S, Bühmann L, Soru T, Usbeck R, Ngonga Ngomo AC. Real-time RDF extraction from unstructured data streams. In Alani H, Kagal L, Fokoue A, Groth P, Biemann C, Parreira JX, Aroyo L, Noy N, Welty C, Janowicz K, editors, The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings. Springer. 2013. p. 135-150. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 1). doi: 10.1007/978-3-642-41335-3_9

Bibtex

@inbook{bd4458c832904167a7a7c449e3f0beb6,
title = "Real-time RDF extraction from unstructured data streams",
abstract = "The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the actuality of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.",
keywords = "Informatics, Time Slice, Named Entity Recognition, Pattern Mapping, Linked Open Data, String Similarity, Business informatics",
author = "Daniel Gerber and Sebastian Hellmann and Lorenz B{\"u}hmann and Tommaso Soru and Ricardo Usbeck and {Ngonga Ngomo}, {Axel Cyrille}",
year = "2013",
doi = "10.1007/978-3-642-41335-3_9",
language = "English",
isbn = "9783642413346",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
number = "PART 1",
pages = "135--150",
editor = "Harith Alani and Lalana Kagal and Achille Fokoue and Paul Groth and Chris Biemann and Parreira, {Josiane Xavier} and Lora Aroyo and Natasha Noy and Chris Welty and Krzysztof Janowicz",
booktitle = "The Semantic Web, ISWC 2013",
address = "Germany",
note = "12th International Semantic Web Conference, ISWC 2013 ; Conference date: 21-10-2013 Through 25-10-2013",
url = "http://iswc2013.semanticweb.org"
}

RIS

TY - CHAP

T1 - Real-time RDF extraction from unstructured data streams

AU - Gerber, Daniel

AU - Hellmann, Sebastian

AU - Bühmann, Lorenz

AU - Soru, Tommaso

AU - Usbeck, Ricardo

AU - Ngonga Ngomo, Axel Cyrille

PY - 2013

Y1 - 2013

N2 - The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the actuality of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.

AB - The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the actuality of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.

KW - Informatics

KW - Time Slice

KW - Named Entity Recognition

KW - Pattern Mapping

KW - Linked Open Data

KW - String Similarity

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=84891950965&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/5d304550-be6f-361f-8bc5-05940fd2117e/

U2 - 10.1007/978-3-642-41335-3_9

DO - 10.1007/978-3-642-41335-3_9

M3 - Article in conference proceedings

AN - SCOPUS:84891950965

SN - 9783642413346

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 135

EP - 150

BT - The Semantic Web, ISWC 2013

A2 - Alani, Harith

A2 - Kagal, Lalana

A2 - Fokoue, Achille

A2 - Groth, Paul

A2 - Biemann, Chris

A2 - Parreira, Josiane Xavier

A2 - Aroyo, Lora

A2 - Noy, Natasha

A2 - Welty, Chris

A2 - Janowicz, Krzysztof

PB - Springer

T2 - 12th International Semantic Web Conference, ISWC 2013

Y2 - 21 October 2013 through 25 October 2013

ER -