Real-time RDF extraction from unstructured data streams

Research output: Contributions to collected editions/works | Article in conference proceedings | Research | peer-review

Standard

Real-time RDF extraction from unstructured data streams. / Gerber, Daniel; Hellmann, Sebastian; Bühmann, Lorenz et al.
The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings. ed. / Harith Alani; Lalana Kagal; Achille Fokoue; Paul Groth; Chris Biemann; Josiane Xavier Parreira; Lora Aroyo; Natasha Noy; Chris Welty; Krzysztof Janowicz. Springer Verlag, 2013. p. 135-150 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8218 LNCS, No. PART 1).

Harvard

Gerber, D, Hellmann, S, Bühmann, L, Soru, T, Usbeck, R & Ngonga Ngomo, AC 2013, Real-time RDF extraction from unstructured data streams. in H Alani, L Kagal, A Fokoue, P Groth, C Biemann, JX Parreira, L Aroyo, N Noy, C Welty & K Janowicz (eds), The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 1, vol. 8218 LNCS, Springer Verlag, pp. 135-150, 12th International Semantic Web Conference, ISWC 2013, Sydney, NSW, New South Wales, Australia, 21.10.13. https://doi.org/10.1007/978-3-642-41335-3_9

APA

Gerber, D., Hellmann, S., Bühmann, L., Soru, T., Usbeck, R., & Ngonga Ngomo, A. C. (2013). Real-time RDF extraction from unstructured data streams. In H. Alani, L. Kagal, A. Fokoue, P. Groth, C. Biemann, J. X. Parreira, L. Aroyo, N. Noy, C. Welty, & K. Janowicz (Eds.), The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings (pp. 135-150). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8218 LNCS, No. PART 1). Springer Verlag. https://doi.org/10.1007/978-3-642-41335-3_9

Vancouver

Gerber D, Hellmann S, Bühmann L, Soru T, Usbeck R, Ngonga Ngomo AC. Real-time RDF extraction from unstructured data streams. In Alani H, Kagal L, Fokoue A, Groth P, Biemann C, Parreira JX, Aroyo L, Noy N, Welty C, Janowicz K, editors, The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings. Springer Verlag. 2013. p. 135-150. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 1). doi: 10.1007/978-3-642-41335-3_9

Bibtex

@inbook{bd4458c832904167a7a7c449e3f0beb6,
title = "Real-time RDF extraction from unstructured data streams",
abstract = "The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the actuality of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.",
keywords = "Informatics, Time Slice, Name Entry Recognition, Pattern Mapping, Link Open Data, String Similarity, Business informatics",
author = "Daniel Gerber and Sebastian Hellmann and Lorenz B{\"u}hmann and Tommaso Soru and Ricardo Usbeck and {Ngonga Ngomo}, {Axel Cyrille}",
year = "2013",
doi = "10.1007/978-3-642-41335-3_9",
language = "English",
isbn = "9783642413346",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
number = "PART 1",
pages = "135--150",
editor = "Harith Alani and Lalana Kagal and Achille Fokoue and Paul Groth and Chris Biemann and Parreira, {Josiane Xavier} and Lora Aroyo and Natasha Noy and Chris Welty and Krzysztof Janowicz",
booktitle = "The Semantic Web, ISWC 2013",
address = "Germany",
note = "12th International Semantic Web Conference, ISWC 2013 ; Conference date: 21-10-2013 Through 25-10-2013",
url = "http://iswc2013.semanticweb.org",

}

RIS

TY - CHAP

T1 - Real-time RDF extraction from unstructured data streams

AU - Gerber, Daniel

AU - Hellmann, Sebastian

AU - Bühmann, Lorenz

AU - Soru, Tommaso

AU - Usbeck, Ricardo

AU - Ngonga Ngomo, Axel Cyrille

PY - 2013

Y1 - 2013

N2 - The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the actuality of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.

AB - The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the actuality of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.

KW - Informatics

KW - Time Slice

KW - Name Entry Recognition

KW - Pattern Mapping

KW - Link Open Data

KW - String Similarity

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=84891950965&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/5d304550-be6f-361f-8bc5-05940fd2117e/

U2 - 10.1007/978-3-642-41335-3_9

DO - 10.1007/978-3-642-41335-3_9

M3 - Article in conference proceedings

AN - SCOPUS:84891950965

SN - 9783642413346

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 135

EP - 150

BT - The Semantic Web, ISWC 2013

A2 - Alani, Harith

A2 - Kagal, Lalana

A2 - Fokoue, Achille

A2 - Groth, Paul

A2 - Biemann, Chris

A2 - Parreira, Josiane Xavier

A2 - Aroyo, Lora

A2 - Noy, Natasha

A2 - Welty, Chris

A2 - Janowicz, Krzysztof

PB - Springer Verlag

T2 - 12th International Semantic Web Conference, ISWC 2013

Y2 - 21 October 2013 through 25 October 2013

ER -
