Real-time RDF extraction from unstructured data streams

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

  • Daniel Gerber
  • Sebastian Hellmann
  • Lorenz Bühmann
  • Tommaso Soru
  • Ricardo Usbeck
  • Axel Cyrille Ngonga Ngomo

The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the actuality of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.

Original language: English
Title of host publication: The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings
Editors: Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, Krzysztof Janowicz
Number of pages: 16
Publisher: Springer Verlag
Publication date: 2013
Pages: 135-150
ISBN (print): 9783642413346
Publication status: Published - 2013
Externally published: Yes
Event: 12th International Semantic Web Conference, ISWC 2013 - Sydney Convention Centre, Sydney, NSW, Australia
Duration: 21.10.2013 - 25.10.2013
http://iswc2013.semanticweb.org
