Real-time RDF extraction from unstructured data streams

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

  • Daniel Gerber
  • Sebastian Hellmann
  • Lorenz Bühmann
  • Tommaso Soru
  • Ricardo Usbeck
  • Axel Cyrille Ngonga Ngomo

The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of keeping the Web of Data up to date by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.

Original language: English
Title of host publication: The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings
Editors: Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, Krzysztof Janowicz
Number of pages: 16
Publisher: Springer Verlag
Publication date: 2013
Pages: 135-150
ISBN (print): 9783642413346
DOIs
Publication status: Published - 2013
Externally published: Yes
Event: 12th International Semantic Web Conference, ISWC 2013 - Sydney Convention Centre, Sydney, NSW, Australia
Duration: 21.10.2013 - 25.10.2013
http://iswc2013.semanticweb.org
