Real-time RDF extraction from unstructured data streams

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

  • Daniel Gerber
  • Sebastian Hellmann
  • Lorenz Bühmann
  • Tommaso Soru
  • Ricardo Usbeck
  • Axel Cyrille Ngonga Ngomo

The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A major challenge, the timely and large-scale extraction of RDF facts from unstructured data, has so far remained open. Making such knowledge available on the Web of Data would significantly benefit many applications, including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of keeping the Web of Data up to date by presenting an approach that extracts RDF triples from unstructured data streams. We combine statistical methods with deduplication, disambiguation, and both unsupervised and supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.

Original language: English
Title of host publication: The Semantic Web, ISWC 2013: 12th International Semantic Web Conference, Proceedings
Editors: Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, Krzysztof Janowicz
Number of pages: 16
Publisher: Springer Verlag
Publication date: 2013
Pages: 135-150
ISBN (print): 9783642413346
Publication status: Published - 2013
Externally published: Yes
Event: 12th International Semantic Web Conference, ISWC 2013 - Sydney Convention Centre, Sydney, NSW, Australia
Duration: 21.10.2013 to 25.10.2013
http://iswc2013.semanticweb.org