Real-time RDF extraction from unstructured data streams

Publication: Contributions to collected editions › Articles in conference proceedings › Research › peer-reviewed

Authors

  • Daniel Gerber
  • Sebastian Hellmann
  • Lorenz Bühmann
  • Tommaso Soru
  • Ricardo Usbeck
  • Axel Cyrille Ngonga Ngomo

The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the timeliness of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.
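
To make the idea of turning a text stream into deduplicated RDF triples concrete, the following is a minimal sketch only. It is not the authors' method: the abstract names statistical methods, disambiguation and machine learning, none of which are reproduced here. The namespace `http://example.org/`, the hand-written verb pattern, and the helper names `to_iri`, `extract_triples` and `to_ntriples` are all assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and hand-written pattern, used only for illustration.
EX = "http://example.org/"
PATTERN = re.compile(
    r"^(?P<s>[A-Z][\w ]*?) (?P<p>acquired|founded|joined) (?P<o>[A-Z][\w ]*?)\.?$"
)


def to_iri(label: str) -> str:
    """Turn a surface form into an IRI in the assumed example namespace."""
    return "<" + EX + label.strip().replace(" ", "_") + ">"


def extract_triples(sentences: Iterable[str]) -> Iterator[Triple]:
    """Yield each matching (subject, predicate, object) triple once."""
    seen = set()  # triples already emitted, for deduplication
    for sentence in sentences:
        match = PATTERN.match(sentence.strip())
        if match is None:
            continue  # sentence does not fit the toy pattern
        triple = (to_iri(match["s"]), to_iri(match["p"]), to_iri(match["o"]))
        if triple in seen:
            continue  # the same fact was reported again; skip it
        seen.add(triple)
        yield triple


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as an N-Triples line."""
    return " ".join(triple) + " ."


if __name__ == "__main__":
    stream = [
        "Acme Corp acquired Widget Inc.",
        "It rained today.",
        "Acme Corp acquired Widget Inc",
    ]
    for t in extract_triples(stream):
        print(to_ntriples(t))
```

Because `extract_triples` is a generator, it can consume an unbounded stream sentence by sentence. A real system would replace the fixed pattern with learned extractors and map surface forms to existing knowledge base identifiers rather than minting IRIs from labels.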

Original language: English
Title: The Semantic Web, ISWC 2013 : 12th International Semantic Web Conference, Proceedings
Editors: Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, Krzysztof Janowicz
Number of pages: 16
Publisher: Springer Verlag
Publication date: 2013
Pages: 135-150
ISBN (Print): 9783642413346
DOIs
Publication status: Published - 2013
Externally published: Yes
Event: 12th International Semantic Web Conference, ISWC 2013 - Sydney Convention Centre, Sydney, NSW, Australia
Duration: 21.10.2013 to 25.10.2013
http://iswc2013.semanticweb.org
