RelHunter: A machine learning method for relation extraction from text

Research output: Journal contributions › Journal articles › Research › peer-review

Authors

We propose RelHunter, a machine learning-based method for extracting structured information from text. RelHunter's key idea is to model the target structures as a relation over entities. The modeling effort is thus reduced to identifying entities and generating a candidate relation, which are simpler problems than the original one. RelHunter applies to a broad range of complex computational linguistics problems. We apply it to five tasks: phrase chunking, clause identification, hedge detection, quotation extraction, and dependency parsing. Through computational experiments on seven multilingual corpora, we compare RelHunter to token classification approaches; RelHunter outperforms them by 2.14% on average. We also compare the derived systems against state-of-the-art systems for each corpus. Our systems achieve state-of-the-art performance on three corpora: Portuguese phrase chunking, Portuguese clause identification, and English quotation extraction. On the other four corpora, the derived systems also perform well.
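The decomposition described above (identify entities, generate a candidate relation, keep the valid members) can be illustrated for phrase chunking, where each chunk is a relation between a start token and an end token. The following is a minimal sketch under stated assumptions, not the authors' implementation: the start/end token lists and the `score` function are hypothetical stand-ins for learned classifiers, and the greedy overlap resolution is an assumed post-processing step for tasks (like chunking) whose spans must not overlap.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end inclusive


def generate_candidates(starts: List[int], ends: List[int],
                        max_len: Optional[int] = None) -> List[Span]:
    """Candidate relation: every (start, end) pair of identified entities
    with start <= end, optionally capped at max_len tokens (an assumption)."""
    candidates = []
    for s, e in product(starts, ends):
        if s <= e and (max_len is None or e - s + 1 <= max_len):
            candidates.append((s, e))
    return candidates


def select_spans(candidates: List[Span], score: Callable[[Span], float],
                 threshold: float = 0.5) -> List[Span]:
    """Keep candidates the hypothetical relation scorer accepts, then
    resolve overlaps greedily, highest score first."""
    accepted = sorted((c for c in candidates if score(c) >= threshold),
                      key=score, reverse=True)
    chosen: List[Span] = []
    for s, e in accepted:
        # Accept a span only if it is disjoint from every span chosen so far.
        if all(e < cs or s > ce for cs, ce in chosen):
            chosen.append((s, e))
    return sorted(chosen)
```

For example, with the tokens "He reckons the current account deficit", hypothetical entity predictions `starts=[0, 1, 2]` and `ends=[0, 1, 5]` yield six candidates; a toy scorer that favors `(0, 0)`, `(1, 1)` and `(2, 5)` (and weakly accepts the overlapping `(0, 1)`) leaves the three chunks `[He]`, `[reckons]` and `[the current account deficit]`. Pairing entities this way keeps each learning step small: the entity model only labels tokens, and the relation model only judges a pair at a time.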

Original language: English
Article number: 18
Journal: Journal of the Brazilian Computer Society
Volume: 16
Issue number: 3
Pages (from-to): 191-199
Number of pages: 9
ISSN: 0104-6500
Publication status: Published - 09.2010
Externally published: Yes

Bibliographical note

This work was partially funded by CNPq and FAPERJ grants 557.128/2009-9 and E-26/170028/2008. The first author holds a CNPq doctoral fellowship and is supported by Instituto Federal de Educação, Ciência e Tecnologia de Goiás, Brazil.

    Research areas

  • Entity relation extraction, Entropy Guided Transformation Learning, Machine learning, Natural language processing
  • Informatics
  • Business informatics
