RelHunter: A machine learning method for relation extraction from text

Research output: Journal contributions › Journal articles › Research › peer-review

We propose RelHunter, a machine learning-based method for extracting structured information from text. RelHunter's key idea is to model the target structures as a relation over entities. The modeling effort is thereby reduced to identifying the entities and generating a candidate relation, both of which are simpler problems than the original one. RelHunter fits a broad spectrum of complex computational linguistics problems. We apply it to five tasks: phrase chunking, clause identification, hedge detection, quotation extraction, and dependency parsing. We compare RelHunter to token classification approaches through several computational experiments on seven multilingual corpora. RelHunter outperforms the token classification approaches by 2.14% on average. Moreover, we compare the derived systems against state-of-the-art systems for each corpus. Our systems achieve state-of-the-art performance on three corpora: Portuguese phrase chunking, Portuguese clause identification, and English quotation extraction. The derived systems also perform well on the other four corpora.
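The idea of treating a target structure as a relation over entities can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: tokens serve as entities, the candidate relation is every ordered token pair within a fixed distance, and a hand-written rule stands in for the learned classifier. The names `generate_candidates`, `extract_relation`, and `toy_classifier` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical representation: an entity is (token index, token text).
Entity = Tuple[int, str]
# A candidate is an ordered pair of entities, e.g. (chunk start, chunk end).
Candidate = Tuple[Entity, Entity]


def generate_candidates(entities: List[Entity], max_distance: int) -> List[Candidate]:
    """Build the candidate relation: all pairs (a, b) where b is at or
    after a and at most max_distance tokens away (an assumed heuristic)."""
    return [
        (a, b)
        for a, b in product(entities, repeat=2)
        if 0 <= b[0] - a[0] <= max_distance
    ]


def extract_relation(
    candidates: List[Candidate],
    is_related: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Keep only the candidates that the binary classifier accepts."""
    return [pair for pair in candidates if is_related(pair)]


def toy_classifier(pair: Candidate) -> bool:
    """Stand-in for a learned model: accept spans that start at 'the'
    and cover exactly three tokens."""
    (start, first_token), (end, _) = pair
    return first_token.lower() == "the" and end - start == 2


tokens = ["He", "reckons", "the", "current", "deficit"]
entities = list(enumerate(tokens))

candidates = generate_candidates(entities, max_distance=2)
chunks = extract_relation(candidates, toy_classifier)

for (start, _), (end, _) in chunks:
    print(" ".join(tokens[start:end + 1]))
```

In this framing the structured prediction problem becomes binary classification over candidate pairs, so any trained classifier could replace `toy_classifier` without changing the rest of the pipeline.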

Original language: English
Article number: 18
Journal: Journal of the Brazilian Computer Society
Volume: 16
Issue number: 3
Pages (from-to): 191-199
Number of pages: 9
ISSN: 0104-6500
Publication status: Published - 09.2010
Externally published: Yes

Bibliographical note

This work was partially funded by CNPq and FAPERJ grants 557.128/2009-9 and E-26/170028/2008. The first author holds a CNPq doctoral fellowship and is supported by Instituto Federal de Educação, Ciência e Tecnologia de Goiás, Brazil.

Research areas

  • Entity relation extraction, Entropy Guided Transformation Learning, Machine learning, Natural language processing
  • Informatics
  • Business informatics
