RelHunter: A machine learning method for relation extraction from text
Research output: Journal contributions › Journal articles › Research › peer-review
In: Journal of the Brazilian Computer Society, Vol. 16, No. 3, 18, 09.2010, p. 191-199.
RIS
TY - JOUR
T1 - RelHunter
T2 - A machine learning method for relation extraction from text
AU - Fernandes, Eraldo R.
AU - Milidiú, Ruy L.
AU - Rentería, Raúl P.
N1 - This work was partially funded by CNPq and FAPERJ grants 557.128/2009-9 and E-26/170028/2008. The first author holds a CNPq doctoral fellowship and is supported by Instituto Federal de Educação, Ciência e Tecnologia de Goiás, Brazil.
PY - 2010/9
Y1 - 2010/9
AB - We propose RelHunter, a machine learning-based method for the extraction of structured information from text. RelHunter's key idea is to model the target structures as a relation over entities. Hence, the modeling effort is reduced to the identification of entities and the generation of a candidate relation, which are simpler problems than the original one. RelHunter fits a very broad spectrum of complex computational linguistic problems. We apply it to five tasks: phrase chunking, clause identification, hedge detection, quotation extraction, and dependency parsing. We compare RelHunter to token classification approaches through several computational experiments on seven multilingual corpora. RelHunter outperforms the token classification approaches by 2.14% on average. Moreover, we compare the derived systems against state-of-the-art systems for each corpus. Our systems achieve state-of-the-art performances for three corpora: Portuguese phrase chunking, Portuguese clause identification, and English quotation extraction. Additionally, the derived systems show good quality performance for the other four corpora.
KW - Entity relation extraction
KW - Entropy Guided Transformation Learning
KW - Machine learning
KW - Natural language processing
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=84870389863&partnerID=8YFLogxK
UR - https://link.springer.com/journal/13173/volumes-and-issues/16-3
U2 - 10.1007/s13173-010-0018-y
DO - 10.1007/s13173-010-0018-y
M3 - Journal articles
AN - SCOPUS:84870389863
VL - 16
SP - 191
EP - 199
JO - Journal of the Brazilian Computer Society
JF - Journal of the Brazilian Computer Society
SN - 0104-6500
IS - 3
M1 - 18
ER -