RelHunter: A machine learning method for relation extraction from text

Eraldo R. Fernandes; Ruy L. Milidiú; Raúl P. Rentería

doi:10.1007/s13173-010-0018-y

Research output: Journal contributions › Journal articles › Research › peer-review

Standard

RelHunter: A machine learning method for relation extraction from text. / Fernandes, Eraldo R.; Milidiú, Ruy L.; Rentería, Raúl P.
In: Journal of the Brazilian Computer Society, Vol. 16, No. 3, 18, 09.2010, p. 191-199.

Bibtex

@article{631a377ca8674819a40caf7adcc40e4a,
title = "RelHunter: A machine learning method for relation extraction from text",
abstract = "We propose RelHunter, a machine learning-based method for the extraction of structured information from text. RelHunter's key idea is to model the target structures as a relation over entities. Hence, the modeling effort is reduced to the identification of entities and the generation of a candidate relation, which are simpler problems than the original one. RelHunter fits a very broad spectrum of complex computational linguistic problems. We apply it to five tasks: phrase chunking, clause identification, hedge detection, quotation extraction, and dependency parsing. We compare RelHunter to token classification approaches through several computational experiments on seven multilingual corpora. RelHunter outperforms the token classification approaches by 2.14\% on average. Moreover, we compare the derived systems against state-of-the-art systems for each corpus. Our systems achieve state-of-the-art performances for three corpora: Portuguese phrase chunking, Portuguese clause identification, and English quotation extraction. Additionally, the derived systems show good quality performance for the other four corpora.",
keywords = "Entity relation extraction, Entropy Guided Transformation Learning, Machine learning, Natural language processing, Informatics, Business informatics",
author = "Fernandes, {Eraldo R.} and Milidi{\'u}, {Ruy L.} and Renter{\'i}a, {Ra{\'u}l P.}",
note = "This work was partially funded by CNPq and FAPERJ grants 557.128/2009-9 and E-26/170028/2008. The first author holds a CNPq doctoral fellowship and is supported by Instituto Federal de Educa{\c c}{\~a}o, Ci{\^e}ncia e Tecnologia de Goi{\'a}s, Brazil.",
year = "2010",
month = sep,
doi = "10.1007/s13173-010-0018-y",
language = "English",
volume = "16",
pages = "191--199",
journal = "Journal of the Brazilian Computer Society",
issn = "0104-6500",
publisher = "Brazilian Computing Society",
number = "3",
}

RIS

TY - JOUR

T1 - RelHunter

T2 - A machine learning method for relation extraction from text

AU - Fernandes, Eraldo R.

AU - Milidiú, Ruy L.

AU - Rentería, Raúl P.

N1 - This work was partially funded by CNPq and FAPERJ grants 557.128/2009-9 and E-26/170028/2008. The first author holds a CNPq doctoral fellowship and is supported by Instituto Federal de Educação, Ciência e Tecnologia de Goiás, Brazil.

PY - 2010/9

Y1 - 2010/9

AB - We propose RelHunter, a machine learning-based method for the extraction of structured information from text. RelHunter's key idea is to model the target structures as a relation over entities. Hence, the modeling effort is reduced to the identification of entities and the generation of a candidate relation, which are simpler problems than the original one. RelHunter fits a very broad spectrum of complex computational linguistic problems. We apply it to five tasks: phrase chunking, clause identification, hedge detection, quotation extraction, and dependency parsing. We compare RelHunter to token classification approaches through several computational experiments on seven multilingual corpora. RelHunter outperforms the token classification approaches by 2.14% on average. Moreover, we compare the derived systems against state-of-the-art systems for each corpus. Our systems achieve state-of-the-art performances for three corpora: Portuguese phrase chunking, Portuguese clause identification, and English quotation extraction. Additionally, the derived systems show good quality performance for the other four corpora.

KW - Entity relation extraction

KW - Entropy Guided Transformation Learning

KW - Machine learning

KW - Natural language processing

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=84870389863&partnerID=8YFLogxK

UR - https://link.springer.com/journal/13173/volumes-and-issues/16-3

U2 - 10.1007/s13173-010-0018-y

DO - 10.1007/s13173-010-0018-y

M3 - Journal articles

AN - SCOPUS:84870389863

VL - 16

SP - 191

EP - 199

JO - Journal of the Brazilian Computer Society

JF - Journal of the Brazilian Computer Society

SN - 0104-6500

IS - 3

M1 - 18

ER -

Other publications by the same author(s)

Data practices in apps from Brazil: What do privacy policies inform us about?

Quadros dos Reis, V., Rabello, M. E. R., Lima, A. C., Jardim, G. P. S., Fernandes, E. R. & Brefeld, U., 10.02.2023, In: Journal on Interactive Systems. 14, 1, p. 1-8 8 p.

Research output: Journal contributions › Journal articles › Research › peer-review

Entity Extraction from Portuguese Legal Documents Using Distant Supervision

Navarezi, L. M., Sakiyama, K., Rodrigues, L. S., Robaldo, C. M. O., Lobato, G. R., Vilela, P. A., Matsubara, E. T. & Fernandes, E. R., 2022, Computational Processing of the Portuguese Language : 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21-23, 2022, Proceedings. Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D., Magro, C. & Pinto, H. (eds.). Cham: Springer Nature Switzerland AG, p. 166-176 11 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 13208 LNAI).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

FaST: A linear time stack trace alignment heuristic for crash report deduplication

Rodrigues, I. M., Aloise, D. & Fernandes, E. R., 17.10.2022, The 2022 Mining Software Repositories Conference: MSR 2022, Proceedings; 18-20 May 2022, Virtual; 23-24 May 2022, Pittsburgh, Pennsylvania. New York: Institute of Electrical and Electronics Engineers Inc., p. 549-560 12 p. (Proceedings - IEEE/ACM International Conference on Mining Software Repositories).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Performance predictors for graphics processing units applied to dark-silicon-aware design space exploration

Sonohata, R., Arigoni, D. C. A., Fernandes, E. R., Ribeiro dos Santos, R. & Dessandre Duenha, L., 01.08.2023, In: Concurrency and Computation: Practice and Experience. 35, 17, 16 p., e6877.

Research output: Journal contributions › Journal articles › Research › peer-review

TraceSim: An Alignment Method for Computing Stack Trace Similarity

Rodrigues, I. M., Khvorov, A., Aloise, D., Vasiliev, R., Koznov, D., Fernandes, E. R., Chernishev, G., Luciv, D. & Povarov, N., 01.03.2022, In: Empirical Software Engineering. 27, 2, 41 p., 53.

Research output: Journal contributions › Journal articles › Research › peer-review

DOI

https://doi.org/10.1007/s13173-010-0018-y
Final published version
