Latent trees for coreference resolution
Research output: Journal contributions › Journal articles › Research › peer-review
In: Computational Linguistics, Vol. 40, No. 4, 19.12.2014, p. 801-835.
RIS
TY - JOUR
T1 - Latent trees for coreference resolution
AU - Fernandes, Eraldo Rezende
AU - dos Santos, Cícero Nogueira
AU - Milidiú, Ruy Luiz
PY - 2014/12/19
Y1 - 2014/12/19
N2 - We describe a structure learning system for unrestricted coreference resolution that explores two key modeling techniques: latent coreference trees and automatic entropy-guided feature induction. The latent tree modeling makes the learning problem computationally feasible because it incorporates a meaningful hidden structure. Additionally, using an automatic feature induction method, we can efficiently build enhanced nonlinear models using linear model learning algorithms. We present empirical results that highlight the contribution of each modeling technique used in the proposed system. Empirical evaluation is performed on the multilingual unrestricted coreference CoNLL-2012 Shared Task data sets, which comprise three languages: Arabic, Chinese, and English. We apply the same system to all languages, except for minor adaptations to some language-dependent features such as nested mentions and specific static pronoun lists. A previous version of this system was submitted to the CoNLL-2012 Shared Task closed track, achieving an official score of 58.69, the best among the competitors. The unique enhancement added to the current system version is the inclusion of candidate arcs linking nested mentions for the Chinese language. By including such arcs, the score increases by almost 4.5 points for that language. The current system shows a score of 60.15, which corresponds to a 3.5% error reduction, and is the best performing system for each of the three languages.
AB - We describe a structure learning system for unrestricted coreference resolution that explores two key modeling techniques: latent coreference trees and automatic entropy-guided feature induction. The latent tree modeling makes the learning problem computationally feasible because it incorporates a meaningful hidden structure. Additionally, using an automatic feature induction method, we can efficiently build enhanced nonlinear models using linear model learning algorithms. We present empirical results that highlight the contribution of each modeling technique used in the proposed system. Empirical evaluation is performed on the multilingual unrestricted coreference CoNLL-2012 Shared Task data sets, which comprise three languages: Arabic, Chinese, and English. We apply the same system to all languages, except for minor adaptations to some language-dependent features such as nested mentions and specific static pronoun lists. A previous version of this system was submitted to the CoNLL-2012 Shared Task closed track, achieving an official score of 58.69, the best among the competitors. The unique enhancement added to the current system version is the inclusion of candidate arcs linking nested mentions for the Chinese language. By including such arcs, the score increases by almost 4.5 points for that language. The current system shows a score of 60.15, which corresponds to a 3.5% error reduction, and is the best performing system for each of the three languages.
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=84918531827&partnerID=8YFLogxK
U2 - 10.1162/COLI_a_00200
DO - 10.1162/COLI_a_00200
M3 - Journal articles
AN - SCOPUS:84918531827
VL - 40
SP - 801
EP - 835
JO - Computational Linguistics
JF - Computational Linguistics
SN - 0891-2017
IS - 4
ER -