Entropy-guided feature generation for structured learning of Portuguese dependency parsing

Eraldo R. Fernandes; Ruy L. Milidiú

doi:10.1007/978-3-642-28885-2_17

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Entropy-guided feature generation for structured learning of Portuguese dependency parsing. / Fernandes, Eraldo R.; Milidiú, Ruy L.
Computational Processing of the Portuguese Language: 10th International Conference, PROPOR 2012, Coimbra, Portugal, April 17-20, 2012. Proceedings. ed. / Helena Caseli; Aline Villavicencio; Antonio Teixeira; Fernando Perdigao. Berlin, Heidelberg: Springer Verlag, 2012. p. 146-156 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7243 LNAI).

Harvard

Fernandes, ER & Milidiú, RL 2012, Entropy-guided feature generation for structured learning of Portuguese dependency parsing. in H Caseli, A Villavicencio, A Teixeira & F Perdigao (eds), Computational Processing of the Portuguese Language: 10th International Conference, PROPOR 2012, Coimbra, Portugal, April 17-20, 2012. Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7243 LNAI, Springer Verlag, Berlin, Heidelberg, pp. 146-156, International Conference on Computational Processing of Portuguese, Coimbra, Portugal, 17.04.12. https://doi.org/10.1007/978-3-642-28885-2_17

APA

Fernandes, E. R., & Milidiú, R. L. (2012). Entropy-guided feature generation for structured learning of Portuguese dependency parsing. In H. Caseli, A. Villavicencio, A. Teixeira, & F. Perdigao (Eds.), Computational Processing of the Portuguese Language: 10th International Conference, PROPOR 2012, Coimbra, Portugal, April 17-20, 2012. Proceedings (pp. 146-156). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7243 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-642-28885-2_17

Vancouver

Fernandes ER, Milidiú RL. Entropy-guided feature generation for structured learning of Portuguese dependency parsing. In Caseli H, Villavicencio A, Teixeira A, Perdigao F, editors, Computational Processing of the Portuguese Language: 10th International Conference, PROPOR 2012, Coimbra, Portugal, April 17-20, 2012. Proceedings. Berlin, Heidelberg: Springer Verlag. 2012. p. 146-156. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-642-28885-2_17

Bibtex

@inbook{f536d12ded6d493bb6a28d5057fd8d2a,

title = "Entropy-guided feature generation for structured learning of Portuguese dependency parsing",

abstract = "Feature generation is a difficult, yet highly necessary, subtask of machine learning modeling. Usually, it is partially solved by a domain expert who generates complex and discriminative feature templates by conjoining the available basic features. This is a limited and expensive way to obtain feature templates and is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. The method receives as input the training dataset with basic features and produces a set of feature templates by conjoining basic features that are highly discriminative together. We denote this method entropy guided since it is based on the conditional entropy of local decision variables given the feature values. We illustrate our approach on the Portuguese dependency parsing task and report on experiments with the Bosque corpus. We show that the entropy-guided templates outperform the manually built templates used by MSTParser, which was the best performing system on the Bosque corpus up to now. Furthermore, our approach allows an effortless inclusion of two new basic features that automatically generate additional templates. As a result, our system achieves a per-token accuracy of 92.66%, which represents a reduction by more than 15% on the previous smallest error rate for Portuguese dependency parsing.",

keywords = "dependency parsing, entropy-guided feature generation, machine learning, structured learning, Informatics, Business informatics",

author = "Fernandes, {Eraldo R.} and Milidi{\'u}, {Ruy L.}",

year = "2012",

doi = "10.1007/978-3-642-28885-2_17",

language = "English",

isbn = "978-3-642-28884-5",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "146--156",

editor = "Helena Caseli and Aline Villavicencio and Antonio Teixeira and Fernando Perdigao",

booktitle = "Computational Processing of the Portuguese Language",

address = "Germany",

note = "International Conference on Computational Processing of Portuguese, PROPOR 2012 ; Conference date: 17-04-2012 Through 20-04-2012",

url = "https://aclweb.org/portal/content/10th-international-conference-computational-processing-portuguese-propor-2012",

}

RIS

TY - CHAP

T1 - Entropy-guided feature generation for structured learning of Portuguese dependency parsing

AU - Fernandes, Eraldo R.

AU - Milidiú, Ruy L.

N1 - Conference code: 10

PY - 2012

Y1 - 2012

N2 - Feature generation is a difficult, yet highly necessary, subtask of machine learning modeling. Usually, it is partially solved by a domain expert who generates complex and discriminative feature templates by conjoining the available basic features. This is a limited and expensive way to obtain feature templates and is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. The method receives as input the training dataset with basic features and produces a set of feature templates by conjoining basic features that are highly discriminative together. We denote this method entropy guided since it is based on the conditional entropy of local decision variables given the feature values. We illustrate our approach on the Portuguese dependency parsing task and report on experiments with the Bosque corpus. We show that the entropy-guided templates outperform the manually built templates used by MSTParser, which was the best performing system on the Bosque corpus up to now. Furthermore, our approach allows an effortless inclusion of two new basic features that automatically generate additional templates. As a result, our system achieves a per-token accuracy of 92.66%, which represents a reduction by more than 15% on the previous smallest error rate for Portuguese dependency parsing.

AB - Feature generation is a difficult, yet highly necessary, subtask of machine learning modeling. Usually, it is partially solved by a domain expert who generates complex and discriminative feature templates by conjoining the available basic features. This is a limited and expensive way to obtain feature templates and is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. The method receives as input the training dataset with basic features and produces a set of feature templates by conjoining basic features that are highly discriminative together. We denote this method entropy guided since it is based on the conditional entropy of local decision variables given the feature values. We illustrate our approach on the Portuguese dependency parsing task and report on experiments with the Bosque corpus. We show that the entropy-guided templates outperform the manually built templates used by MSTParser, which was the best performing system on the Bosque corpus up to now. Furthermore, our approach allows an effortless inclusion of two new basic features that automatically generate additional templates. As a result, our system achieves a per-token accuracy of 92.66%, which represents a reduction by more than 15% on the previous smallest error rate for Portuguese dependency parsing.

KW - dependency parsing

KW - entropy-guided feature generation

KW - machine learning

KW - structured learning

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=84858599304&partnerID=8YFLogxK

UR - https://d-nb.info/1019948167

U2 - 10.1007/978-3-642-28885-2_17

DO - 10.1007/978-3-642-28885-2_17

M3 - Article in conference proceedings

AN - SCOPUS:84858599304

SN - 978-3-642-28884-5

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 146

EP - 156

BT - Computational Processing of the Portuguese Language

A2 - Caseli, Helena

A2 - Villavicencio, Aline

A2 - Teixeira, Antonio

A2 - Perdigao, Fernando

PB - Springer Verlag

CY - Berlin, Heidelberg

T2 - International Conference on Computational Processing of Portuguese

Y2 - 17 April 2012 through 20 April 2012

ER -

Other publications by the same author(s)

Data practices in apps from Brazil: What do privacy policies inform us about?

Quadros dos Reis, V., Rabello, M. E. R., Lima, A. C., Jardim, G. P. S., Fernandes, E. R. & Brefeld, U., 10.02.2023, In: Journal on Interactive Systems. 14, 1, p. 1-8 8 p.

Research output: Journal contributions › Journal articles › Research › peer-review

Entity Extraction from Portuguese Legal Documents Using Distant Supervision

Navarezi, L. M., Sakiyama, K., Rodrigues, L. S., Robaldo, C. M. O., Lobato, G. R., Vilela, P. A., Matsubara, E. T. & Fernandes, E. R., 2022, Computational Processing of the Portuguese Language : 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21-23, 2022, Proceedings. Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D., Magro, C. & Pinto, H. (eds.). Cham: Springer Nature Switzerland AG, p. 166-176 11 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 13208 LNAI).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

FaST: A linear time stack trace alignment heuristic for crash report deduplication

Rodrigues, I. M., Aloise, D. & Fernandes, E. R., 17.10.2022, The 2022 Mining Software Repositories Conference: MSR 2022, Proceedings; 18-20 May 2022, Virtual; 23-24 May 2022, Pittsburgh, Pennsylvania. New York: Institute of Electrical and Electronics Engineers Inc., p. 549-560 12 p. (Proceedings - IEEE/ACM International Conference on Mining Software Repositories).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Performance predictors for graphics processing units applied to dark-silicon-aware design space exploration

Sonohata, R., Arigoni, D. C. A., Fernandes, E. R., Ribeiro dos Santos, R. & Dessandre Duenha, L., 01.08.2023, In: Concurrency and Computation: Practice and Experience. 35, 17, 16 p., e6877.

Research output: Journal contributions › Journal articles › Research › peer-review

TraceSim: An Alignment Method for Computing Stack Trace Similarity

Rodrigues, I. M., Khvorov, A., Aloise, D., Vasiliev, R., Koznov, D., Fernandes, E. R., Chernishev, G., Luciv, D. & Povarov, N., 01.03.2022, In: Empirical Software Engineering. 27, 2, 41 p., 53.

Research output: Journal contributions › Journal articles › Research › peer-review

DOI

https://doi.org/10.1007/978-3-642-28885-2_17
Final published version
