A machine learning approach to Portuguese clause identification

Eraldo R. Fernandes; Cícero N. Dos Santos; Ruy L. Milidiú

doi:10.1007/978-3-642-12320-7_8

Publication: Contributions to collected editions › Conference proceedings papers › Research › peer-reviewed

Standard

A machine learning approach to Portuguese clause identification. / Fernandes, Eraldo R.; Dos Santos, Cícero N.; Milidiú, Ruy L.
Computational Processing of the Portuguese Language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings. Ed. / Thiago Alexandre Salgueiro Pardo; Antonio Branco; Aldebaro Klautau; Renata Viera; Vera Lucia Strube de Lima. Berlin, Heidelberg: Springer Verlag, 2010. pp. 55-64 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6001 LNAI).

Harvard

Fernandes, ER, Dos Santos, CN & Milidiú, RL 2010, A machine learning approach to Portuguese clause identification. in TAS Pardo, A Branco, A Klautau, R Viera & VLS de Lima (eds), Computational Processing of the Portuguese Language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6001 LNAI, Springer Verlag, Berlin, Heidelberg, pp. 55-64, International Conference on Computational Processing of the Portuguese Language, Porto Alegre, Brazil, 27 April 2010. https://doi.org/10.1007/978-3-642-12320-7_8

APA

Fernandes, E. R., Dos Santos, C. N., & Milidiú, R. L. (2010). A machine learning approach to Portuguese clause identification. In T. A. S. Pardo, A. Branco, A. Klautau, R. Viera, & V. L. S. de Lima (Eds.), Computational Processing of the Portuguese Language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings (pp. 55-64). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6001 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-642-12320-7_8

Vancouver

Fernandes ER, Dos Santos CN, Milidiú RL. A machine learning approach to Portuguese clause identification. In Pardo TAS, Branco A, Klautau A, Viera R, de Lima VLS, editors, Computational Processing of the Portuguese Language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings. Berlin, Heidelberg: Springer Verlag. 2010. p. 55-64. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-642-12320-7_8

Bibtex

@inbook{b537e9fa289145338170566211ee392f,

title = "A machine learning approach to Portuguese clause identification",

abstract = "In this work, we apply and evaluate a machine-learning-based system to Portuguese clause identification. To the best of our knowledge, this is the first machine-learning-based approach to this task. The proposed system is based on Entropy Guided Transformation Learning. In order to train and evaluate the proposed system, we derive a clause annotated corpus from the Bosque corpus of the Floresta Sint{\'a}(c)tica Project - an European and Brazilian Portuguese treebank. We include part-of-speech (POS) tags to the derived corpus by using an automatic state-of-the-art tagger. Additionally, we use a simple heuristic to derive a phrase-chunk-like (PCL) feature from phrases in the Bosque corpus. We train an extractor to this sub-task and use it to automatically include the PCL feature in the derived clause corpus. We use POS and PCL tags as input features in the proposed clause identifier. This system achieves a $F_{\beta=1}$ of 73.90, when using the golden values of the PCL feature. When the automatic values are used, the system obtains $F_{\beta=1}$ = 69.31. These are promising results for a first machine learning approach to Portuguese clause identification. Moreover, these results are achieved using a very simple PCL feature, which is generated by a PCL extractor developed with very little modeling effort.",

keywords = "Informatics, Machine Learn Approach, Training Corpus, Shared Task, Human Language Technology, Corpus Format, Business informatics",

author = "Fernandes, {Eraldo R.} and {Dos Santos}, {C{\'i}cero N.} and Milidi{\'u}, {Ruy L.}",

year = "2010",

doi = "10.1007/978-3-642-12320-7_8",

language = "English",

isbn = "3-642-12319-8",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "55--64",

editor = "Pardo, {Thiago Alexandre Salgueiro} and Antonio Branco and Aldebaro Klautau and Renata Viera and {de Lima}, {Vera Lucia Strube}",

booktitle = "Computational Processing of the Portuguese Language",

address = "Germany",

note = "International Conference on Computational Processing of the Portuguese Language, PROPOR 2010 ; Conference date: 27-04-2010 Through 30-04-2010",

url = "https://www.inf.pucrs.br/~propor2010/",

}

RIS

TY - CHAP

T1 - A machine learning approach to Portuguese clause identification

AU - Fernandes, Eraldo R.

AU - Dos Santos, Cícero N.

AU - Milidiú, Ruy L.

N1 - Conference code: 9

PY - 2010

Y1 - 2010

N2 - In this work, we apply and evaluate a machine-learning-based system to Portuguese clause identification. To the best of our knowledge, this is the first machine-learning-based approach to this task. The proposed system is based on Entropy Guided Transformation Learning. In order to train and evaluate the proposed system, we derive a clause annotated corpus from the Bosque corpus of the Floresta Sintá(c)tica Project - an European and Brazilian Portuguese treebank. We include part-of-speech (POS) tags to the derived corpus by using an automatic state-of-the-art tagger. Additionally, we use a simple heuristic to derive a phrase-chunk-like (PCL) feature from phrases in the Bosque corpus. We train an extractor to this sub-task and use it to automatically include the PCL feature in the derived clause corpus. We use POS and PCL tags as input features in the proposed clause identifier. This system achieves a Fβ=1 of 73.90, when using the golden values of the PCL feature. When the automatic values are used, the system obtains Fβ=1 = 69.31. These are promising results for a first machine learning approach to Portuguese clause identification. Moreover, these results are achieved using a very simple PCL feature, which is generated by a PCL extractor developed with very little modeling effort.

AB - In this work, we apply and evaluate a machine-learning-based system to Portuguese clause identification. To the best of our knowledge, this is the first machine-learning-based approach to this task. The proposed system is based on Entropy Guided Transformation Learning. In order to train and evaluate the proposed system, we derive a clause annotated corpus from the Bosque corpus of the Floresta Sintá(c)tica Project - an European and Brazilian Portuguese treebank. We include part-of-speech (POS) tags to the derived corpus by using an automatic state-of-the-art tagger. Additionally, we use a simple heuristic to derive a phrase-chunk-like (PCL) feature from phrases in the Bosque corpus. We train an extractor to this sub-task and use it to automatically include the PCL feature in the derived clause corpus. We use POS and PCL tags as input features in the proposed clause identifier. This system achieves a Fβ=1 of 73.90, when using the golden values of the PCL feature. When the automatic values are used, the system obtains Fβ=1 = 69.31. These are promising results for a first machine learning approach to Portuguese clause identification. Moreover, these results are achieved using a very simple PCL feature, which is generated by a PCL extractor developed with very little modeling effort.

KW - Informatics

KW - Machine Learn Approach

KW - Training Corpus

KW - Shared Task

KW - Human Language Technology

KW - Corpus Format

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=78650284958&partnerID=8YFLogxK

UR - https://d-nb.info/1000569276

U2 - 10.1007/978-3-642-12320-7_8

DO - 10.1007/978-3-642-12320-7_8

M3 - Article in conference proceedings

AN - SCOPUS:78650284958

SN - 3-642-12319-8

SN - 978-3-642-12319-1

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 55

EP - 64

BT - Computational Processing of the Portuguese Language

A2 - Pardo, Thiago Alexandre Salgueiro

A2 - Branco, Antonio

A2 - Klautau, Aldebaro

A2 - Viera, Renata

A2 - de Lima, Vera Lucia Strube

PB - Springer Verlag

CY - Berlin, Heidelberg

T2 - International Conference on Computational Processing of the Portuguese Language

Y2 - 27 April 2010 through 30 April 2010

ER -

Further publications by these author(s)

Data practices in apps from Brazil: What do privacy policies inform us about?

Quadros dos Reis, V., Rabello, M. E. R., Lima, A. C., Jardim, G. P. S., Fernandes, E. R. & Brefeld, U., 10 Feb 2023, in: Journal on Interactive Systems. 14, 1, pp. 1-8, 8 pp.

Publication: Contributions to journals › Journal articles › Research › peer-reviewed

Entity Extraction from Portuguese Legal Documents Using Distant Supervision

Navarezi, L. M., Sakiyama, K., Rodrigues, L. S., Robaldo, C. M. O., Lobato, G. R., Vilela, P. A., Matsubara, E. T. & Fernandes, E. R., 2022, Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21-23, 2022, Proceedings. Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D., Magro, C. & Pinto, H. (eds.). Cham: Springer Nature Switzerland AG, pp. 166-176, 11 pp. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13208 LNAI).

Publication: Contributions to collected editions › Conference proceedings papers › Research › peer-reviewed

FaST: A linear time stack trace alignment heuristic for crash report deduplication

Rodrigues, I. M., Aloise, D. & Fernandes, E. R., 17 Oct 2022, The 2022 Mining Software Repositories Conference: MSR 2022, Proceedings; 18-20 May 2022, Virtual; 23-24 May 2022, Pittsburgh, Pennsylvania. New York: Institute of Electrical and Electronics Engineers Inc., pp. 549-560, 12 pp. (Proceedings - IEEE/ACM International Conference on Mining Software Repositories).

Publication: Contributions to collected editions › Conference proceedings papers › Research › peer-reviewed

Performance predictors for graphics processing units applied to dark-silicon-aware design space exploration

Sonohata, R., Arigoni, D. C. A., Fernandes, E. R., Ribeiro dos Santos, R. & Dessandre Duenha, L., 1 Aug 2023, in: Concurrency and Computation: Practice and Experience. 35, 17, 16 pp., e6877.

Publication: Contributions to journals › Journal articles › Research › peer-reviewed

TraceSim: An Alignment Method for Computing Stack Trace Similarity

Rodrigues, I. M., Khvorov, A., Aloise, D., Vasiliev, R., Koznov, D., Fernandes, E. R., Chernishev, G., Luciv, D. & Povarov, N., 1 Mar 2022, in: Empirical Software Engineering. 27, 2, 41 pp., 53.

Publication: Contributions to journals › Journal articles › Research › peer-reviewed

DOI

https://doi.org/10.1007/978-3-642-12320-7_8
Final published version