A Study on the Impact of Intradomain Finetuning of Deep Language Models for Legal Named Entity Recognition in Portuguese

Publication: Contributions to collected editions › Articles in conference proceedings › Research › peer-reviewed

Authors

Deep language models, such as ELMo, BERT and GPT, have achieved impressive results on several natural language tasks. These models are pretrained on large corpora of unlabeled general-domain text and later trained with supervision on downstream tasks. An optional step consists of finetuning the language model on a large intradomain corpus of unlabeled text before training it on the final task. This aspect is not well explored in the current literature. In this work, we investigate the impact of this step on named entity recognition (NER) for Portuguese legal documents. We explore different scenarios considering two deep language architectures (ELMo and BERT), four unlabeled corpora and three legal NER tasks for the Portuguese language. Experimental findings show a significant improvement in performance due to language model finetuning on intradomain text. We also evaluate the finetuned models on two general-domain NER tasks, in order to understand whether the aforementioned improvements were really due to domain similarity or simply due to more training data. The results indicate that finetuning on a legal-domain corpus hurts performance on the general-domain NER tasks. Additionally, our BERT model, finetuned on a legal corpus, significantly improves on the state-of-the-art performance on the LeNER-Br corpus, a Portuguese-language NER corpus for the legal domain.

Original language: English
Title: Intelligent Systems: 9th Brazilian Conference, BRACIS 2020, Rio Grande, Brazil, October 20–23, 2020, Proceedings, Part I
Editors: Ricardo Cerri, Ronaldo C. Prati
Number of pages: 15
Place of publication: Cham
Publisher: Springer Nature Switzerland AG
Publication date: 2020
Pages: 648-662
ISBN (print): 978-3-030-61376-1
ISBN (electronic): 978-3-030-61377-8
DOIs
Publication status: Published - 2020
Externally published: Yes
Event: Brazilian Conference on Intelligent Systems - BRACIS 2020 - Rio Grande, Brazil
Duration: 20.10.2020 to 23.10.2020
Conference number: 9
http://www2.sbc.org.br/bracis2020/#:~:text=The%209th%20Brazilian%20Conference%20on,%2C%2020%20to%2023%2C%202020.
