Entity Extraction from Portuguese Legal Documents Using Distant Supervision

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

  • Lucas M. Navarezi
  • Kenzo Sakiyama
  • Lucas S. Rodrigues
  • Caio M.O. Robaldo
  • Gustavo R. Lobato
  • Paulo A. Vilela
  • Edson T. Matsubara
  • Eraldo R. Fernandes

Most approaches to role-filler entity extraction (REE) rely on large labeled training corpora in which entity mentions are directly annotated in the input document. In this work, we leverage an existing knowledge base (KB) of entities to perform document-level REE from drug seizure petitions. We propose a system that learns to extract entities from petitions to fill 29 roles of a drug seizure event. Although we have access to a KB covering more than 170 thousand entities and six thousand petitions, such that each entity in the KB is linked to a specific petition, the mentions of an entity within a petition’s text are not annotated. The lack of these annotations brings challenges related to mismatches between entity values in the KB and entity mentions in the documents. Additionally, there are entities with the same type or the same value. Thus, we propose a distant annotation method to overcome these challenges and automatically label petition documents using the available KB. This annotation method includes a parameter that controls the balance between precision and recall. We also propose a strategy to effectively tune this parameter in order to optimize a given metric. We then train a BERT-based sequence labeling model that learns to identify entity mentions and label them. Our system achieves an F1 score of 78.59 with precision over 82%. We also report ablation studies regarding the distant annotation method.
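
The general idea of distant annotation with a precision/recall parameter can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say how KB values are matched to text, so the character-level similarity from Python's `difflib`, the function name `distant_label`, the `threshold` parameter, the role names, and the example sentence are all assumptions made for illustration. A higher threshold accepts fewer, more reliable matches (favoring precision); a lower one accepts more (favoring recall).

```python
from difflib import SequenceMatcher


def distant_label(tokens, kb_entries, threshold=0.8):
    """Assign BIO tags to tokens by fuzzy-matching KB values against token windows.

    tokens:     list of document tokens
    kb_entries: list of (role, value) pairs linked to this document (hypothetical format)
    threshold:  minimum similarity in [0, 1] for a match to be accepted (assumed parameter)
    """
    tags = ["O"] * len(tokens)
    for role, value in kb_entries:
        n = len(value.split())
        best_score, best_start = 0.0, None
        # Slide a window with as many tokens as the KB value and keep the best match.
        for start in range(len(tokens) - n + 1):
            window = " ".join(tokens[start:start + n])
            score = SequenceMatcher(None, window.lower(), value.lower()).ratio()
            if score > best_score:
                best_score, best_start = score, start
        # Accept the best window only if it is similar enough and not already labeled.
        if best_start is not None and best_score >= threshold:
            span = range(best_start, best_start + n)
            if all(tags[i] == "O" for i in span):
                tags[best_start] = "B-" + role
                for i in span[1:]:
                    tags[i] = "I-" + role
    return tags


# Illustrative input: the KB values lack the accents used in the text.
tokens = "O réu João da Silva foi preso com 2 kg de cocaína".split()
kb = [("DEFENDANT", "Joao da Silva"), ("DRUG", "cocaina")]
print(distant_label(tokens, kb, threshold=0.8))
print(distant_label(tokens, kb, threshold=0.95))
```

In this sketch, tuning the parameter would amount to running the labeler at several threshold values and keeping the one that scores best on a chosen metric over held-out data; the abstract does not describe the actual tuning strategy used in the paper.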

Original language: English
Title of host publication: Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21-23, 2022, Proceedings
Editors: Vládia Pinheiro, Pablo Gamallo, Raquel Amaro, Carolina Scarton, Fernando Batista, Diego Silva, Catarina Magro, Hugo Pinto
Number of pages: 11
Place of Publication: Cham
Publisher: Springer Nature Switzerland AG
Publication date: 2022
Pages: 166-176
ISBN (print): 978-3-030-98304-8
ISBN (electronic): 978-3-030-98305-5
Publication status: Published - 2022
Externally published: Yes
Event: 15th International Conference on the Computational Processing of Portuguese - PROPOR 2022 - University of Fortaleza / hybrid, Fortaleza, Brazil
Duration: 21.03.2022 - 23.03.2022
https://www.aclweb.org/portal/content/propor-2022-15th-international-conference-computational-processing-portuguese
