Entity Extraction from Portuguese Legal Documents Using Distant Supervision

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

  • Lucas M. Navarezi
  • Kenzo Sakiyama
  • Lucas S. Rodrigues
  • Caio M.O. Robaldo
  • Gustavo R. Lobato
  • Paulo A. Vilela
  • Edson T. Matsubara
  • Eraldo R. Fernandes

Most approaches to role-filler entity extraction (REE) rely on large labeled training corpora in which entity mentions are directly annotated in the input document. In this work, we leverage an existing knowledge base (KB) of entities to perform document-level REE from drug seizure petitions. We propose a system that learns to extract entities from petitions to fill 29 roles of a drug seizure event. Although we have access to a KB covering more than 170 thousand entities and six thousand petitions, such that each entity in the KB is linked to a specific petition, the mentions of an entity within a petition’s text are not annotated. The lack of these annotations brings challenges related to mismatches between entity values in the KB and entity mentions in the documents. Additionally, there are entities with the same type or the same value. Thus, we propose a distant annotation method to overcome these challenges and automatically label petition documents using the available KB. This annotation method includes a parameter that controls the balance between precision and recall. We also propose a strategy to effectively tune this parameter in order to optimize a given metric. We then train a BERT-based sequence labeling model that learns to identify entity mentions and label them. Our system achieves an F1 score of 78.59 with precision over 82%. We also report ablation studies regarding the distant annotation method.
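
The abstract does not specify how the distant annotation method works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that KB values are matched against token windows with a string-similarity score and that a hypothetical `threshold` plays the role of the precision/recall parameter (a higher value keeps only near-exact matches, a lower value labels more mentions). The function name `distant_annotate`, the role names, and the example sentence are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def distant_annotate(tokens, kb_entries, threshold=0.8):
    """Assign BIO labels to tokens by fuzzy-matching KB values (illustrative only).

    tokens:     list of document tokens
    kb_entries: list of (role, value) pairs linked to this document
    threshold:  hypothetical similarity cutoff; higher favors precision,
                lower favors recall
    """
    labels = ["O"] * len(tokens)
    for role, value in kb_entries:
        width = len(value.split())
        best_score, best_start = 0.0, None
        # Slide a window of the same token length as the KB value.
        for start in range(len(tokens) - width + 1):
            window = " ".join(tokens[start:start + width])
            score = SequenceMatcher(None, window.lower(), value.lower()).ratio()
            if score > best_score:
                best_score, best_start = score, start
        # Label the best window only if it is similar enough and still unlabeled.
        if best_start is not None and best_score >= threshold:
            span = range(best_start, best_start + width)
            if all(labels[i] == "O" for i in span):
                labels[best_start] = "B-" + role
                for i in span[1:]:
                    labels[i] = "I-" + role
    return labels


tokens = "O réu João da Silva foi preso com 2 kg de cocaína".split()
kb = [("DEFENDANT", "Joao da Silva"), ("DRUG", "cocaina")]

loose = distant_annotate(tokens, kb, threshold=0.8)
strict = distant_annotate(tokens, kb, threshold=1.0)
```

In this toy example the KB values lack the accents found in the text, so the looser threshold labels both mentions while the exact-match threshold labels none, which illustrates the kind of KB/document mismatch the abstract mentions. The resulting token labels could then serve as training data for a sequence labeling model.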

Original language: English
Title of host publication: Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21-23, 2022, Proceedings
Editors: Vládia Pinheiro, Pablo Gamallo, Raquel Amaro, Carolina Scarton, Fernando Batista, Diego Silva, Catarina Magro, Hugo Pinto
Number of pages: 11
Place of Publication: Cham
Publisher: Springer Nature Switzerland AG
Publication date: 2022
Pages: 166-176
ISBN (print): 978-3-030-98304-8
ISBN (electronic): 978-3-030-98305-5
DOIs
Publication status: Published - 2022
Externally published: Yes
Event: 15th International Conference on the Computational Processing of Portuguese - PROPOR 2022 - University of Fortaleza / hybrid, Fortaleza, Brazil
Duration: 21.03.2022 - 23.03.2022
https://www.aclweb.org/portal/content/propor-2022-15th-international-conference-computational-processing-portuguese
