Ablation Study of a Multimodal GAT Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Document analysis and invoice recognition have been significantly advanced in recent years by grid-based, graph-based and transformer architectures. However, an approach’s results depend not only on the model architecture but also on the quality of the training and test data. In this paper, we perform an ablation study on an existing state-of-the-art pre-trained multimodal GAT network. We investigate two kinds of modifications to understand the sensitivity of the results: (1) exchanging the language module and (2) applying both the original and the modified network to a perfect synthetic dataset and an imperfect real-world dataset. The results of the study show the importance of language modules for semantic embeddings in multimodal invoice recognition and illustrate the impact of data annotation quality. We further contribute an adapted GAT model for German invoices.

Original language: English
Title of host publication: Document Analysis and Recognition – ICDAR 2024 Workshops : Athens, Greece, August 30–31, 2024 Proceedings, Part II
Editors: Harold Mouchère, Anna Zhu
Number of pages: 14
Volume: 2
Place of Publication: Cham
Publisher: Springer Nature
Publication date: 11.09.2024
Pages: 199-212
ISBN (print): 978-3-031-70641-7
ISBN (electronic): 978-3-031-70642-4
DOIs
Publication status: Published - 11.09.2024
Event: International Workshops co-located with the 18th International Conference on Document Analysis and Recognition - ICDAR 2024 - Athens, Greece
Duration: 30.08.2024 to 31.08.2024
https://icdar2024.net/

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Research areas

  • GAT, GraphDoc, Inv3D, Invoice recognition, Synthetic data
  • Informatics
