Ablation Study of a Multimodal GAT Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition

Publication: Contributions to collected works, Articles in conference proceedings, Research, Peer-reviewed

Authors

Document analysis and invoice recognition have been significantly advanced in recent years by grid-based, graph-based and transformer architectures. However, an approach’s results depend not only on the model architecture but also on the quality of the training and test data. In this paper, we perform an ablation study on an existing state-of-the-art pre-trained multimodal GAT network. We investigate two kinds of modification to understand how sensitive the results are: (1) exchanging the language module and (2) applying both the original and the modified network to a perfect synthetic dataset and an imperfect real-world dataset. The results of the study show the importance of language modules for semantic embeddings in multimodal invoice recognition and illustrate the impact of data annotation quality. We further contribute an adapted GAT model for German invoices.

Original language: English
Title: Document Analysis and Recognition – ICDAR 2024 Workshops, Proceedings
Editors: Harold Mouchère, Anna Zhu
Number of pages: 14
Publisher: Springer Science and Business Media Deutschland GmbH
Publication date: 11.09.2024
Pages: 199-212
ISBN (Print): 978-3-031-70641-7
ISBN (electronic): 978-3-031-70642-4
DOIs
Publication status: Published - 11.09.2024
Event: International Workshops co-located with the 18th International Conference on Document Analysis and Recognition - ICDAR 2024 - Athens, Greece
Duration: 30.08.2024 to 31.08.2024
https://icdar2024.net/

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
