Ablation Study of a Multimodal Gat Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
Document Analysis and Recognition – ICDAR 2024 Workshops: Athens, Greece, August 30–31, 2024 Proceedings, Part II. ed. / Harold Mouchère; Anna Zhu. Vol. 2. Cham: Springer Nature AG, 2024. p. 199-212 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14936), (Document analysis and recognition - ICDAR 2024 workshops ; Vol. 2).
RIS
TY - CHAP
T1 - Ablation Study of a Multimodal Gat Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition
AU - Thiée, Lukas Walter
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024/9/11
Y1 - 2024/9/11
N2 - Document analysis and invoice recognition have been significantly advanced in recent years by grid-based, graph-based and transformer architectures. However, it is not only the model architecture that influences an approach’s results, but also the quality of training and test data. In this paper, we perform an ablation study on an existing state-of-the-art pre-trained multimodal GAT network. Therein we investigate two kinds of modifications to understand the sensitivity of the results by (1) exchanging the language module and (2) applying both the original and modified network on a perfect synthetic and an imperfect real-world dataset. The results of the study show the importance of language modules for semantic embeddings in multimodal invoice recognition and illustrate the impact of data annotation quality. We further contribute an adapted GAT model for German invoices.
KW - GAT
KW - GraphDoc
KW - Inv3D
KW - Invoice recognition
KW - Synthetic data
KW - Informatics
UR - http://www.scopus.com/inward/record.url?scp=85204886386&partnerID=8YFLogxK
UR - https://d-nb.info/1341727882
U2 - 10.1007/978-3-031-70642-4_13
DO - 10.1007/978-3-031-70642-4_13
M3 - Article in conference proceedings
AN - SCOPUS:85204886386
SN - 978-3-031-70641-7
VL - 2
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 199
EP - 212
BT - Document Analysis and Recognition – ICDAR 2024 Workshops
A2 - Mouchère, Harold
A2 - Zhu, Anna
PB - Springer Nature AG
CY - Cham
T2 - International Workshops co-located with the 18th International Conference on Document Analysis and Recognition - ICDAR 2024
Y2 - 30 August 2024 through 31 August 2024
ER -