Ablation Study of a Multimodal GAT Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Ablation Study of a Multimodal GAT Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition. / Thiée, Lukas Walter.
Document Analysis and Recognition – ICDAR 2024 Workshops, Proceedings. ed. / Harold Mouchère; Anna Zhu. Springer Science and Business Media Deutschland GmbH, 2024. p. 199-212 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14936).

Harvard

Thiée, LW 2024, Ablation Study of a Multimodal GAT Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition. in H Mouchère & A Zhu (eds), Document Analysis and Recognition – ICDAR 2024 Workshops, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14936, Springer Science and Business Media Deutschland GmbH, pp. 199-212, International Workshops co-located with the 18th International Conference on Document Analysis and Recognition - ICDAR 2024, Athens, Greece, 30.08.24. https://doi.org/10.1007/978-3-031-70642-4_13

APA

Thiée, L. W. (2024). Ablation Study of a Multimodal GAT Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition. In H. Mouchère, & A. Zhu (Eds.), Document Analysis and Recognition – ICDAR 2024 Workshops, Proceedings (pp. 199-212). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14936). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-70642-4_13

Vancouver

Thiée LW. Ablation Study of a Multimodal GAT Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition. In Mouchère H, Zhu A, editors, Document Analysis and Recognition – ICDAR 2024 Workshops, Proceedings. Springer Science and Business Media Deutschland GmbH. 2024. p. 199-212. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-70642-4_13

BibTeX

@inproceedings{5394ed10207f402b80611e6379fdb362,
title = "Ablation Study of a Multimodal {GAT} Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition",
abstract = "Document analysis and invoice recognition have been significantly advanced in recent years by grid-based, graph-based and transformer architectures. However, it is not only the model architecture that influences an approach{\textquoteright}s results, but also the quality of training and test data. In this paper, we perform an ablation study on an existing state-of-the-art pre-trained multimodal GAT network. Therein we investigate two kinds of modifications to understand the sensitivity of the results by (1) exchanging the language module and (2) applying both the original and modified network on a perfect synthetic and an imperfect real-world dataset. The results of the study show the importance of language modules for semantic embeddings in multimodal invoice recognition and illustrate the impact of data annotation quality. We further contribute an adapted GAT model for German invoices.",
keywords = "GAT, GraphDoc, Inv3D, Invoice recognition, Synthetic data, Informatics",
author = "Thi{\'e}e, {Lukas Walter}",
note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.; International Workshops co-located with the 18th International Conference on Document Analysis and Recognition - ICDAR 2024, IDCA 2024 ; Conference date: 30-08-2024 Through 31-08-2024",
year = "2024",
month = sep,
day = "11",
doi = "10.1007/978-3-031-70642-4_13",
language = "English",
isbn = "978-3-031-70641-7",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "199--212",
editor = "Harold Mouch{\`e}re and Anna Zhu",
booktitle = "Document Analysis and Recognition – ICDAR 2024 Workshops, Proceedings",
address = "Germany",
url = "https://icdar2024.net/",

}

RIS

TY  - CHAP
T1  - Ablation Study of a Multimodal GAT Network on Perfect Synthetic and Real-world Data to Investigate the Influence of Language Models in Invoice Recognition
AU  - Thiée, Lukas Walter
N1  - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY  - 2024/9/11
Y1  - 2024/9/11
N2  - Document analysis and invoice recognition have been significantly advanced in recent years by grid-based, graph-based and transformer architectures. However, it is not only the model architecture that influences an approach’s results, but also the quality of training and test data. In this paper, we perform an ablation study on an existing state-of-the-art pre-trained multimodal GAT network. Therein we investigate two kinds of modifications to understand the sensitivity of the results by (1) exchanging the language module and (2) applying both the original and modified network on a perfect synthetic and an imperfect real-world dataset. The results of the study show the importance of language modules for semantic embeddings in multimodal invoice recognition and illustrate the impact of data annotation quality. We further contribute an adapted GAT model for German invoices.
AB  - Document analysis and invoice recognition have been significantly advanced in recent years by grid-based, graph-based and transformer architectures. However, it is not only the model architecture that influences an approach’s results, but also the quality of training and test data. In this paper, we perform an ablation study on an existing state-of-the-art pre-trained multimodal GAT network. Therein we investigate two kinds of modifications to understand the sensitivity of the results by (1) exchanging the language module and (2) applying both the original and modified network on a perfect synthetic and an imperfect real-world dataset. The results of the study show the importance of language modules for semantic embeddings in multimodal invoice recognition and illustrate the impact of data annotation quality. We further contribute an adapted GAT model for German invoices.
KW  - GAT
KW  - GraphDoc
KW  - Inv3D
KW  - Invoice recognition
KW  - Synthetic data
KW  - Informatics
UR  - http://www.scopus.com/inward/record.url?scp=85204886386&partnerID=8YFLogxK
U2  - 10.1007/978-3-031-70642-4_13
DO  - 10.1007/978-3-031-70642-4_13
M3  - Article in conference proceedings
AN  - SCOPUS:85204886386
SN  - 978-3-031-70641-7
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 199
EP  - 212
BT  - Document Analysis and Recognition – ICDAR 2024 Workshops, Proceedings
A2  - Mouchère, Harold
A2  - Zhu, Anna
PB  - Springer Science and Business Media Deutschland GmbH
T2  - International Workshops co-located with the 18th International Conference on Document Analysis and Recognition - ICDAR 2024
Y2  - 30 August 2024 through 31 August 2024
ER  - 