Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety
Publication: Contributions to edited volumes › Articles in conference proceedings › Research › peer-reviewed
Standard
Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Eds. / Frederik Ahlemann; Reinhard Schütte; Stefan Stieglitz. Cham: Springer Science and Business Media Deutschland GmbH, 2021. pp. 5-20 (Lecture Notes in Information Systems and Organisation; Vol. 47).
RIS
TY - CHAP
T1 - Information Extraction from Invoices
T2 - A Graph Neural Network Approach for Datasets with High Layout Variety
AU - Krieger, Felix
AU - Drews, Paul
AU - Funk, Burkhardt
AU - Wobbe, Till
N1 - Publisher Copyright: © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.
AB - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.
KW - Audit digitization
KW - Graph attention networks
KW - Graph-based machine learning
KW - Unstructured data
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=85118159934&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/02063821-c1f8-3ef1-a187-37439e188ef4/
U2 - 10.1007/978-3-030-86797-3_1
DO - 10.1007/978-3-030-86797-3_1
M3 - Article in conference proceedings
SN - 978-3-030-86796-6
T3 - Lecture Notes in Information Systems and Organisation
SP - 5
EP - 20
BT - Innovation Through Information Systems - Volume II
A2 - Ahlemann, Frederik
A2 - Schütte, Reinhard
A2 - Stieglitz, Stefan
PB - Springer Science and Business Media Deutschland GmbH
CY - Cham
ER -
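The abstract describes representing the text layout of an invoice as a graph and applying graph convolutional networks to it. The following is a minimal, hypothetical sketch of that general idea. It is not the authors' model, whose keywords point to graph attention networks. In this sketch, OCR tokens with bounding boxes become nodes. Each token is linked to its nearest neighbour to the right on the same line and its nearest neighbour below it. This linking rule is an assumed heuristic. A single mean-aggregation graph convolution step then mixes neighbour features. `Token`, `build_layout_graph` and `gcn_layer` are illustrative names, and the toy coordinates are made up.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """An OCR token; (x0, y0) is the top-left and (x1, y1) the bottom-right corner."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def build_layout_graph(tokens):
    """Return undirected edges (i, j) with i < j.

    Assumed heuristic: link each token to the nearest token to its right on the
    same line (vertical overlap) and the nearest token below it (horizontal overlap).
    """
    edges = set()
    for i, a in enumerate(tokens):
        best_right = None  # (gap, index) of the closest token to the right
        best_below = None  # (gap, index) of the closest token below
        for j, b in enumerate(tokens):
            if i == j:
                continue
            same_line = b.y0 < a.y1 and b.y1 > a.y0
            if same_line and b.x0 >= a.x1:
                gap = b.x0 - a.x1
                if best_right is None or gap < best_right[0]:
                    best_right = (gap, j)
            same_column = b.x0 < a.x1 and b.x1 > a.x0
            if same_column and b.y0 >= a.y1:
                gap = b.y0 - a.y1
                if best_below is None or gap < best_below[0]:
                    best_below = (gap, j)
        for best in (best_right, best_below):
            if best is not None:
                edges.add(tuple(sorted((i, best[1]))))
    return edges


def gcn_layer(features, edges, weight):
    """One graph-convolution step: average each node with its neighbours,
    then apply a linear map (weight is in_dim x out_dim) followed by ReLU."""
    n = len(features)
    neighbours = [{i} for i in range(n)]  # self-loop for every node
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    in_dim, out_dim = len(weight), len(weight[0])
    out = []
    for i in range(n):
        agg = [sum(features[k][d] for k in neighbours[i]) / len(neighbours[i])
               for d in range(in_dim)]
        out.append([max(0.0, sum(agg[d] * weight[d][o] for d in range(in_dim)))
                    for o in range(out_dim)])
    return out


tokens = [
    Token("Invoice", 10, 10, 60, 20),
    Token("No.", 65, 10, 85, 20),
    Token("12345", 90, 10, 130, 20),
    Token("Total:", 10, 40, 50, 50),
    Token("99.50", 90, 40, 130, 50),
]
edges = build_layout_graph(tokens)
# One toy feature per token: 1.0 if the text looks numeric, else 0.0.
features = [[1.0 if t.text.replace(".", "").isdigit() else 0.0] for t in tokens]
hidden = gcn_layer(features, edges, weight=[[1.0]])
print(sorted(edges))  # [(0, 1), (0, 3), (1, 2), (2, 4), (3, 4)]
```

Averaging over a node and its neighbours corresponds to a row-normalised adjacency with self-loops. This is a simplification of the symmetric normalisation used in the standard graph convolutional network formulation. After one step, "No." and "Total:" pick up signal from the adjacent numeric tokens. A trained classifier could exploit that signal to label key items.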
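The abstract reports results as a macro F1 score. Macro averaging computes F1 = 2PR / (P + R) separately for each class (here, each key item) and then takes the unweighted mean. As a result, rarely occurring key items count as much as frequent ones. The sketch below assumes one predicted label per token. The label names are hypothetical, and this is not the paper's evaluation code.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over `labels`."""
    per_class = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append(f1)
    return sum(per_class) / len(per_class)


# Hypothetical token labels for three key items.
y_true = ["total", "total", "date", "invoice_number"]
y_pred = ["total", "date", "date", "invoice_number"]
score = macro_f1(y_true, y_pred, ["total", "date", "invoice_number"])
print(round(score, 4))  # 0.7778
```

Here "total" and "date" each reach F1 = 2/3 and "invoice_number" reaches 1.0, so the macro average is 7/9. The exact matching protocol behind the reported 0.8753 (token level vs. extracted string) is not stated in this record.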