Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety

Publication: Contributions to edited volumes › Articles in conference proceedings › Research › peer-reviewed

Standard

Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. / Krieger, Felix; Drews, Paul; Funk, Burkhardt et al.
Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. eds. / Frederik Ahlemann; Reinhard Schütte; Stefan Stieglitz. Cham: Springer Science and Business Media Deutschland GmbH, 2021. pp. 5-20 (Lecture Notes in Information Systems and Organisation; Vol. 47).


Harvard

Krieger, F, Drews, P, Funk, B & Wobbe, T 2021, Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. in F Ahlemann, R Schütte & S Stieglitz (eds), Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Lecture Notes in Information Systems and Organisation, vol. 47, Springer Science and Business Media Deutschland GmbH, Cham, pp. 5-20. https://doi.org/10.1007/978-3-030-86797-3_1

APA

Krieger, F., Drews, P., Funk, B., & Wobbe, T. (2021). Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. In F. Ahlemann, R. Schütte, & S. Stieglitz (Eds.), Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues (pp. 5-20). (Lecture Notes in Information Systems and Organisation; Vol. 47). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-86797-3_1

Vancouver

Krieger F, Drews P, Funk B, Wobbe T. Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. In Ahlemann F, Schütte R, Stieglitz S, editors, Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Cham: Springer Science and Business Media Deutschland GmbH. 2021. p. 5-20. (Lecture Notes in Information Systems and Organisation). doi: 10.1007/978-3-030-86797-3_1

BibTeX

@inbook{98dc91fafcdd4e41b2b7054f2b42c72d,
title = "Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety",
abstract = "Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.",
keywords = "Audit digitization, Graph attention networks, Graph-based machine learning, Unstructured data, Informatics, Business informatics",
author = "Felix Krieger and Paul Drews and Burkhardt Funk and Till Wobbe",
note = "Publisher Copyright: {\textcopyright} 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.",
year = "2021",
doi = "10.1007/978-3-030-86797-3_1",
language = "English",
isbn = "978-3-030-86796-6",
series = "Lecture Notes in Information Systems and Organisation",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "5--20",
editor = "Frederik Ahlemann and Reinhard Sch{\"u}tte and Stefan Stieglitz",
booktitle = "Innovation Through Information Systems - Volume II",
address = "Germany",

}

RIS

TY - CHAP

T1 - Information Extraction from Invoices

T2 - A Graph Neural Network Approach for Datasets with High Layout Variety

AU - Krieger, Felix

AU - Drews, Paul

AU - Funk, Burkhardt

AU - Wobbe, Till

N1 - Publisher Copyright: © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.

PY - 2021

Y1 - 2021

N2 - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

AB - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

KW - Audit digitization

KW - Graph attention networks

KW - Graph-based machine learning

KW - Unstructured data

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85118159934&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/02063821-c1f8-3ef1-a187-37439e188ef4/

U2 - 10.1007/978-3-030-86797-3_1

DO - 10.1007/978-3-030-86797-3_1

M3 - Article in conference proceedings

SN - 978-3-030-86796-6

T3 - Lecture Notes in Information Systems and Organisation

SP - 5

EP - 20

BT - Innovation Through Information Systems - Volume II

A2 - Ahlemann, Frederik

A2 - Schütte, Reinhard

A2 - Stieglitz, Stefan

PB - Springer Science and Business Media Deutschland GmbH

CY - Cham

ER -
