Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. / Krieger, Felix; Drews, Paul; Funk, Burkhardt et al.
Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. ed. / Frederik Ahlemann; Reinhard Schütte; Stefan Stieglitz. Cham: Springer Science and Business Media Deutschland, 2021. p. 5-20 (Lecture Notes in Information Systems and Organisation; Vol. 47).

Harvard

Krieger, F, Drews, P, Funk, B & Wobbe, T 2021, Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. in F Ahlemann, R Schütte & S Stieglitz (eds), Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Lecture Notes in Information Systems and Organisation, vol. 47, Springer Science and Business Media Deutschland, Cham, pp. 5-20. https://doi.org/10.1007/978-3-030-86797-3_1

APA

Krieger, F., Drews, P., Funk, B., & Wobbe, T. (2021). Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. In F. Ahlemann, R. Schütte, & S. Stieglitz (Eds.), Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues (pp. 5-20). (Lecture Notes in Information Systems and Organisation; Vol. 47). Springer Science and Business Media Deutschland. https://doi.org/10.1007/978-3-030-86797-3_1

Vancouver

Krieger F, Drews P, Funk B, Wobbe T. Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. In Ahlemann F, Schütte R, Stieglitz S, editors, Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Cham: Springer Science and Business Media Deutschland. 2021. p. 5-20. (Lecture Notes in Information Systems and Organisation). doi: 10.1007/978-3-030-86797-3_1

Bibtex

@inbook{98dc91fafcdd4e41b2b7054f2b42c72d,
title = "Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety",
abstract = "Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.",
keywords = "Audit digitization, Graph attention networks, Graph-based machine learning, Unstructured data, Informatics, Business informatics",
author = "Felix Krieger and Paul Drews and Burkhardt Funk and Till Wobbe",
note = "Publisher Copyright: {\textcopyright} 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.",
year = "2021",
doi = "10.1007/978-3-030-86797-3_1",
language = "English",
isbn = "978-3-030-86796-6",
series = "Lecture Notes in Information Systems and Organisation",
publisher = "Springer Science and Business Media Deutschland",
pages = "5--20",
editor = "Frederik Ahlemann and Reinhard Sch{\"u}tte and Stefan Stieglitz",
booktitle = "Innovation Through Information Systems - Volume II",
address = "Germany",

}

RIS

TY - CHAP

T1 - Information Extraction from Invoices

T2 - A Graph Neural Network Approach for Datasets with High Layout Variety

AU - Krieger, Felix

AU - Drews, Paul

AU - Funk, Burkhardt

AU - Wobbe, Till

N1 - Publisher Copyright: © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.

PY - 2021

Y1 - 2021

N2 - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

AB - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

KW - Audit digitization

KW - Graph attention networks

KW - Graph-based machine learning

KW - Unstructured data

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85118159934&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/02063821-c1f8-3ef1-a187-37439e188ef4/

U2 - 10.1007/978-3-030-86797-3_1

DO - 10.1007/978-3-030-86797-3_1

M3 - Article in conference proceedings

SN - 978-3-030-86796-6

T3 - Lecture Notes in Information Systems and Organisation

SP - 5

EP - 20

BT - Innovation Through Information Systems - Volume II

A2 - Ahlemann, Frederik

A2 - Schütte, Reinhard

A2 - Stieglitz, Stefan

PB - Springer Science and Business Media Deutschland

CY - Cham

ER -
