Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. / Krieger, Felix; Drews, Paul; Funk, Burkhardt et al.
Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. ed. / Frederik Ahlemann; Reinhard Schütte; Stefan Stieglitz. Cham: Springer Science and Business Media Deutschland, 2021. p. 5-20 (Lecture Notes in Information Systems and Organisation; Vol. 47).

Harvard

Krieger, F, Drews, P, Funk, B & Wobbe, T 2021, Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. in F Ahlemann, R Schütte & S Stieglitz (eds), Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Lecture Notes in Information Systems and Organisation, vol. 47, Springer Science and Business Media Deutschland, Cham, pp. 5-20. https://doi.org/10.1007/978-3-030-86797-3_1

APA

Krieger, F., Drews, P., Funk, B., & Wobbe, T. (2021). Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. In F. Ahlemann, R. Schütte, & S. Stieglitz (Eds.), Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues (pp. 5-20). (Lecture Notes in Information Systems and Organisation; Vol. 47). Springer Science and Business Media Deutschland. https://doi.org/10.1007/978-3-030-86797-3_1

Vancouver

Krieger F, Drews P, Funk B, Wobbe T. Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. In Ahlemann F, Schütte R, Stieglitz S, editors, Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Cham: Springer Science and Business Media Deutschland. 2021. p. 5-20. (Lecture Notes in Information Systems and Organisation). doi: 10.1007/978-3-030-86797-3_1

Bibtex

@inbook{98dc91fafcdd4e41b2b7054f2b42c72d,
title = "Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety",
abstract = "Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.",
keywords = "Audit digitization, Graph attention networks, Graph-based machine learning, Unstructured data, Informatics, Business informatics",
author = "Felix Krieger and Paul Drews and Burkhardt Funk and Till Wobbe",
note = "Publisher Copyright: {\textcopyright} 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.",
year = "2021",
doi = "10.1007/978-3-030-86797-3_1",
language = "English",
isbn = "978-3-030-86796-6",
series = "Lecture Notes in Information Systems and Organisation",
publisher = "Springer Science and Business Media Deutschland",
pages = "5--20",
editor = "Frederik Ahlemann and Reinhard Sch{\"u}tte and Stefan Stieglitz",
booktitle = "Innovation Through Information Systems - Volume II",
address = "Germany",

}

RIS

TY - CHAP

T1 - Information Extraction from Invoices

T2 - A Graph Neural Network Approach for Datasets with High Layout Variety

AU - Krieger, Felix

AU - Drews, Paul

AU - Funk, Burkhardt

AU - Wobbe, Till

N1 - Publisher Copyright: © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.

PY - 2021

Y1 - 2021

N2 - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

AB - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

KW - Audit digitization

KW - Graph attention networks

KW - Graph-based machine learning

KW - Unstructured data

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85118159934&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/02063821-c1f8-3ef1-a187-37439e188ef4/

U2 - 10.1007/978-3-030-86797-3_1

DO - 10.1007/978-3-030-86797-3_1

M3 - Article in conference proceedings

SN - 978-3-030-86796-6

T3 - Lecture Notes in Information Systems and Organisation

SP - 5

EP - 20

BT - Innovation Through Information Systems - Volume II

A2 - Ahlemann, Frederik

A2 - Schütte, Reinhard

A2 - Stieglitz, Stefan

PB - Springer Science and Business Media Deutschland

CY - Cham

ER -
