Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety

Publication: Contributions to collected editions › Articles in conference proceedings › Research › peer-reviewed

Standard

Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. / Krieger, Felix; Drews, Paul; Funk, Burkhardt et al.
Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Ed. / Frederik Ahlemann; Reinhard Schütte; Stefan Stieglitz. Cham: Springer Science and Business Media Deutschland, 2021. p. 5-20 (Lecture Notes in Information Systems and Organisation; Vol. 47).

Harvard

Krieger, F, Drews, P, Funk, B & Wobbe, T 2021, Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. in F Ahlemann, R Schütte & S Stieglitz (eds), Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Lecture Notes in Information Systems and Organisation, vol. 47, Springer Science and Business Media Deutschland, Cham, pp. 5-20. https://doi.org/10.1007/978-3-030-86797-3_1

APA

Krieger, F., Drews, P., Funk, B., & Wobbe, T. (2021). Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. In F. Ahlemann, R. Schütte, & S. Stieglitz (Eds.), Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues (pp. 5-20). (Lecture Notes in Information Systems and Organisation; Vol. 47). Springer Science and Business Media Deutschland. https://doi.org/10.1007/978-3-030-86797-3_1

Vancouver

Krieger F, Drews P, Funk B, Wobbe T. Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. In Ahlemann F, Schütte R, Stieglitz S, editors, Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Cham: Springer Science and Business Media Deutschland. 2021. p. 5-20. (Lecture Notes in Information Systems and Organisation). doi: 10.1007/978-3-030-86797-3_1

Bibtex

@inbook{98dc91fafcdd4e41b2b7054f2b42c72d,
title = "Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety",
abstract = "Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.",
keywords = "Audit digitization, Graph attention networks, Graph-based machine learning, Unstructured data, Informatics, Business informatics",
author = "Felix Krieger and Paul Drews and Burkhardt Funk and Till Wobbe",
note = "Publisher Copyright: {\textcopyright} 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.",
year = "2021",
doi = "10.1007/978-3-030-86797-3_1",
language = "English",
isbn = "978-3-030-86796-6",
series = "Lecture Notes in Information Systems and Organisation",
publisher = "Springer Science and Business Media Deutschland",
pages = "5--20",
editor = "Frederik Ahlemann and Reinhard Sch{\"u}tte and Stefan Stieglitz",
booktitle = "Innovation Through Information Systems - Volume II",
address = "Germany",

}

RIS

TY - CHAP

T1 - Information Extraction from Invoices

T2 - A Graph Neural Network Approach for Datasets with High Layout Variety

AU - Krieger, Felix

AU - Drews, Paul

AU - Funk, Burkhardt

AU - Wobbe, Till

N1 - Publisher Copyright: © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.

PY - 2021

Y1 - 2021

N2 - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

AB - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

KW - Audit digitization

KW - Graph attention networks

KW - Graph-based machine learning

KW - Unstructured data

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85118159934&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/02063821-c1f8-3ef1-a187-37439e188ef4/

U2 - 10.1007/978-3-030-86797-3_1

DO - 10.1007/978-3-030-86797-3_1

M3 - Article in conference proceedings

SN - 978-3-030-86796-6

T3 - Lecture Notes in Information Systems and Organisation

SP - 5

EP - 20

BT - Innovation Through Information Systems - Volume II

A2 - Ahlemann, Frederik

A2 - Schütte, Reinhard

A2 - Stieglitz, Stefan

PB - Springer Science and Business Media Deutschland

CY - Cham

ER -
