Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety

Publication: Contributions to collected editions; Articles in conference proceedings; Research; peer-reviewed

Standard

Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. / Krieger, Felix; Drews, Paul; Funk, Burkhardt et al.
Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Ed. / Frederik Ahlemann; Reinhard Schütte; Stefan Stieglitz. Cham: Springer Science and Business Media Deutschland, 2021. pp. 5-20 (Lecture Notes in Information Systems and Organisation; Vol. 47).

Harvard

Krieger, F, Drews, P, Funk, B & Wobbe, T 2021, Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. in F Ahlemann, R Schütte & S Stieglitz (eds), Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Lecture Notes in Information Systems and Organisation, vol. 47, Springer Science and Business Media Deutschland, Cham, pp. 5-20. https://doi.org/10.1007/978-3-030-86797-3_1

APA

Krieger, F., Drews, P., Funk, B., & Wobbe, T. (2021). Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. In F. Ahlemann, R. Schütte, & S. Stieglitz (Eds.), Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues (pp. 5-20). (Lecture Notes in Information Systems and Organisation; Vol. 47). Springer Science and Business Media Deutschland. https://doi.org/10.1007/978-3-030-86797-3_1

Vancouver

Krieger F, Drews P, Funk B, Wobbe T. Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety. In Ahlemann F, Schütte R, Stieglitz S, editors, Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues. Cham: Springer Science and Business Media Deutschland. 2021. p. 5-20. (Lecture Notes in Information Systems and Organisation). doi: 10.1007/978-3-030-86797-3_1

Bibtex

@inbook{98dc91fafcdd4e41b2b7054f2b42c72d,
title = "Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety",
abstract = "Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.",
keywords = "Audit digitization, Graph attention networks, Graph-based machine learning, Unstructured data, Informatics, Business informatics",
author = "Felix Krieger and Paul Drews and Burkhardt Funk and Till Wobbe",
note = "Publisher Copyright: {\textcopyright} 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.",
year = "2021",
doi = "10.1007/978-3-030-86797-3_1",
language = "English",
isbn = "978-3-030-86796-6",
series = "Lecture Notes in Information Systems and Organisation",
publisher = "Springer Science and Business Media Deutschland",
pages = "5--20",
editor = "Frederik Ahlemann and Reinhard Sch{\"u}tte and Stefan Stieglitz",
booktitle = "Innovation Through Information Systems - Volume II",
address = "Germany",

}

RIS

TY - CHAP

T1 - Information Extraction from Invoices

T2 - A Graph Neural Network Approach for Datasets with High Layout Variety

AU - Krieger, Felix

AU - Drews, Paul

AU - Funk, Burkhardt

AU - Wobbe, Till

N1 - Publisher Copyright: © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.

PY - 2021

Y1 - 2021

N2 - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

AB - Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

KW - Audit digitization

KW - Graph attention networks

KW - Graph-based machine learning

KW - Unstructured data

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85118159934&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/02063821-c1f8-3ef1-a187-37439e188ef4/

U2 - 10.1007/978-3-030-86797-3_1

DO - 10.1007/978-3-030-86797-3_1

M3 - Article in conference proceedings

SN - 978-3-030-86796-6

T3 - Lecture Notes in Information Systems and Organisation

SP - 5

EP - 20

BT - Innovation Through Information Systems - Volume II

A2 - Ahlemann, Frederik

A2 - Schütte, Reinhard

A2 - Stieglitz, Stefan

PB - Springer Science and Business Media Deutschland

CY - Cham

ER -
