Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.

Original language: English
Title of host publication: Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues
Editors: Frederik Ahlemann, Reinhard Schütte, Stefan Stieglitz
Number of pages: 16
Place of Publication: Cham
Publisher: Springer Science and Business Media Deutschland
Publication date: 2021
Pages: 5-20
ISBN (print): 978-3-030-86796-6
ISBN (electronic): 978-3-030-86797-3
Publication status: Published - 2021

Bibliographical note

Publisher Copyright:
© 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
