Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.
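
For readers unfamiliar with the metric, the macro F1 score reported above is the unweighted mean of the per-class F1 scores, so rare key items count as much as frequent ones. The following is a minimal sketch of that computation only. The function name `macro_f1`, the class names, and the example labels are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described in this record.

```python
from collections import Counter


def macro_f1(true_labels, predicted_labels):
    """Return the unweighted mean of per-class F1 scores."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    tp, fp, fn = Counter(), Counter(), Counter()

    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrongly assigned
            fn[truth] += 1   # true class was missed

    scores = []
    for cls in classes:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[cls] + fp[cls] + fn[cls]
        scores.append(2 * tp[cls] / denom if denom else 0.0)

    return sum(scores) / len(scores)


# Hypothetical labels for six text segments of an invoice (illustration only).
true_labels = ["total", "date", "other", "other", "total", "date"]
predicted_labels = ["total", "other", "other", "other", "total", "date"]

score = macro_f1(true_labels, predicted_labels)
print(f"macro F1 = {score:.4f}")
```

In this toy example one "date" segment is misclassified as "other", which lowers the F1 of both classes while "total" stays perfect. The macro average weights all three classes equally regardless of how many segments each contains.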

Original language: English
Title of host publication: Innovation Through Information Systems - Volume II: A Collection of Latest Research on Technology Issues
Editors: Frederik Ahlemann, Reinhard Schütte, Stefan Stieglitz
Number of pages: 16
Place of publication: Cham
Publisher: Springer Science and Business Media Deutschland
Publication date: 2021
Pages: 5-20
ISBN (print): 978-3-030-86796-6
ISBN (electronic): 978-3-030-86797-3
Publication status: Published - 2021

Bibliographical note

Publisher Copyright:
© 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
