Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Publication: Contributions to journals › Journal articles › Research › Peer-reviewed

Authors

Automating the processing of incoming invoices promises vast efficiency gains in accounting. Until fully electronic invoice exchange formats are universally adopted, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning is especially helpful for processing invoices from suppliers who send invoices only infrequently, because the models can capture the semantic and visual cues of invoices and generalize them to previously unseen invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines how training data drawn from such populations affects the predictive quality of different machine-learning approaches for extracting information from invoices. Comparing these approaches, we find that they are affected to varying degrees by skewed layout populations: the accuracy gap between in-sample and out-of-sample layouts is much larger for the Chargrid and random forest models than for the LayoutLM transformer model, which also achieves the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts when splitting the data and evaluating the models.
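The abstract's methodological point, that a skewed layout population must be accounted for when splitting data and evaluating models, can be illustrated with a minimal Python sketch. It assumes that every invoice carries a layout identifier (for example, one ID per supplier template). The function names `layout_aware_split`, `accuracy`, and `accuracy_gap`, the split fractions, and the single-label accuracy metric are hypothetical simplifications; this is not the authors' pipeline. A purely random per-invoice split would tend to place invoices of the frequent layouts in both training and test data, so test accuracy would mostly reflect already-seen layouts. Holding out whole layouts instead yields an out-of-sample test set, and the difference between in-sample and out-of-sample accuracy corresponds to the gap described above.

```python
import random
from collections import defaultdict

# Hypothetical sketch, NOT the paper's pipeline: layout IDs, fractions and the
# single-label accuracy metric are illustrative assumptions.


def layout_aware_split(layouts, holdout_fraction=0.3, test_fraction=0.2, seed=0):
    """Split invoice indices by layout ID.

    A fraction of layouts is held out entirely (out-of-sample test set); the
    invoices of every remaining layout are split between training and an
    in-sample test set. Returns (train, test_in, test_out) as sorted index lists.
    """
    rng = random.Random(seed)
    unique = sorted(set(layouts))
    rng.shuffle(unique)
    # Hold out at least one layout, but keep at least one layout for training.
    n_holdout = max(1, round(holdout_fraction * len(unique)))
    n_holdout = min(n_holdout, max(len(unique) - 1, 0))
    held_out = set(unique[:n_holdout])

    by_layout = defaultdict(list)
    for i, layout in enumerate(layouts):
        by_layout[layout].append(i)

    train, test_in, test_out = [], [], []
    for layout in sorted(by_layout):
        idx = list(by_layout[layout])
        if layout in held_out:
            test_out.extend(idx)  # this layout is never seen during training
        else:
            rng.shuffle(idx)
            n_test = int(test_fraction * len(idx))  # rare layouts may contribute 0
            test_in.extend(idx[:n_test])
            train.extend(idx[n_test:])
    return sorted(train), sorted(test_in), sorted(test_out)


def accuracy(pred, gold, idx):
    """Share of indices in idx where the prediction equals the gold label (NaN if empty)."""
    if not idx:
        return float("nan")
    return sum(pred[i] == gold[i] for i in idx) / len(idx)


def accuracy_gap(pred, gold, test_in, test_out):
    """In-sample minus out-of-sample accuracy; a larger value means weaker generalization."""
    return accuracy(pred, gold, test_in) - accuracy(pred, gold, test_out)


if __name__ == "__main__":
    # Hypothetical skewed population: a few frequent layouts, a long tail of rare ones.
    layouts = ["A"] * 50 + ["B"] * 30 + ["C"] * 10 + ["D"] * 5 + ["E"] * 3 + ["F"] * 2
    train, test_in, test_out = layout_aware_split(layouts, holdout_fraction=0.34, seed=1)
    print(len(train), len(test_in), len(test_out))
```

For the held-out-layout part alone, scikit-learn's `GroupShuffleSplit` (with the layout IDs passed as `groups`) gives the same guarantee that no layout appears on both sides; the in-sample test set would still need a second, per-invoice split of the training layouts.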
Original language: English
Article number: 200285
Journal: Intelligent Systems with Applications
Volume: 20
Number of pages: 14
ISSN: 2667-3053
Publication status: Published, 1 November 2023

Bibliographic note

Publisher Copyright:
© 2023 The Authors
