Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Research output: Journal contributions › Journal articles › Research › peer-review

Authors

Automating the processing of incoming invoices promises vast efficiency improvements in accounting. Until fully electronic invoice exchange formats are universally adopted, machine learning can help bridge the adoption gap in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning is especially helpful for invoices from suppliers who send invoices only infrequently, because the models can capture the semantic and visual cues of invoices and generalize them to previously unseen invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines how training data drawn from such populations affects the predictive quality of different machine-learning approaches to extracting information from invoices. Comparing the approaches, we find that they are affected to varying degrees by skewed layout populations: the accuracy gap between in-sample and out-of-sample layouts is much larger in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts when splitting the data and evaluating the models.
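The layout-aware splitting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline; it only assumes that each invoice record carries a hypothetical `layout_id` field identifying its layout. Whole layouts are held out for the out-of-sample test set, while the remaining layouts contribute to both the training set and an in-sample test set, so that the accuracy gap between the two test sets can be measured.

```python
import random
from collections import defaultdict


def split_by_layout(invoices, unseen_layout_fraction=0.2,
                    in_sample_test_fraction=0.2, seed=0):
    """Split invoices into train, in-sample test, and out-of-sample test sets.

    `invoices` is a list of dicts; the `layout_id` key is a hypothetical
    field name used for illustration only.
    """
    rng = random.Random(seed)

    # Group invoices by layout so that a layout can be held out as a whole.
    by_layout = defaultdict(list)
    for invoice in invoices:
        by_layout[invoice["layout_id"]].append(invoice)

    # Choose the layouts that the model never sees during training.
    layouts = sorted(by_layout)
    rng.shuffle(layouts)
    n_unseen = max(1, round(len(layouts) * unseen_layout_fraction))
    unseen = set(layouts[:n_unseen])

    train, test_in, test_out = [], [], []
    for layout in layouts:
        docs = list(by_layout[layout])
        if layout in unseen:
            test_out.extend(docs)  # out-of-sample: layout absent from training
            continue
        rng.shuffle(docs)
        n_test = int(len(docs) * in_sample_test_fraction)
        test_in.extend(docs[:n_test])  # in-sample: layout also seen in training
        train.extend(docs[n_test:])
    return train, test_in, test_out


def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

Given predictions from any model, the gap discussed in the abstract would then be the accuracy on the in-sample test set minus the accuracy on the out-of-sample test set. A plain random split, by contrast, would let frequent layouts appear on both sides and hide that gap.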
Original language: English
Article number: 200285
Journal: Intelligent Systems with Applications
Volume: 20
Number of pages: 14
ISSN: 2667-3053
Publication status: Published - 01.11.2023

Bibliographical note

Funding Information:
We acknowledge support by the German Research Foundation (DFG).

Publisher Copyright:
© 2023 The Authors

Research areas

  • Business informatics - Layout-rich documents, Document analysis, Natural language processing
