Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Publication: Contributions to journals / Journal articles / Research / Peer-reviewed

Authors

Automating the processing of incoming invoices promises vast efficiency improvements in accounting. Until fully electronic invoice exchange formats are universally adopted, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning is especially helpful for processing invoices from suppliers who send invoices only infrequently, because the models can capture the semantic and visual cues of invoices and generalize them to previously unseen invoice layouts. Since the invoice population in many companies is skewed toward a few frequent suppliers and their layouts, this research examines how training data drawn from such populations affects the predictive quality of different machine-learning approaches for extracting information from invoices. Comparing these approaches, we find that skewed layout populations affect them to varying degrees: the accuracy gap between in-sample and out-of-sample layouts is much larger for the Chargrid and random forest models than for the LayoutLM transformer model, which also achieves the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts when splitting the data and evaluating the models.
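The abstract's key methodological point is that data splitting and evaluation must respect the layout distribution. The following is a minimal sketch of one way to do this. It is not the authors' pipeline. The `Invoice` class, the `layout_id` field, the function names, the split fractions, and the use of exact-match field accuracy are all hypothetical illustrations. Whole layouts are held out to form an out-of-sample test set. A per-layout random split of the remaining invoices forms an in-sample test set whose layouts also appear in training. The accuracy gap is the difference between the two test scores.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    doc_id: str
    layout_id: str  # hypothetical: identifies the invoice template/layout


def layout_aware_split(invoices, holdout_fraction=0.2, test_fraction=0.2, seed=0):
    """Return (train, in_sample_test, out_of_sample_test).

    Out-of-sample test: every invoice of randomly chosen layouts that never
    appear in training. In-sample test: a per-layout random share of the
    remaining invoices, so each of its layouts is also present in training.
    """
    rng = random.Random(seed)
    by_layout = defaultdict(list)
    for inv in invoices:
        by_layout[inv.layout_id].append(inv)

    layouts = sorted(by_layout)  # sort first so the shuffle is reproducible
    rng.shuffle(layouts)
    n_holdout = max(1, round(len(layouts) * holdout_fraction))

    out_of_sample = [inv for lay in layouts[:n_holdout] for inv in by_layout[lay]]
    train, in_sample = [], []
    for lay in layouts[n_holdout:]:
        docs = list(by_layout[lay])
        rng.shuffle(docs)
        # int() rounds down, so rare layouts (e.g. 1-4 invoices) stay fully in
        # training and at least one invoice of each layout is always trained on.
        n_test = int(len(docs) * test_fraction)
        in_sample.extend(docs[:n_test])
        train.extend(docs[n_test:])
    return train, in_sample, out_of_sample


def accuracy(pairs):
    """Exact-match accuracy over (predicted, gold) field values."""
    return sum(p == g for p, g in pairs) / len(pairs) if pairs else float("nan")


def accuracy_gap(in_sample_pairs, out_of_sample_pairs):
    """Positive values mean the model does worse on unseen layouts."""
    return accuracy(in_sample_pairs) - accuracy(out_of_sample_pairs)


if __name__ == "__main__":
    # Skewed toy population: one frequent supplier layout, ten rare ones.
    invs = [Invoice(f"big-{i}", "L0") for i in range(50)]
    invs += [Invoice(f"r{j}-{i}", f"L{j}") for j in range(1, 11) for i in range(2)]
    train, ins, oos = layout_aware_split(invs, seed=1)
    print(len(train), len(ins), len(oos))
```

Holding out entire layouts, rather than sampling invoices uniformly, keeps the frequent supplier's many near-identical invoices from filling the test set. Otherwise they would inflate the score on layouts the model has effectively memorized.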
Original language: English
Article number: 200285
Journal: Intelligent Systems with Applications
Volume: 20
Number of pages: 14
ISSN: 2667-3053
DOIs
Publication status: Published, 01.11.2023

Bibliographic note

Publisher Copyright:
© 2023 The Authors

