Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Publication: Contributions to journals > Journal articles > Research > Peer-reviewed

Authors

Automating the processing of incoming invoices promises to yield vast efficiency improvements in accounting. Until fully electronic invoice exchange formats have been universally adopted, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning is especially helpful for processing invoices from suppliers who send invoices only infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unknown invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines the effects of training data taken from such populations on the predictive quality of different machine-learning approaches for the extraction of information from invoices. Comparing the different approaches, we find that they are affected to varying degrees by skewed layout populations: the accuracy gap between in-sample and out-of-sample layouts is much larger in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts in the splitting of data and the evaluation of the models.
Original language: English
Article number: 200285
Journal: Intelligent Systems with Applications
Volume: 20
Number of pages: 14
ISSN: 2667-3053
DOIs
Publication status: Published - 01.11.2023

Bibliographic note

Publisher Copyright:
© 2023 The Authors
