Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Research output: Journal contributions, Journal articles, Research, peer-review

Authors

Automating the processing of incoming invoices promises vast efficiency improvements in accounting. Until fully electronic invoice exchange formats have been universally adopted, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning is especially helpful for processing invoices from suppliers who send invoices only infrequently, as the models can capture the semantic and visual cues of invoices and generalize them to previously unseen invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines how training data drawn from such populations affects the predictive quality of different machine-learning approaches for extracting information from invoices. Comparing the approaches, we find that they are affected to varying degrees by skewed layout populations: the accuracy gap between in-sample and out-of-sample layouts is much larger in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts when splitting the data and evaluating the models.
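
The layout-aware splitting and evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `layout` field, and the 20% fractions are assumptions made for illustration only. The idea is to hold out some layouts entirely (out-of-sample), keep a test slice from the remaining layouts (in-sample), and report the accuracy difference between the two.

```python
import random
from collections import defaultdict


def layout_aware_split(docs, test_fraction=0.2, seed=0):
    """Split documents into train, in-sample test, and out-of-sample test.

    `docs` is a list of dicts carrying a hypothetical "layout" key.
    Layouts chosen for the out-of-sample set never appear in training.
    """
    rng = random.Random(seed)
    by_layout = defaultdict(list)
    for doc in docs:
        by_layout[doc["layout"]].append(doc)

    # Pick whole layouts to hold out, so the model never sees them in training.
    layouts = sorted(by_layout)
    rng.shuffle(layouts)
    n_holdout = max(1, round(len(layouts) * test_fraction))
    held_out = set(layouts[:n_holdout])

    train, test_in, test_out = [], [], []
    for layout in sorted(by_layout):
        items = list(by_layout[layout])
        if layout in held_out:
            test_out.extend(items)
            continue
        # For seen layouts, reserve a slice of documents for in-sample testing.
        rng.shuffle(items)
        k = int(len(items) * test_fraction)
        test_in.extend(items[:k])
        train.extend(items[k:])
    return train, test_in, test_out


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels exactly."""
    if not y_true:
        raise ValueError("accuracy is undefined for an empty set")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_gap(acc_in_sample, acc_out_of_sample):
    """Positive values mean the model does worse on unseen layouts."""
    return acc_in_sample - acc_out_of_sample
```

With a skewed population (a few layouts contributing most documents), a plain random split would place the frequent layouts in both training and test data, which tends to hide how well a model generalizes to layouts it has never seen; grouping by layout before splitting makes that gap measurable.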
Original language: English
Article number: 200285
Journal: Intelligent Systems with Applications
Volume: 20
Number of pages: 14
ISSN: 2667-3053
DOIs
Publication status: Published - 01.11.2023

Bibliographical note

Funding Information:
We acknowledge support by the German Research Foundation (DFG).

Publisher Copyright:
© 2023 The Authors

    Research areas

  • Business informatics - Layout-rich documents, Document analysis, Natural language processing
