Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Research output: Journal contributions › Journal articles › Research › peer-review

Authors

Automating the processing of incoming invoices promises vast efficiency improvements in accounting. Until fully electronic invoice exchange formats have been universally adopted, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning is especially helpful for processing invoices from suppliers who send invoices only infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unseen invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines how training data drawn from such populations affects the predictive quality of different machine-learning approaches for extracting information from invoices. Comparing the approaches, we find that they are affected to varying degrees by skewed layout populations: the accuracy gap between in-sample and out-of-sample layouts is much larger in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts when splitting the data and evaluating the models.
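The abstract does not spell out how the pipeline splits the data, so the following is only a minimal sketch of one way a layout-aware split and the in-sample versus out-of-sample accuracy gap could be set up. The function names, the fractions, and the toy invoice population are all illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def layout_aware_split(docs, holdout_fraction=0.2, test_fraction=0.2, seed=0):
    """Split (doc_id, layout_id) pairs into train, in-sample test, and
    out-of-sample test sets. Hypothetical helper; parameters are assumptions."""
    rng = random.Random(seed)

    # Group documents by their layout (e.g. one layout per supplier template).
    by_layout = defaultdict(list)
    for doc_id, layout_id in docs:
        by_layout[layout_id].append(doc_id)

    # Hold out whole layouts so that they are never seen during training.
    layouts = sorted(by_layout)
    rng.shuffle(layouts)
    n_holdout = max(1, round(len(layouts) * holdout_fraction))
    unseen_layouts = set(layouts[:n_holdout])

    train, test_in, test_out = [], [], []
    for layout_id in sorted(by_layout):
        doc_ids = list(by_layout[layout_id])
        if layout_id in unseen_layouts:
            # Out-of-sample: every document of a held-out layout.
            test_out.extend(doc_ids)
            continue
        # In-sample: a share of the documents from layouts used in training.
        rng.shuffle(doc_ids)
        n_test = int(len(doc_ids) * test_fraction)
        test_in.extend(doc_ids[:n_test])
        train.extend(doc_ids[n_test:])
    return train, test_in, test_out, unseen_layouts


def accuracy_gap(correct_in, correct_out):
    """In-sample accuracy minus out-of-sample accuracy, given lists of
    booleans marking whether each prediction was correct."""
    acc_in = sum(correct_in) / len(correct_in)
    acc_out = sum(correct_out) / len(correct_out)
    return acc_in - acc_out


# Toy population skewed toward two frequent layouts (made-up data).
docs = [
    (f"inv-{layout}-{i}", layout)
    for layout, count in [("A", 50), ("B", 30), ("C", 3), ("D", 2), ("E", 1)]
    for i in range(count)
]
train, test_in, test_out, unseen = layout_aware_split(docs)
```

Holding out entire layouts, rather than sampling documents at random, keeps the out-of-sample test set free of layouts that also occur in training, which is what makes the accuracy gap between the two test sets interpretable.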
Original language: English
Article number: 200285
Journal: Intelligent Systems with Applications
Volume: 20
Number of pages: 14
ISSN: 2667-3053
Publication status: Published - 01.11.2023

Bibliographical note

Funding Information:
We acknowledge support by the German Research Foundation (DFG).

Publisher Copyright:
© 2023 The Authors

    Research areas

  • Business informatics - Layout-rich documents, Document analysis, Natural language processing
