Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Research output: Journal contributions › Journal articles › Research › peer-review

Authors

Automating the processing of incoming invoices promises vast efficiency gains in accounting. Until fully electronic invoice exchange formats are universally adopted, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning particularly helps with invoices from suppliers who send invoices only infrequently, because the models can capture the semantic and visual cues of invoices and generalize them to previously unseen invoice layouts. Since the invoice population in many companies is skewed toward a few frequent suppliers and their layouts, this research examines how training data drawn from such populations affects the predictive quality of different machine-learning approaches to extracting information from invoices. Comparing these approaches, we find that skewed layout populations affect them to varying degrees: the accuracy gap between in-sample and out-of-sample layouts is much larger for the Chargrid and random forest models than for the LayoutLM transformer model, which also achieves the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts when splitting the data and evaluating the models.
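The idea of splitting data with attention to layouts, so that in-sample and out-of-sample layouts can be evaluated separately, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each invoice carries a hypothetical `layout_id` (for example, one ID per supplier template), and the function names `layout_aware_split` and `accuracy_gap` as well as the split fractions are illustrative assumptions. Whole layouts are held out so that the `test_out` partition contains only layouts the model never saw during training.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (invoice_id, layout_id). The layout_id field is an assumption.
Invoice = Tuple[str, str]


def layout_aware_split(
    invoices: List[Invoice],
    holdout_layout_fraction: float = 0.2,
    test_fraction_in_sample: float = 0.2,
    seed: int = 0,
) -> Dict[str, List[Invoice]]:
    """Split invoices so that some layouts never appear in training.

    Returns three partitions:
      train    - invoices used for fitting
      test_in  - test invoices whose layout also occurs in train (in-sample layouts)
      test_out - test invoices whose layout is absent from train (out-of-sample layouts)
    """
    rng = random.Random(seed)

    # Group invoices by layout so that a layout is never split across
    # the training set and the out-of-sample test set.
    by_layout: Dict[str, List[Invoice]] = defaultdict(list)
    for inv in invoices:
        by_layout[inv[1]].append(inv)

    # Randomly choose whole layouts to hold out (at least one).
    layouts = sorted(by_layout)
    rng.shuffle(layouts)
    n_holdout = max(1, round(len(layouts) * holdout_layout_fraction))
    holdout = set(layouts[:n_holdout])

    split: Dict[str, List[Invoice]] = {"train": [], "test_in": [], "test_out": []}
    for layout, docs in by_layout.items():
        if layout in holdout:
            split["test_out"].extend(docs)
            continue
        # For in-sample layouts, move a fraction of their invoices to test_in;
        # int() rounds down, so rare layouts may contribute no test invoices.
        docs = docs[:]
        rng.shuffle(docs)
        n_test = int(len(docs) * test_fraction_in_sample)
        split["test_in"].extend(docs[:n_test])
        split["train"].extend(docs[n_test:])
    return split


def accuracy_gap(correct_in: List[bool], correct_out: List[bool]) -> float:
    """In-sample accuracy minus out-of-sample accuracy (larger = worse generalization)."""
    acc_in = sum(correct_in) / len(correct_in)
    acc_out = sum(correct_out) / len(correct_out)
    return acc_in - acc_out
```

Holding out entire layouts, rather than random invoices, is what makes the out-of-sample score meaningful on a skewed population; a purely random split would place most test invoices in the few dominant layouts already seen in training. scikit-learn's `GroupShuffleSplit` offers a comparable group-aware split if a library implementation is preferred.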
Original language: English
Article number: 200285
Journal: Intelligent Systems with Applications
Volume: 20
Number of pages: 14
ISSN: 2667-3053
DOIs
Publication status: Published - 01.11.2023

Bibliographical note

Funding Information:
We acknowledge support by the German Research Foundation (DFG).

Publisher Copyright:
© 2023 The Authors

    Research areas

  • Business informatics - Layout-rich documents, Document analysis, Natural language processing
