Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Publication: Contributions to journals › Journal articles › Research › peer-reviewed

Standard

Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers. / Krieger, Felix; Drews, Paul; Funk, Burkhardt.
In: Intelligent Systems with Applications, Vol. 20, 200285, 1 November 2023.

BibTeX

@article{75e4eebe29b84ec990facd0d3a84d8cd,
title = "Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers",
abstract = "Automation of incoming invoices processing promises to yield vast efficiency improvements in accounting. Until a universal adoption of fully electronic invoice exchange formats has been achieved, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning especially helps the processing of invoices of suppliers who only send invoices infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unknown invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines the effects of training data taken from such populations on the predictive quality of different machine-learning approaches for the extraction of information from invoices. Comparing the different approaches, we find that they are affected to varying degrees by skewed layout populations: The accuracy gap between in-sample and out-of-sample layouts is much higher in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts in the splitting of data and the evaluation of the models.",
keywords = "Business informatics, Layout-rich documents, Document analysis, Natural language processing",
author = "Felix Krieger and Paul Drews and Burkhardt Funk",
note = "Funding Information: We acknowledge support by the German Research Foundation (DFG). Publisher Copyright: {\textcopyright} 2023 The Authors",
year = "2023",
month = nov,
day = "1",
doi = "10.1016/j.iswa.2023.200285",
language = "English",
volume = "20",
journal = "Intelligent Systems with Applications",
issn = "2667-3053",
publisher = "Elsevier B.V.",

}

RIS

TY - JOUR

T1 - Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

AU - Krieger, Felix

AU - Drews, Paul

AU - Funk, Burkhardt

N1 - Funding Information: We acknowledge support by the German Research Foundation (DFG). Publisher Copyright: © 2023 The Authors

PY - 2023/11/1

Y1 - 2023/11/1

N2 - Automation of incoming invoices processing promises to yield vast efficiency improvements in accounting. Until a universal adoption of fully electronic invoice exchange formats has been achieved, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning especially helps the processing of invoices of suppliers who only send invoices infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unknown invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines the effects of training data taken from such populations on the predictive quality of different machine-learning approaches for the extraction of information from invoices. Comparing the different approaches, we find that they are affected to varying degrees by skewed layout populations: The accuracy gap between in-sample and out-of-sample layouts is much higher in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts in the splitting of data and the evaluation of the models.

AB - Automation of incoming invoices processing promises to yield vast efficiency improvements in accounting. Until a universal adoption of fully electronic invoice exchange formats has been achieved, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning especially helps the processing of invoices of suppliers who only send invoices infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unknown invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines the effects of training data taken from such populations on the predictive quality of different machine-learning approaches for the extraction of information from invoices. Comparing the different approaches, we find that they are affected to varying degrees by skewed layout populations: The accuracy gap between in-sample and out-of-sample layouts is much higher in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts in the splitting of data and the evaluation of the models.

KW - Business informatics

KW - Layout-rich documents

KW - Document analysis

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85174540281&partnerID=8YFLogxK

U2 - 10.1016/j.iswa.2023.200285

DO - 10.1016/j.iswa.2023.200285

M3 - Journal articles

VL - 20

JO - Intelligent Systems with Applications

JF - Intelligent Systems with Applications

SN - 2667-3053

M1 - 200285

ER -

DOI: 10.1016/j.iswa.2023.200285
