Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Research output: Journal contributions › Journal articles › Research › peer-review

Standard

Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers. / Krieger, Felix; Drews, Paul; Funk, Burkhardt.
In: Intelligent Systems with Applications, Vol. 20, 200285, 01.11.2023.

BibTeX

@article{75e4eebe29b84ec990facd0d3a84d8cd,
title = "Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers",
abstract = "Automation of incoming invoices processing promises to yield vast efficiency improvements in accounting. Until a universal adoption of fully electronic invoice exchange formats has been achieved, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning especially helps the processing of invoices of suppliers who only send invoices infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unknown invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines the effects of training data taken from such populations on the predictive quality of different machine-learning approaches for the extraction of information from invoices. Comparing the different approaches, we find that they are affected to varying degrees by skewed layout populations: The accuracy gap between in-sample and out-of-sample layouts is much higher in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts in the splitting of data and the evaluation of the models.",
keywords = "Business informatics, Layout-rich documents, Document analysis, Natural language processing",
author = "Felix Krieger and Paul Drews and Burkhardt Funk",
note = "Funding Information: We acknowledge support by the German Research Foundation (DFG). Publisher Copyright: {\textcopyright} 2023 The Authors",
year = "2023",
month = nov,
day = "1",
doi = "10.1016/j.iswa.2023.200285",
language = "English",
volume = "20",
journal = "Intelligent Systems with Applications",
issn = "2667-3053",
publisher = "Elsevier B.V.",
}

RIS

TY - JOUR

T1 - Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

AU - Krieger, Felix

AU - Drews, Paul

AU - Funk, Burkhardt

N1 - Funding Information: We acknowledge support by the German Research Foundation (DFG). Publisher Copyright: © 2023 The Authors

PY - 2023/11/1

Y1 - 2023/11/1

N2 - Automation of incoming invoices processing promises to yield vast efficiency improvements in accounting. Until a universal adoption of fully electronic invoice exchange formats has been achieved, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning especially helps the processing of invoices of suppliers who only send invoices infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unknown invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines the effects of training data taken from such populations on the predictive quality of different machine-learning approaches for the extraction of information from invoices. Comparing the different approaches, we find that they are affected to varying degrees by skewed layout populations: The accuracy gap between in-sample and out-of-sample layouts is much higher in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts in the splitting of data and the evaluation of the models.

AB - Automation of incoming invoices processing promises to yield vast efficiency improvements in accounting. Until a universal adoption of fully electronic invoice exchange formats has been achieved, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning especially helps the processing of invoices of suppliers who only send invoices infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unknown invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines the effects of training data taken from such populations on the predictive quality of different machine-learning approaches for the extraction of information from invoices. Comparing the different approaches, we find that they are affected to varying degrees by skewed layout populations: The accuracy gap between in-sample and out-of-sample layouts is much higher in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts in the splitting of data and the evaluation of the models.

KW - Business informatics

KW - Layout-rich documents

KW - Document analysis

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85174540281&partnerID=8YFLogxK

U2 - 10.1016/j.iswa.2023.200285

DO - 10.1016/j.iswa.2023.200285

M3 - Journal articles

VL - 20

JO - Intelligent Systems with Applications

JF - Intelligent Systems with Applications

SN - 2667-3053

M1 - 200285

ER -
