Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Felix Krieger; Paul Drews; Burkhardt Funk

doi:10.1016/j.iswa.2023.200285

Research output: Journal contributions › Journal articles › Research › peer-review

Standard

Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers. / Krieger, Felix; Drews, Paul; Funk, Burkhardt.
In: Intelligent Systems with Applications, Vol. 20, 200285, 01.11.2023.

Bibtex

@article{75e4eebe29b84ec990facd0d3a84d8cd,

title = "Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers",

abstract = "Automation of incoming invoices processing promises to yield vast efficiency improvements in accounting. Until a universal adoption of fully electronic invoice exchange formats has been achieved, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning especially helps the processing of invoices of suppliers who only send invoices infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unknown invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines the effects of training data taken from such populations on the predictive quality of different machine-learning approaches for the extraction of information from invoices. Comparing the different approaches, we find that they are affected to varying degrees by skewed layout populations: The accuracy gap between in-sample and out-of-sample layouts is much higher in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts in the splitting of data and the evaluation of the models.",

keywords = "Business informatics, Layout-rich documents, Document analysis, Natural language processing",

author = "Felix Krieger and Paul Drews and Burkhardt Funk",

note = "Funding Information: We acknowledge support by the German Research Foundation (DFG). Publisher Copyright: {\textcopyright} 2023 The Authors",

year = "2023",

month = nov,

day = "1",

doi = "10.1016/j.iswa.2023.200285",

language = "English",

volume = "20",

pages = "200285",

journal = "Intelligent Systems with Applications",

issn = "2667-3053",

publisher = "Elsevier B.V.",

}

RIS

TY - JOUR

T1 - Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

AU - Krieger, Felix

AU - Drews, Paul

AU - Funk, Burkhardt

PY - 2023/11/1

Y1 - 2023/11/1

AB - Automation of incoming invoices processing promises to yield vast efficiency improvements in accounting. Until a universal adoption of fully electronic invoice exchange formats has been achieved, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning especially helps the processing of invoices of suppliers who only send invoices infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unknown invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines the effects of training data taken from such populations on the predictive quality of different machine-learning approaches for the extraction of information from invoices. Comparing the different approaches, we find that they are affected to varying degrees by skewed layout populations: The accuracy gap between in-sample and out-of-sample layouts is much higher in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts in the splitting of data and the evaluation of the models.

KW - Business informatics

KW - Layout-rich documents

KW - Document analysis

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85174540281&partnerID=8YFLogxK

U2 - 10.1016/j.iswa.2023.200285

DO - 10.1016/j.iswa.2023.200285

M3 - Journal articles

VL - 20

JO - Intelligent Systems with Applications

JF - Intelligent Systems with Applications

SN - 2667-3053

M1 - 200285

ER -

Other publications by the same author(s)

AI-Enhanced Literature Reviews: Connecting Emerging Phenomena and Bodies of Knowledge

Naqvi, S. A. A., Zimmer, M. P., Kauschinger, M., Drews, P. & Basole, R. C., 2026, (Accepted/In press) Proceedings of HICSS 2026.

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Aligning Experimentation with Product Operations: A Taxonomy for Structuring Experimentation Teams

Stotz, N., Labay, B., Vermeer, L. & Drews, P., 2026, Software Engineering and Advanced Applications - 51st Euromicro Conference, SEAA 2025, Proceedings: 51st Euromicro Conference, SEAA 2025 Salerno, Italy, September 10–12, 2025 Proceedings, Part III. Taibi, D. & Smite, D. (eds.). Cham: Springer Nature Switzerland AG, Vol. 3. p. 23-38 16 p. (Lecture Notes in Computer Science; vol. 16083).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Capitalizing on natural language processing (NLP) to automate the evaluation of coach implementation fidelity in guided digital cognitive-behavioral therapy (GdCBT)

Zainal, N. H., Eckhardt, R., Rackoff, G. N., Fitzsimmons-Craft, E. E., Rojas-Ashe, E., Barr Taylor, C., Funk, B., Eisenberg, D., Wilfley, D. E. & Newman, M. G., 02.04.2025, In: Psychological Medicine. 55, e106.

Research output: Journal contributions › Journal articles › Research › peer-review

Construct relation extraction from scientific papers: Is it automatable yet?

Funk, B. & Scharfenberger, J., 07.01.2025, Proceedings of the 58th Hawaii International Conference on System Sciences, HICSS 2025. Bui, T. X. (ed.). Honolulu: University of Hawaii at Manoa, p. 4675-4684 10 p. (Hawaii International Conference on System Sciences (HICSS); vol. 2025).

Research output: Contributions to collected editions/works › Published abstract in conference proceedings › Research › peer-review

Conveying the Ethics of Artificial Intelligence in K–12 and Academia: A Systematic Review of Teaching Methods

Tschoppe, N. J., Katsarov, J., Drews, P. & Trittin-Ulbrich, H., 07.01.2025, Proceedings of the 58th Hawaii International Conference on System Sciences: Hilton Waikoloa Village, January 7-10, 2025. Bui, T. X. (ed.). Honolulu: University of Hawaii at Manoa, p. 4744-4753 10 p. (Hawaii International Conference on System Sciences (HICSS); vol. 2025).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

DOI

https://doi.org/10.1016/j.iswa.2023.200285
Final published version
