Automated Invoice Processing: Machine Learning-Based Information Extraction for Long Tail Suppliers

Research output: Journal contributions › Journal articles › Research › peer-review

Automating the processing of incoming invoices promises large efficiency gains in accounting. Until fully electronic invoice exchange formats are universally adopted, machine learning can help bridge the adoption gap by extracting structured information from unstructured invoice formats. Machine learning is especially useful for invoices from suppliers who send them only infrequently, because the models can capture the semantic and visual cues of invoices and generalize them to previously unseen layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines how training data drawn from such populations affects the predictive quality of different machine-learning approaches to extracting information from invoices. Comparing the approaches, we find that they are affected to varying degrees by skewed layout populations: the accuracy gap between in-sample and out-of-sample layouts is much larger for the Chargrid and random forest models than for the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts when splitting the data and evaluating the models.
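The layout-aware splitting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `layout_aware_split`, the `(invoice_id, layout_id)` input schema, and the `test_fraction` parameter are all assumptions made for illustration. The idea is to hold out some layouts entirely, so that accuracy on layouts seen during training (in-sample) and on unseen layouts (out-of-sample) can be measured separately.

```python
import random
from collections import defaultdict


def layout_aware_split(invoices, test_fraction=0.2, seed=0):
    """Split invoices into train, in-sample test, and out-of-sample test sets.

    invoices: list of (invoice_id, layout_id) pairs (hypothetical schema).
    A fraction of the layouts is held out completely (out-of-sample test);
    the remaining layouts are split per layout into train and in-sample test.
    """
    rng = random.Random(seed)

    # Group invoice ids by their layout.
    by_layout = defaultdict(list)
    for invoice_id, layout_id in invoices:
        by_layout[layout_id].append(invoice_id)

    # Pick the layouts that the model never sees during training.
    layouts = sorted(by_layout)
    rng.shuffle(layouts)
    n_held_out = max(1, int(len(layouts) * test_fraction))
    held_out = set(layouts[:n_held_out])

    train, test_in, test_out = [], [], []
    for layout_id in layouts:
        ids = by_layout[layout_id]
        if layout_id in held_out:
            test_out.extend(ids)
            continue
        # Known layout: keep most invoices for training, a few for testing.
        rng.shuffle(ids)
        n_test = int(len(ids) * test_fraction)
        test_in.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test_in, test_out
```

Evaluating a model separately on the second and third returned sets gives the two accuracies whose difference is the in-sample versus out-of-sample gap discussed above. With a skewed population, rare layouts contribute few or no in-sample test invoices, which is one reason the distribution of layouts deserves explicit attention.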
Original language: English
Article number: 200285
Journal: Intelligent Systems with Applications
Volume: 20
Number of pages: 14
ISSN: 2667-3053
Publication status: Published - 01.11.2023

Bibliographical note

Funding Information:
We acknowledge support by the German Research Foundation (DFG).

Publisher Copyright:
© 2023 The Authors

Research areas

  • Business informatics - Layout-rich documents, Document analysis, Natural language processing
