Information Extraction from Invoices: A Graph Neural Network Approach for Datasets with High Layout Variety
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Extracting information from invoices is a highly structured, recurrent task in auditing. Automating this task would yield efficiency improvements, while simultaneously improving audit quality. The challenge for this endeavor is to account for the text layout on invoices and the high variety of layouts across different issuers. Recent research has proposed graphs to structurally represent the layout on invoices and to apply graph convolutional networks to extract the information pieces of interest. However, the effectiveness of graph-based approaches has so far been shown only on datasets with a low variety of invoice layouts. In this paper, we introduce a graph-based approach to information extraction from invoices and apply it to a dataset of invoices from multiple vendors. We show that our proposed model extracts the specified key items from a highly diverse set of invoices with a macro F1 score of 0.8753.
Original language | English |
---|---|
Title of host publication | Innovation Through Information Systems - Volume II : A Collection of Latest Research on Technology Issues |
Editors | Frederik Ahlemann, Reinhard Schütte, Stefan Stieglitz |
Number of pages | 16 |
Place of Publication | Cham |
Publisher | Springer Science and Business Media Deutschland GmbH |
Publication date | 2021 |
Pages | 5-20 |
ISBN (print) | 978-3-030-86796-6 |
ISBN (electronic) | 978-3-030-86797-3 |
Publication status | Published - 2021 |
Bibliographical note
Publisher Copyright:
© 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
- Audit digitization, Graph attention networks, Graph-based machine learning, Unstructured data
- Informatics
- Business informatics