CROCUS: Cluster-based ontology data cleansing

Didier Cherix; Ricardo Usbeck; Andreas Both; Jens Lehmann

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

CROCUS: Cluster-based ontology data cleansing. / Cherix, Didier; Usbeck, Ricardo; Both, Andreas et al.
WaSABi-FEOSW 2014 : Joint Proceedings of WaSABi 2014 and FEOSW 2014. ed. / Angel García-Crespo; Juan Miguel Gómez Berbís; Mateusz Radzimski; José Luis Sánchez Cervantes; Sam Coppens; Karl Hammar; Magnus Knuth; Marco Neumann; Dominique Ritze; Miel Vander Sande. Vol. 1240 Sun Site Central Europe (RWTH Aachen University), 2014. p. 7-14 (CEUR Workshop Proceedings; Vol. 1240).

Harvard

Cherix, D, Usbeck, R, Both, A & Lehmann, J 2014, CROCUS: Cluster-based ontology data cleansing. in A García-Crespo, JM Gómez Berbís, M Radzimski, JLS Cervantes, S Coppens, K Hammar, M Knuth, M Neumann, D Ritze & MV Sande (eds), WaSABi-FEOSW 2014 : Joint Proceedings of WaSABi 2014 and FEOSW 2014. vol. 1240, CEUR Workshop Proceedings, vol. 1240, Sun Site Central Europe (RWTH Aachen University), pp. 7-14, Joint 2nd International Workshop on Semantic Web Enterprise Adoption and Best Practice, WaSABi 2014 and 2nd International Workshop on Finance and Economics on the Semantic Web, FEOSW 2014 - Co-located with 11th European Semantic Web Conference, ESWC 2014, Anissaras, Greece, 26.05.14. <https://ceur-ws.org/Vol-1240/wasabi2014-paper1.pdf>

APA

Cherix, D., Usbeck, R., Both, A., & Lehmann, J. (2014). CROCUS: Cluster-based ontology data cleansing. In A. García-Crespo, J. M. Gómez Berbís, M. Radzimski, J. L. S. Cervantes, S. Coppens, K. Hammar, M. Knuth, M. Neumann, D. Ritze, & M. V. Sande (Eds.), WaSABi-FEOSW 2014 : Joint Proceedings of WaSABi 2014 and FEOSW 2014 (Vol. 1240, pp. 7-14). (CEUR Workshop Proceedings; Vol. 1240). Sun Site Central Europe (RWTH Aachen University). https://ceur-ws.org/Vol-1240/wasabi2014-paper1.pdf

Vancouver

Cherix D, Usbeck R, Both A, Lehmann J. CROCUS: Cluster-based ontology data cleansing. In García-Crespo A, Gómez Berbís JM, Radzimski M, Cervantes JLS, Coppens S, Hammar K, Knuth M, Neumann M, Ritze D, Sande MV, editors, WaSABi-FEOSW 2014 : Joint Proceedings of WaSABi 2014 and FEOSW 2014. Vol. 1240. Sun Site Central Europe (RWTH Aachen University). 2014. p. 7-14. (CEUR Workshop Proceedings).

Bibtex

@inbook{b12dc85b7b0041c1ae3a3efebbfbbc8e,
title = "CROCUS: Cluster-based ontology data cleansing",
abstract = "Over the past years, a vast number of datasets have been published based on Semantic Web standards, which provides an opportunity for creating novel industrial applications. However, industrial requirements on data quality are high while the time to market as well as the required costs for data preparation have to be kept low. Unfortunately, many Linked Data sources are error-prone which prevents their direct use in productive systems. Hence, (semi-)automatic quality assurance processes are needed as manual ontology repair procedures by domain experts are expensive and time consuming. In this article, we present CROCUS - A pipeline for cluster-based ontology data cleansing. Our system provides a semi-automatic approach for instance-level error detection in ontologies which is agnostic of the underlying Linked Data knowledge base and works at very low costs. CROCUS was evaluated on two datasets. The experiments show that we are able to detect errors with high recall.",
keywords = "Informatics, Business informatics",
author = "Didier Cherix and Ricardo Usbeck and Andreas Both and Jens Lehmann",
year = "2014",
language = "English",
volume = "1240",
series = "CEUR Workshop Proceedings",
publisher = "Sun Site Central Europe (RWTH Aachen University)",
pages = "7--14",
editor = "Angel Garc{\'i}a-Crespo and {G{\'o}mez Berb{\'i}s}, {Juan Miguel } and Mateusz Radzimski and Cervantes, {Jos{\'e} Luis S{\'a}nchez} and Sam Coppens and Karl Hammar and Magnus Knuth and Marco Neumann and Dominique Ritze and Sande, {Miel Vander}",
booktitle = "WaSABi-FEOSW 2014",
address = "Germany",
note = "Joint 2nd International Workshop on Semantic Web Enterprise Adoption and Best Practice, WaSABi 2014 and 2nd International Workshop on Finance and Economics on the Semantic Web, FEOSW 2014 - Co-located with 11th European Semantic Web Conference, ESWC 2014 ; Conference date: 26-05-2014",
url = "https://2014.eswc-conferences.org/index.html",
}

RIS

TY - CHAP
T1 - CROCUS
T2 - Joint 2nd International Workshop on Semantic Web Enterprise Adoption and Best Practice, WaSABi 2014 and 2nd International Workshop on Finance and Economics on the Semantic Web, FEOSW 2014 - Co-located with 11th European Semantic Web Conference, ESWC 2014
AU - Cherix, Didier
AU - Usbeck, Ricardo
AU - Both, Andreas
AU - Lehmann, Jens
N1 - Conference code: 11
PY - 2014
Y1 - 2014
N2 - Over the past years, a vast number of datasets have been published based on Semantic Web standards, which provides an opportunity for creating novel industrial applications. However, industrial requirements on data quality are high while the time to market as well as the required costs for data preparation have to be kept low. Unfortunately, many Linked Data sources are error-prone which prevents their direct use in productive systems. Hence, (semi-)automatic quality assurance processes are needed as manual ontology repair procedures by domain experts are expensive and time consuming. In this article, we present CROCUS - A pipeline for cluster-based ontology data cleansing. Our system provides a semi-automatic approach for instance-level error detection in ontologies which is agnostic of the underlying Linked Data knowledge base and works at very low costs. CROCUS was evaluated on two datasets. The experiments show that we are able to detect errors with high recall.
AB - Over the past years, a vast number of datasets have been published based on Semantic Web standards, which provides an opportunity for creating novel industrial applications. However, industrial requirements on data quality are high while the time to market as well as the required costs for data preparation have to be kept low. Unfortunately, many Linked Data sources are error-prone which prevents their direct use in productive systems. Hence, (semi-)automatic quality assurance processes are needed as manual ontology repair procedures by domain experts are expensive and time consuming. In this article, we present CROCUS - A pipeline for cluster-based ontology data cleansing. Our system provides a semi-automatic approach for instance-level error detection in ontologies which is agnostic of the underlying Linked Data knowledge base and works at very low costs. CROCUS was evaluated on two datasets. The experiments show that we are able to detect errors with high recall.
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=84924952819&partnerID=8YFLogxK
M3 - Article in conference proceedings
AN - SCOPUS:84924952819
VL - 1240
T3 - CEUR Workshop Proceedings
SP - 7
EP - 14
BT - WaSABi-FEOSW 2014
A2 - García-Crespo, Angel
A2 - Gómez Berbís, Juan Miguel
A2 - Radzimski, Mateusz
A2 - Cervantes, José Luis Sánchez
A2 - Coppens, Sam
A2 - Hammar, Karl
A2 - Knuth, Magnus
A2 - Neumann, Marco
A2 - Ritze, Dominique
A2 - Sande, Miel Vander
PB - Sun Site Central Europe (RWTH Aachen University)
Y2 - 26 May 2014
ER -

Other publications by the same author(s)

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R. & Panchenko, A., 2026, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Ichise, R. (ed.). Springer Science and Business Media Deutschland, p. 95-110 16 p. (Lecture Notes in Computer Science; vol. 15836 LNCS).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Analyzing the Influence of Knowledge Graph Information on Relation Extraction

Möller, C. & Usbeck, R., 2025, The Semantic Web: 22nd European Semantic Web Conference, ESWC 2025 Portoroz, Slovenia, June 1–5, 2025 Proceedings, Part I. Curry, E., Acosta, M., Poveda-Villalón, M., van Erp, M., Ojo, A., Hose, K., Shimizu, C. & Lisena, P. (eds.). Cham: Springer Nature Switzerland AG, Vol. 1. p. 460-480 21 p. (Lecture Notes in Computer Science ; vol. 15718).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

ASK-DBLP: Answering Questions over DBLP

Taffa, T., Neises, P., Ollinger, S., Westphal, P., Ackermann, M. R., Banerjee, D. & Usbeck, R., 02.11.2025, ISWC-C 2025, Industry, Doctoral Consortium, Posters and Demos at ISWC 2025: Joint Proceedings of Industry, Doctoral Consortium, Posters and Demos of the 24th International Semantic Web Conference (ISWC-C 2025), ISWC 2025 Companion Volume. Celino, I., Hassanzadeh, O., Bernstein, A., Noy, N., Cheng, G., Wang, S., Ferrada, S., Soulard, T., Kozaki, K., Takeda, H. & Gentile, A. L. (eds.). Aachen: Sun Site Central Europe (RWTH Aachen University), p. 435-440 6 p. D13. (CEUR Workshop Proceedings; vol. 4085).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Automating SPARQL Query Translations between DBpedia and Wikidata

Bartels, M. C., Banerjee, D. & Usbeck, R., 14.07.2025, Linking Meaning: Semantic Technologies Shaping the Future of AI: Proceedings of the 21st International Conference on Semantic Systems, 3-5 September 2025, Vienna, Austria. Spahiu, B., Vahdati, S., Salatino, A., Pellegrini, T. & Havur, G. (eds.). IOS Press BV, p. 176-193 18 p. (Studies on the Semantic Web; vol. 62).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research

Best Practices in AI and Data Science Models Evaluation

Banerjee, D., Taffa, T. A. & Usbeck, R., 2025, INFORMATIK 2025 : The Wide Open - Offenheit von Source bis Science, 16.-19.September 2025 Potsdam. Lucke, U., Stieglitz, S., Uebernickel, F., Lamprecht, A.-L. & Klein, M. (eds.). Bonn: Gesellschaft für Informatik, Bonn, p. 1211-1219 9 p. (Lecture Notes in Informatics; vol. P366).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review