Harvesting information from captions for weakly supervised semantic segmentation

Johann Sawatzky; Debayan Banerjee; Juergen Gall

doi:10.1109/ICCVW.2019.00549

Harvesting information from captions for weakly supervised semantic segmentation

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Johann Sawatzky
Debayan Banerjee
Juergen Gall

Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages. They do not require additional curation as it is the case for the clean class tags used by current weakly supervised approaches and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset where it achieves state of the art results for weakly supervised image segmentation.

Original language	English
Title of host publication	2019 International Conference on Computer Vision Workshops : ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea
Number of pages	10
Place of Publication	Piscataway
Publisher	Institute of Electrical and Electronics Engineers Inc.
Publication date	10.2019
Pages	4481-4490
Article number	9022140
ISBN (print)	978-1-7281-5024-6
ISBN (electronic)	978-1-7281-5023-9
DOIs	https://doi.org/10.1109/ICCVW.2019.00549
Publication status	Published - 10.2019
Externally published	Yes
Event	17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019 - Seoul, Korea, Republic of Duration: 27.10.2019 → 28.10.2019 Conference number: 17 https://iccv2019.thecvf.com/

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Research areas

Multimodal learning, Semantic segmentation, Weakly supervised learning, Weakly supervised semantic segmentation
Informatics

Other publications by the same author(s)

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R. & Panchenko, A., 2026, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Ichise, R. (ed.). Springer Science and Business Media Deutschland, p. 95-110 16 p. (Lecture Notes in Computer Science; vol. 15836 LNCS).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

ASK-DBLP: Answering Questions over DBLP

Taffa, T., Neises, P., Ollinger, S., Westphal, P., Ackermann, M. R., Banerjee, D. & Usbeck, R., 02.11.2025, ISWC-C 2025, Industry, Doctoral Consortium, Posters and Demos at ISWC 2025: Joint Proceedings of Industry, Doctoral Consortium, Posters and Demos of the 24th International Semantic Web Conference (ISWC-C 2025), ISWC 2025 Companion Volume. Celino, I., Hassanzadeh, O., Bernstein, A., Noy, N., Cheng, G., Wang, S., Ferrada, S., Soulard, T., Kozaki, K., Takeda, H. & Gentile, A. L. (eds.). Aachen: Sun Site Central Europe (RWTH Aachen University), p. 435-440 6 p. D13. (CEUR Workshop Proceedings; vol. 4085).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Automating SPARQL Query Translations between DBpedia and Wikidata

Bartels, M. C., Banerjee, D. & Usbeck, R., 14.07.2025, Linking Meaning: Semantic Technologies Shaping the Future of AI: Cover 74617 Proceedings of the 21st International Conference on Semantic Systems, 3-5 September 2025, Vienna, Austria. Spahiu, B., Vahdati, S., Salatino, A., Pellegrini, T. & Havur, G. (eds.). IOS Press BV, p. 176-193 18 p. (Studies on the Semantic Web; vol. 62).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research

Best Practices in AI and Data Science Models Evaluation

Banerjee, D., Taffa, T. A. & Usbeck, R., 2025, INFORMATIK 2025 : The Wide Open - Offenheit von Source bis Science, 16.-19.September 2025 Potsdam. Lucke, U., Stieglitz, S., Uebernickel, F., Lamprecht, A.-L. & Klein, M. (eds.). Bonn: Gesellschaft für Informatik, Bonn, p. 1211-1219 9 p. (Lecture Notes in Informatics; vol. P366).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

DBLPLink 2.0 - An Entity Linker for the DBLP Scholarly Knowledge Graph

Banerjee, D., Taffa, T. A. & Usbeck, R., 30.07.2025, Industry, Doctoral Consortium, Posters and Demos at ISWC 2025: Joint Proceedings of Industry, Doctoral Consortium, Posters and Demos of the 24th International Semantic Web Conference (ISWC-C 2025) Nara, Japan, November 2-6, 2025.. Celino, I., Hassanzadeh, O., Bernstein, A., Noy, N., Cheng, G., Wang, S., Ferrada, S., Kozaki, T., Takeda, H. & Gentile, A. L. (eds.). Aachen: Sun Site Central Europe (RWTH Aachen University), Vol. 4085. p. 435-440 6 p. (CEUR Workshop Proceedings; vol. 4085).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

DOI

https://doi.org/10.1109/ICCVW.2019.00549
Final published version

Harvesting information from captions for weakly supervised semantic segmentation

Authors

Bibliographical note

Research areas

Other publications by the same author(s)

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

ASK-DBLP: Answering Questions over DBLP

Automating SPARQL Query Translations between DBpedia and Wikidata

Best Practices in AI and Data Science Models Evaluation

DBLPLink 2.0 - An Entity Linker for the DBLP Scholarly Knowledge Graph

DOI

Recently viewed

Activities

Publications