Harvesting information from captions for weakly supervised semantic segmentation

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Harvesting information from captions for weakly supervised semantic segmentation. / Sawatzky, Johann; Banerjee, Debayan; Gall, Juergen.
2019 International Conference on Computer Vision Workshops: ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea. Piscataway: Institute of Electrical and Electronics Engineers Inc., 2019. p. 4481-4490, Article 9022140 (IEEE International Conference on Computer Vision workshops; Vol. 2019).

Harvard

Sawatzky, J, Banerjee, D & Gall, J 2019, Harvesting information from captions for weakly supervised semantic segmentation. in 2019 International Conference on Computer Vision Workshops: ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea., 9022140, IEEE International Conference on Computer Vision workshops, vol. 2019, Institute of Electrical and Electronics Engineers Inc., Piscataway, pp. 4481-4490, 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019, Seoul, Korea, Republic of, 27.10.19. https://doi.org/10.1109/ICCVW.2019.00549

APA

Sawatzky, J., Banerjee, D., & Gall, J. (2019). Harvesting information from captions for weakly supervised semantic segmentation. In 2019 International Conference on Computer Vision Workshops: ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea (pp. 4481-4490). Article 9022140 (IEEE International Conference on Computer Vision workshops; Vol. 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCVW.2019.00549

Vancouver

Sawatzky J, Banerjee D, Gall J. Harvesting information from captions for weakly supervised semantic segmentation. In 2019 International Conference on Computer Vision Workshops: ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea. Piscataway: Institute of Electrical and Electronics Engineers Inc. 2019. p. 4481-4490. 9022140. (IEEE International Conference on Computer Vision workshops). doi: 10.1109/ICCVW.2019.00549

Bibtex

@inbook{13c2379a3a944f5bacd91e0409b3aeca,
title = "Harvesting information from captions for weakly supervised semantic segmentation",
abstract = "Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages. They do not require additional curation as it is the case for the clean class tags used by current weakly supervised approaches and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset where it achieves state of the art results for weakly supervised image segmentation.",
keywords = "Multimodal learning, Semantic segmentation, Weakly supervised learning, Weakly supervised semantic segmentation, Informatics",
author = "Johann Sawatzky and Debayan Banerjee and Juergen Gall",
note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019, ICCVW 2019 ; Conference date: 27-10-2019 Through 28-10-2019",
year = "2019",
month = oct,
doi = "10.1109/ICCVW.2019.00549",
language = "English",
isbn = "978-1-7281-5024-6",
series = "IEEE International Conference on Computer Vision workshops",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "4481--4490",
booktitle = "2019 International Conference on Computer Vision Workshops",
address = "United States",
url = "https://iccv2019.thecvf.com/",
}

RIS

TY - CHAP

T1 - Harvesting information from captions for weakly supervised semantic segmentation

AU - Sawatzky, Johann

AU - Banerjee, Debayan

AU - Gall, Juergen

N1 - Conference code: 17

PY - 2019/10

Y1 - 2019/10

N2 - Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages. They do not require additional curation as it is the case for the clean class tags used by current weakly supervised approaches and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset where it achieves state of the art results for weakly supervised image segmentation.

AB - Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages. They do not require additional curation as it is the case for the clean class tags used by current weakly supervised approaches and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset where it achieves state of the art results for weakly supervised image segmentation.

KW - Multimodal learning

KW - Semantic segmentation

KW - Weakly supervised learning

KW - Weakly supervised semantic segmentation

KW - Informatics

UR - http://www.scopus.com/inward/record.url?scp=85082499279&partnerID=8YFLogxK

U2 - 10.1109/ICCVW.2019.00549

DO - 10.1109/ICCVW.2019.00549

M3 - Article in conference proceedings

AN - SCOPUS:85082499279

SN - 978-1-7281-5024-6

T3 - IEEE International Conference on Computer Vision workshops

SP - 4481

EP - 4490

BT - 2019 International Conference on Computer Vision Workshops

PB - Institute of Electrical and Electronics Engineers Inc.

CY - Piscataway

T2 - 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019

Y2 - 27 October 2019 through 28 October 2019

ER -