Harvesting information from captions for weakly supervised semantic segmentation

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages: they do not require additional curation, as is the case for the clean class tags used by current weakly supervised approaches, and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps, which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset, where it achieves state-of-the-art results for weakly supervised image segmentation.
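The abstract does not specify how a text activation map is computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' network. It assumes that visual features and a text embedding already live in the same joint space, and that the activation at each spatial location is the cosine similarity between that location's feature vector and the text embedding, clipped at zero. The function names `cosine` and `text_activation_map` and the toy values are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def text_activation_map(features, text_embedding):
    """Assumed scoring rule: per-location cosine similarity, clipped at zero.

    features: H x W grid (nested lists) of D-dimensional visual feature vectors.
    text_embedding: D-dimensional embedding of a class name or compound concept.
    Returns an H x W grid of activations in [0, 1].
    """
    return [
        [max(0.0, cosine(cell, text_embedding)) for cell in row]
        for row in features
    ]

# Toy 2 x 2 feature grid with 2-dimensional features (illustrative values only).
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
text_embedding = [1.0, 0.0]
tam = text_activation_map(features, text_embedding)
```

In this toy grid, locations whose features point in the same direction as the text embedding receive high activation, and locations pointing away from it are clipped to zero. A real system would obtain both the features and the embedding from trained encoders.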

Original language: English
Title of host publication: 2019 International Conference on Computer Vision Workshops : ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea
Number of pages: 10
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 10.2019
Pages: 4481-4490
Article number: 9022140
ISBN (print): 978-1-7281-5024-6
ISBN (electronic): 978-1-7281-5023-9
Publication status: Published - 10.2019
Externally published: Yes
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019 - Seoul, Korea, Republic of
Duration: 27.10.2019 → 28.10.2019
Conference number: 17
https://iccv2019.thecvf.com/

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Research areas

  • Multimodal learning, Semantic segmentation, Weakly supervised learning, Weakly supervised semantic segmentation
  • Informatics

