Harvesting information from captions for weakly supervised semantic segmentation

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages: they do not require additional curation, as is the case for the clean class tags used by current weakly supervised approaches, and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e., combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps, which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset, where it achieves state-of-the-art results for weakly supervised image segmentation.
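The abstract does not spell out how a text activation map is computed. As a rough illustration only, the sketch below assumes that a TAM can be read off a joint embedding space as the cosine similarity between one text embedding and the visual feature at each spatial location, and that the maps for a class name and a related compound concept are merged by an element-wise maximum. The function names, the toy feature grid, and the merging rule are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def text_activation_map(feature_grid, text_embedding):
    """Assumed TAM: per-location similarity to the text embedding, clipped at 0.

    feature_grid is an H x W nested list of D-dimensional visual features that
    are assumed to live in the same embedding space as text_embedding.
    """
    return [[max(0.0, cosine(cell, text_embedding)) for cell in row]
            for row in feature_grid]


def combine_maps(maps):
    """Merge several H x W maps by element-wise maximum (an illustrative choice)."""
    height, width = len(maps[0]), len(maps[0][0])
    return [[max(m[i][j] for m in maps) for j in range(width)]
            for i in range(height)]


# Toy 2 x 2 feature grid with 2-dimensional embeddings (made-up values).
grid = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [-1.0, 0.0]]]

tam_class = text_activation_map(grid, [1.0, 0.0])     # hypothetical class-name embedding
tam_compound = text_activation_map(grid, [0.0, 1.0])  # hypothetical compound-concept embedding
combined = combine_maps([tam_class, tam_compound])
```

In this toy setup, locations whose features point away from the text embedding receive zero activation, and the merged map is high wherever either the class name or the compound concept matches.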

Original language: English
Title of host publication: 2019 International Conference on Computer Vision Workshops : ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea
Number of pages: 10
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 10.2019
Pages: 4481-4490
Article number: 9022140
ISBN (print): 978-1-7281-5024-6
ISBN (electronic): 978-1-7281-5023-9
Publication status: Published - 10.2019
Externally published: Yes
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019 - Seoul, Korea, Republic of
Duration: 27.10.2019 → 28.10.2019
Conference number: 17
https://iccv2019.thecvf.com/

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Research areas

  • Multimodal learning, Semantic segmentation, Weakly supervised learning, Weakly supervised semantic segmentation
  • Informatics
