Harvesting information from captions for weakly supervised semantic segmentation

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that require only class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages: they do not require additional curation, as is the case for the clean class tags used by current weakly supervised approaches, and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e., combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps, which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset, where it achieves state-of-the-art results for weakly supervised image segmentation.
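The abstract does not spell out how a text activation map is computed, so the following is only a minimal sketch under stated assumptions: it assumes per-location visual features and a text embedding already live in a shared joint space, scores each location by cosine similarity, and fuses the class-name map with compound-concept maps by an element-wise maximum. The function names `text_activation_map` and `combine_maps`, the normalisation, and the fusion rule are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def text_activation_map(features, text_embedding, eps=1e-8):
    """Score every spatial location against one text embedding.

    features: (H, W, D) visual features, assumed to be in the joint space.
    text_embedding: (D,) embedding of a class name or compound concept.
    Returns an (H, W) map: cosine similarity, negatives clipped to 0,
    then divided by the peak value (assumed normalisation).
    """
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + eps)
    t = text_embedding / (np.linalg.norm(text_embedding) + eps)
    sim = f @ t                      # (H, W) cosine similarities
    sim = np.maximum(sim, 0.0)       # keep only positive evidence
    peak = sim.max()
    return sim / peak if peak > 0 else sim


def combine_maps(class_map, compound_maps):
    """Hypothetical fusion: element-wise max over class and compound maps."""
    stacked = np.stack([class_map, *compound_maps], axis=0)
    return stacked.max(axis=0)
```

With toy inputs, a location whose feature points in the same direction as the text embedding receives the highest score, while orthogonal or opposing features receive zero. A learned model would replace the raw features and embeddings used here.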

Original language: English
Title of host publication: 2019 International Conference on Computer Vision Workshops : ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea
Number of pages: 10
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 10.2019
Pages: 4481-4490
Article number: 9022140
ISBN (print): 978-1-7281-5024-6
ISBN (electronic): 978-1-7281-5023-9
Publication status: Published - 10.2019
Externally published: Yes
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019 - Seoul, Korea, Republic of
Duration: 27.10.2019 - 28.10.2019
Conference number: 17
https://iccv2019.thecvf.com/

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Research areas

  • Multimodal learning, Semantic segmentation, Weakly supervised learning, Weakly supervised semantic segmentation
  • Informatics
