Harvesting information from captions for weakly supervised semantic segmentation

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages: they do not require additional curation, as is the case for the clean class tags used by current weakly supervised approaches, and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e., combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps, which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset, where it achieves state-of-the-art results for weakly supervised image segmentation.
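
The abstract does not spell out how a text activation map is computed. As a rough illustration only, and not the authors' implementation, the sketch below assumes that a joint embedding yields one D-dimensional vector per image location and one D-dimensional vector per text query, and that the map is the cosine similarity between the two. The function names, the thresholding step, and the toy values are all hypothetical.

```python
import numpy as np


def text_activation_map(visual_features, text_embedding, eps=1e-8):
    """Cosine similarity between one text embedding and every image location.

    visual_features: (H, W, D) array of per-location embeddings (assumed input).
    text_embedding:  (D,) array for a class name or compound concept (assumed input).
    Returns an (H, W) map with values in [-1, 1].
    """
    norms = np.linalg.norm(visual_features, axis=-1, keepdims=True)
    v = visual_features / (norms + eps)
    t = text_embedding / (np.linalg.norm(text_embedding) + eps)
    return v @ t


def normalize_map(tam):
    """Clip negative responses and scale the map so its peak is 1."""
    tam = np.maximum(tam, 0.0)
    peak = tam.max()
    return tam / peak if peak > 0 else tam


def pseudo_labels(maps, threshold=0.5, background=0):
    """Turn per-class maps into a label image (hypothetical post-processing).

    maps: dict mapping a positive integer class id to an (H, W) normalized map.
    Each pixel takes the class with the strongest response, or `background`
    when no class reaches `threshold`.
    """
    class_ids = sorted(maps)
    stack = np.stack([maps[c] for c in class_ids], axis=0)  # (C, H, W)
    labels = np.asarray(class_ids)[stack.argmax(axis=0)]
    labels[stack.max(axis=0) < threshold] = background
    return labels


# Toy 2x2 feature grid with D = 2; the values are made up for illustration.
features = np.array([[[1.0, 0.0], [0.0, 1.0]],
                     [[2.0, 1.0], [-1.0, 0.0]]])
queries = {1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0])}

maps = {c: normalize_map(text_activation_map(features, q)) for c, q in queries.items()}
labels = pseudo_labels(maps)
print(labels)
```

In a setup like the one the abstract describes, maps of this kind would act as localization cues for training a segmentation network. The hard threshold here is only a placeholder for whatever refinement is used in practice.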

Original language: English
Title of host publication: 2019 International Conference on Computer Vision Workshops : ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea
Number of pages: 10
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 10.2019
Pages: 4481-4490
Article number: 9022140
ISBN (print): 978-1-7281-5024-6
ISBN (electronic): 978-1-7281-5023-9
DOIs
Publication status: Published - 10.2019
Externally published: Yes
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019 - Seoul, Republic of Korea
Duration: 27.10.2019 to 28.10.2019
Conference number: 17
https://iccv2019.thecvf.com/

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Research areas

  • Multimodal learning, Semantic segmentation, Weakly supervised learning, Weakly supervised semantic segmentation
  • Informatics
