Harvesting information from captions for weakly supervised semantic segmentation

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages: they do not require additional curation, as is the case for the clean class tags used by current weakly supervised approaches, and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps, which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset, where it achieves state-of-the-art results for weakly supervised image segmentation.
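The abstract does not specify how a text activation map is computed, so the following is only a minimal sketch of the general idea, assuming the joint embedding yields one feature vector per image location and one vector per text query. The cosine-similarity scoring, the max-based fusion of compound-concept maps, and the names `text_activation_map` and `fuse_maps` are all illustrative assumptions, not the authors' architecture.

```python
import numpy as np


def text_activation_map(visual_features, text_embedding, eps=1e-8):
    """Score each image location against a text query (assumed: cosine similarity).

    visual_features: array of shape (H, W, D), one D-dim vector per location.
    text_embedding:  array of shape (D,), e.g. for a class name or a
                     compound concept such as "red bus".
    Returns an (H, W) map with values in [-1, 1].
    """
    norms = np.linalg.norm(visual_features, axis=-1, keepdims=True)
    v = visual_features / (norms + eps)
    t = text_embedding / (np.linalg.norm(text_embedding) + eps)
    return v @ t


def fuse_maps(class_map, compound_maps):
    """Hypothetical fusion rule: element-wise maximum over all maps."""
    return np.max(np.stack([class_map, *compound_maps]), axis=0)
```

Under these assumptions, a location whose feature vector points in the same direction as the text embedding scores close to 1, and an orthogonal one scores close to 0; the fused map keeps the strongest response per location.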

Original language: English
Title of host publication: 2019 International Conference on Computer Vision Workshops : ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea
Number of pages: 10
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 10.2019
Pages: 4481-4490
Article number: 9022140
ISBN (print): 978-1-7281-5024-6
ISBN (electronic): 978-1-7281-5023-9
DOIs
Publication status: Published - 10.2019
Externally published: Yes
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019 - Seoul, Korea, Republic of
Duration: 27.10.2019 - 28.10.2019
Conference number: 17
https://iccv2019.thecvf.com/

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

    Research areas

  • Multimodal learning, Semantic segmentation, Weakly supervised learning, Weakly supervised semantic segmentation
  • Informatics
