Harvesting information from captions for weakly supervised semantic segmentation

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages: they do not require additional curation, as is the case for the clean class tags used by current weakly supervised approaches, and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps, which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset, where it achieves state-of-the-art results for weakly supervised image segmentation.
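The abstract does not say how a text activation map is computed from the joint embedding. As a rough illustration only, the sketch below assumes that each spatial location of the image has a feature vector in the same space as the text embedding, and scores each location by its clipped cosine similarity to the embedding of a class name or compound concept. The function names (`text_activation_map`, `combine_maps`), the cosine scoring, and the element-wise maximum used to merge maps are all assumptions made for this example, not the method of the paper.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def text_activation_map(features, text_embedding):
    """Score every spatial location against one text embedding.

    features: H x W grid (nested lists) of D-dimensional visual vectors,
              assumed to live in the same space as text_embedding.
    Returns an H x W map with values in [0, 1].
    """
    # Negative similarities are clipped to zero (an assumption).
    sims = [[max(cosine(vec, text_embedding), 0.0) for vec in row]
            for row in features]
    peak = max(max(row) for row in sims)
    if peak == 0:
        return sims
    # Rescale so the strongest location has activation 1.
    return [[s / peak for s in row] for row in sims]


def combine_maps(maps):
    """Merge several equally sized maps by element-wise maximum,
    e.g. a class-name map with maps of compound concepts (hypothetical rule)."""
    return [[max(vals) for vals in zip(*rows)] for rows in zip(*maps)]


# Toy 2 x 2 feature grid with 2-dimensional embeddings.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 1.0], [-1.0, 0.0]]]
tam = text_activation_map(features, [1.0, 0.0])
merged = combine_maps([[[0.2, 0.9]], [[0.5, 0.1]]])
```

In this toy setting, the location whose feature points in the same direction as the text embedding receives the highest activation, and locations pointing away from it receive zero.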

Original language: English
Title of host publication: 2019 International Conference on Computer Vision Workshops : ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea
Number of pages: 10
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 10.2019
Pages: 4481-4490
Article number: 9022140
ISBN (print): 978-1-7281-5024-6
ISBN (electronic): 978-1-7281-5023-9
Publication status: Published - 10.2019
Externally published: Yes
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019 - Seoul, Korea, Republic of
Duration: 27.10.2019 - 28.10.2019
Conference number: 17
https://iccv2019.thecvf.com/

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Research areas

  • Multimodal learning, Semantic segmentation, Weakly supervised learning, Weakly supervised semantic segmentation
  • Informatics
