Harvesting information from captions for weakly supervised semantic segmentation

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-review

Authors

Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that require only class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they are found on the Internet. These captions have two advantages: they do not require additional curation, as is the case for the clean class tags used by current weakly supervised approaches, and they provide textual context for the classes present in an image. To leverage this textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as for compound concepts, i.e., combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps, which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset, where it achieves state-of-the-art results for weakly supervised image segmentation.
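To make the idea of a text activation map more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that the joint embedding yields a D-dimensional visual vector at each spatial location and a D-dimensional vector for a word or compound phrase (e.g. a noun together with an attribute). Under that assumption, a TAM is computed as the per-location cosine similarity, clipped at zero and rescaled to [0, 1]. The function names, and the element-wise-maximum rule in `fuse_with_cam` for combining a TAM with a class activation map, are hypothetical illustration choices. The abstract does not specify how the maps are combined.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of D-dim visual embeddings (assumed layout)
Map2D = List[List[float]]        # H x W activation map


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def text_activation_map(features: FeatureMap, text_emb: Vector) -> Map2D:
    """Hypothetical TAM: similarity between each spatial visual embedding and a
    text embedding in the shared space, clipped at 0 and scaled so the peak is 1."""
    raw = [[max(0.0, cosine(f, text_emb)) for f in row] for row in features]
    peak = max(max(row) for row in raw)
    if peak == 0.0:
        return raw  # no positive evidence anywhere; leave the all-zero map as is
    return [[v / peak for v in row] for row in raw]


def fuse_with_cam(cam: Map2D, tam: Map2D) -> Map2D:
    """Hypothetical fusion rule: element-wise maximum of two maps in [0, 1],
    so regions supported by either the CAM or the TAM stay activated."""
    return [[max(c, t) for c, t in zip(crow, trow)] for crow, trow in zip(cam, tam)]
```

For example, on a 1×2 grid with visual vectors [1, 0] and [0, 1], the text vector [1, 0] gives the TAM [[1.0, 0.0]]. Fusing that TAM with the CAM [[0.2, 0.5]] gives [[1.0, 0.5]].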

Original language: English
Title of host publication: 2019 International Conference on Computer Vision Workshops : ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea
Number of pages: 10
Place of publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 10.2019
Pages: 4481-4490
Article number: 9022140
ISBN (print): 978-1-7281-5024-6
ISBN (electronic): 978-1-7281-5023-9
Publication status: Published, 10.2019
Externally published: Yes
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop (ICCVW 2019), Seoul, Korea, Republic of
Duration: 27.10.2019 to 28.10.2019
Conference number: 17
https://iccv2019.thecvf.com/

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

    Research areas

  • Multimodal learning, Semantic segmentation, Weakly supervised learning, Weakly supervised semantic segmentation
  • Informatics
