Harvesting information from captions for weakly supervised semantic segmentation

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Harvesting information from captions for weakly supervised semantic segmentation. / Sawatzky, Johann; Banerjee, Debayan; Gall, Juergen.
2019 International Conference on Computer Vision Workshops: ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea. Piscataway: Institute of Electrical and Electronics Engineers Inc., 2019. p. 4481-4490 9022140 (IEEE International Conference on Computer Vision workshops; Vol. 2019).

Harvard

Sawatzky, J, Banerjee, D & Gall, J 2019, Harvesting information from captions for weakly supervised semantic segmentation. in 2019 International Conference on Computer Vision Workshops: ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea, 9022140, IEEE International Conference on Computer Vision workshops, vol. 2019, Institute of Electrical and Electronics Engineers Inc., Piscataway, pp. 4481-4490, 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019, Seoul, Korea, Republic of, 27.10.19. https://doi.org/10.1109/ICCVW.2019.00549

APA

Sawatzky, J., Banerjee, D., & Gall, J. (2019). Harvesting information from captions for weakly supervised semantic segmentation. In 2019 International Conference on Computer Vision Workshops: ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea (pp. 4481-4490). Article 9022140 (IEEE International Conference on Computer Vision workshops; Vol. 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCVW.2019.00549

Vancouver

Sawatzky J, Banerjee D, Gall J. Harvesting information from captions for weakly supervised semantic segmentation. In 2019 International Conference on Computer Vision Workshops: ICCV 2019 : proceedings : 27 October-2 November 2019, Seoul, Korea. Piscataway: Institute of Electrical and Electronics Engineers Inc. 2019. p. 4481-4490. 9022140. (IEEE International Conference on Computer Vision workshops). doi: 10.1109/ICCVW.2019.00549

Bibtex

@inbook{13c2379a3a944f5bacd91e0409b3aeca,
title = "Harvesting information from captions for weakly supervised semantic segmentation",
abstract = "Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages. They do not require additional curation as it is the case for the clean class tags used by current weakly supervised approaches and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset where it achieves state of the art results for weakly supervised image segmentation.",
keywords = "Multimodal learning, Semantic segmentation, Weakly supervised learning, Weakly supervised semantic segmentation, Informatics",
author = "Johann Sawatzky and Debayan Banerjee and Juergen Gall",
note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019, ICCVW 2019 ; Conference date: 27-10-2019 Through 28-10-2019",
year = "2019",
month = oct,
doi = "10.1109/ICCVW.2019.00549",
language = "English",
isbn = "978-1-7281-5024-6",
series = "IEEE International Conference on Computer Vision workshops",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "4481--4490",
booktitle = "2019 International Conference on Computer Vision Workshops",
address = "United States",
url = "https://iccv2019.thecvf.com/",

}

RIS

TY - CHAP

T1 - Harvesting information from captions for weakly supervised semantic segmentation

AU - Sawatzky, Johann

AU - Banerjee, Debayan

AU - Gall, Juergen

N1 - Conference code: 17

PY - 2019/10

Y1 - 2019/10

N2 - Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages. They do not require additional curation as it is the case for the clean class tags used by current weakly supervised approaches and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset where it achieves state of the art results for weakly supervised image segmentation.

AB - Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages. They do not require additional curation as it is the case for the clean class tags used by current weakly supervised approaches and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset where it achieves state of the art results for weakly supervised image segmentation.

KW - Multimodal learning

KW - Semantic segmentation

KW - Weakly supervised learning

KW - Weakly supervised semantic segmentation

KW - Informatics

UR - http://www.scopus.com/inward/record.url?scp=85082499279&partnerID=8YFLogxK

U2 - 10.1109/ICCVW.2019.00549

DO - 10.1109/ICCVW.2019.00549

M3 - Article in conference proceedings

AN - SCOPUS:85082499279

SN - 978-1-7281-5024-6

T3 - IEEE International Conference on Computer Vision workshops

SP - 4481

EP - 4490

BT - 2019 International Conference on Computer Vision Workshops

PB - Institute of Electrical and Electronics Engineers Inc.

CY - Piscataway

T2 - 17th IEEE/CVF International Conference on Computer Vision Workshop - ICCVW 2019

Y2 - 27 October 2019 through 28 October 2019

ER -
