End-to-End Active Speaker Detection

Publication: Contributions to collected editions › Articles in conference proceedings › Research › peer-reviewed

Authors

  • Juan León Alcázar
  • Moritz Cordes
  • Chen Zhao
  • Bernard Ghanem

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we propose an end-to-end ASD workflow where feature learning and contextual predictions are jointly learned. Our end-to-end trainable network simultaneously learns multi-modal embeddings and aggregates spatio-temporal context. This results in more suitable feature representations and improved performance in the ASD task. We also introduce interleaved graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem. Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance. Finally, we design a weakly-supervised strategy, which demonstrates that the ASD problem can also be approached by utilizing audiovisual data but relying exclusively on audio annotations. We achieve this by modelling the direct relationship between the audio signal and the possible sound sources (speakers), as well as introducing a contrastive loss.

Original language: English
Title: Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Number of pages: 18
Publisher: Springer Science and Business Media Deutschland
Publication date: 2022
Pages: 126-143
ISBN (print): 978-3-031-19835-9
ISBN (electronic): 978-3-031-19836-6
Publication status: Published - 2022
Event: Conference - 17th European Conference on Computer Vision - ECCV 2022 - Expo Tel Aviv / David Intercontinental Hotel, Tel Aviv, Israel
Duration: 23.10.2022 to 27.10.2022
Conference number: 17
https://eccv2022.ecva.net/

Bibliographical note

Funding Information:
Acknowledgements. This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research through the Visual Computing Center (VCC) funding.

Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
