End-to-End Active Speaker Detection

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

  • Juan León Alcázar
  • Moritz Cordes
  • Chen Zhao
  • Bernard Ghanem

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we propose an end-to-end ASD workflow where feature learning and contextual predictions are jointly learned. Our end-to-end trainable network simultaneously learns multi-modal embeddings and aggregates spatio-temporal context. This results in more suitable feature representations and improved performance in the ASD task. We also introduce interleaved graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem. Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance. Finally, we design a weakly-supervised strategy, which demonstrates that the ASD problem can also be approached by utilizing audiovisual data while relying exclusively on audio annotations. We achieve this by modelling the direct relationship between the audio signal and the possible sound sources (speakers), as well as introducing a contrastive loss.
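
The idea of splitting message passing by source of context can be illustrated with a small sketch. This is not the authors' iGNN implementation: it assumes, purely for illustration, that the two sources of context are spatial (other speakers in the same frame) and temporal (the same speaker in adjacent frames), and it uses plain mean aggregation with no learned weights. All names (`build_edges`, `mean_pass`, `interleaved_block`) are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (frame index, speaker index); hypothetical node id
Features = Dict[Node, List[float]]
Neighbours = Dict[Node, List[Node]]


def build_edges(num_frames: int, num_speakers: int) -> Tuple[Neighbours, Neighbours]:
    """Return (spatial, temporal) neighbour lists for a frames x speakers grid."""
    spatial: Neighbours = {}
    temporal: Neighbours = {}
    for t in range(num_frames):
        for s in range(num_speakers):
            # Assumed spatial context: other speakers in the same frame.
            spatial[(t, s)] = [(t, o) for o in range(num_speakers) if o != s]
            # Assumed temporal context: the same speaker in adjacent frames.
            temporal[(t, s)] = [(u, s) for u in (t - 1, t + 1) if 0 <= u < num_frames]
    return spatial, temporal


def mean_pass(features: Features, neighbours: Neighbours) -> Features:
    """One message-passing step: average each node with its neighbours."""
    out: Features = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbours[node]]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def interleaved_block(features: Features, spatial: Neighbours, temporal: Neighbours) -> Features:
    """Run the spatial pass and the temporal pass as two separate steps."""
    return mean_pass(mean_pass(features, spatial), temporal)
```

A block like this would be called as `interleaved_block(features, *build_edges(num_frames, num_speakers))`, where `features` maps each (frame, speaker) node to its embedding. Keeping the two passes separate, rather than aggregating over all edges at once, is the only design point the sketch is meant to show; a trainable version would replace the mean with learned layers.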

Original language: English
Title of host publication: Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Number of pages: 18
Publisher: Springer Science and Business Media Deutschland
Publication date: 2022
Pages: 126-143
ISBN (print): 978-3-031-19835-9
ISBN (electronic): 978-3-031-19836-6
Publication status: Published - 2022
Event: Conference - 17th European Conference on Computer Vision - ECCV 2022 - Expo Tel Aviv / David Intercontinental Hotel, Tel Aviv, Israel
Duration: 23.10.2022 - 27.10.2022
Conference number: 17
https://eccv2022.ecva.net/

Bibliographical note

Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
