End-to-End Active Speaker Detection

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

  • Juan León Alcázar
  • Moritz Cordes
  • Chen Zhao
  • Bernard Ghanem

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we propose an end-to-end ASD workflow where feature learning and contextual predictions are jointly learned. Our end-to-end trainable network simultaneously learns multi-modal embeddings and aggregates spatio-temporal context. This results in more suitable feature representations and improved performance in the ASD task. We also introduce interleaved graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem. Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance. Finally, we design a weakly-supervised strategy, which demonstrates that the ASD problem can also be approached by utilizing audiovisual data but relying exclusively on audio annotations. We achieve this by modelling the direct relationship between the audio signal and the possible sound sources (speakers), as well as introducing a contrastive loss.
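
The idea of splitting message passing by source of context can be illustrated with a small sketch. This is not the authors' implementation: it assumes the two sources of context are spatial (different speakers in the same frame) and temporal (the same speaker in adjacent frames), and it replaces learned graph layers with plain mean aggregation. The node layout `(frame, speaker)` and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (frame index, speaker index); hypothetical layout
Features = Dict[Node, List[float]]


def mean_aggregate(feats: Features, neighbours: Dict[Node, List[Node]]) -> Features:
    """One message-passing step: each node becomes the mean of itself and its neighbours.

    Assumption: a real GNN layer would use learned weights instead of a plain mean.
    """
    out: Features = {}
    for node, vec in feats.items():
        group = [vec] + [feats[n] for n in neighbours.get(node, [])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def spatial_edges(nodes: List[Node]) -> Dict[Node, List[Node]]:
    """Connect nodes that share a frame (different speakers, same time)."""
    return {a: [b for b in nodes if b != a and b[0] == a[0]] for a in nodes}


def temporal_edges(nodes: List[Node]) -> Dict[Node, List[Node]]:
    """Connect the same speaker across adjacent frames."""
    return {a: [b for b in nodes if b[1] == a[1] and abs(b[0] - a[0]) == 1] for a in nodes}


def interleaved_block(feats: Features) -> Features:
    """Spatial message passing followed by temporal message passing."""
    nodes = list(feats)
    feats = mean_aggregate(feats, spatial_edges(nodes))
    return mean_aggregate(feats, temporal_edges(nodes))


# Two frames, two speakers, one-dimensional toy features.
feats = {(0, 0): [1.0], (0, 1): [3.0], (1, 0): [5.0], (1, 1): [7.0]}
out = interleaved_block(feats)
```

In this toy example the spatial step first mixes the two speakers within each frame, and the temporal step then mixes each speaker across the two frames, so information from every node reaches every other node after a single block.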

Original language: English
Title of host publication: Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Number of pages: 18
Publisher: Springer Science and Business Media Deutschland
Publication date: 2022
Pages: 126-143
ISBN (print): 978-3-031-19835-9
ISBN (electronic): 978-3-031-19836-6
Publication status: Published - 2022
Event: Conference - 17th European Conference on Computer Vision - ECCV 2022 - Expo Tel Aviv / David Intercontinental Hotel, Tel Aviv, Israel
Duration: 23.10.2022 - 27.10.2022
Conference number: 17
https://eccv2022.ecva.net/

Bibliographical note

Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
