End-to-End Active Speaker Detection

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

  • Juan León Alcázar
  • Moritz Cordes
  • Chen Zhao
  • Bernard Ghanem

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we propose an end-to-end ASD workflow where feature learning and contextual predictions are jointly learned. Our end-to-end trainable network simultaneously learns multi-modal embeddings and aggregates spatio-temporal context. This results in more suitable feature representations and improved performance in the ASD task. We also introduce interleaved graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem. Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance. Finally, we design a weakly-supervised strategy, which demonstrates that the ASD problem can also be approached by utilizing audiovisual data but relying exclusively on audio annotations. We achieve this by modelling the direct relationship between the audio signal and the possible sound sources (speakers), as well as introducing a contrastive loss.

Original language: English
Title of host publication: Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Number of pages: 18
Publisher: Springer Science and Business Media Deutschland
Publication date: 2022
Pages: 126-143
ISBN (print): 978-3-031-19835-9
ISBN (electronic): 978-3-031-19836-6
DOIs
Publication status: Published - 2022
Event: Conference - 17th European Conference on Computer Vision - ECCV 2022 - Expo Tel Aviv / David Intercontinental Hotel, Tel Aviv, Israel
Duration: 23.10.2022 - 27.10.2022
Conference number: 17
https://eccv2022.ecva.net/

Bibliographical note

Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
