End-to-End Active Speaker Detection

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

  • Juan León Alcázar
  • Moritz Cordes
  • Chen Zhao
  • Bernard Ghanem

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we propose an end-to-end ASD workflow where feature learning and contextual predictions are jointly learned. Our end-to-end trainable network simultaneously learns multi-modal embeddings and aggregates spatio-temporal context. This results in more suitable feature representations and improved performance in the ASD task. We also introduce interleaved graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem. Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance. Finally, we design a weakly-supervised strategy, which demonstrates that the ASD problem can also be approached by utilizing audiovisual data but relying exclusively on audio annotations. We achieve this by modelling the direct relationship between the audio signal and the possible sound sources (speakers), as well as introducing a contrastive loss.
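
The idea of splitting message passing by context source can be illustrated with a toy sketch. This is not the paper's iGNN: it assumes that the two sources of context are spatial (speakers visible at the same timestamp) and temporal (the same speaker at adjacent timestamps), and it replaces learned graph layers with plain mean aggregation. The node key `(speaker_id, timestamp)` and all function names are hypothetical.

```python
# Toy sketch of interleaved message passing (illustrative assumptions only,
# not the architecture described in the paper).
from typing import Dict, List, Tuple

Node = Tuple[int, int]            # hypothetical key: (speaker_id, timestamp)
Features = Dict[Node, List[float]]
Edges = Dict[Node, List[Node]]


def mean_aggregate(feats: Features, neighbors: Edges) -> Features:
    """Replace each node's feature with the mean over itself and its neighbors."""
    out: Features = {}
    for node, vec in feats.items():
        group = [vec] + [feats[n] for n in neighbors.get(node, [])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def build_edges(nodes: List[Node]) -> Tuple[Edges, Edges]:
    """Split edges by assumed context source.

    spatial:  other speakers at the same timestamp
    temporal: the same speaker at an adjacent timestamp
    """
    spatial = {a: [b for b in nodes if b != a and b[1] == a[1]] for a in nodes}
    temporal = {
        a: [b for b in nodes if b[0] == a[0] and abs(b[1] - a[1]) == 1]
        for a in nodes
    }
    return spatial, temporal


def interleaved_block(feats: Features, spatial: Edges, temporal: Edges) -> Features:
    """One block: a spatial message-passing step followed by a temporal one."""
    return mean_aggregate(mean_aggregate(feats, spatial), temporal)
```

In a trainable model each aggregation step would be a learned layer operating on audio and visual embeddings; the sketch only shows how the two edge sets are kept separate and applied in alternation.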

Original language: English
Title of host publication: Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Number of pages: 18
Publisher: Springer Science and Business Media Deutschland
Publication date: 2022
Pages: 126-143
ISBN (print): 978-3-031-19835-9
ISBN (electronic): 978-3-031-19836-6
DOIs
Publication status: Published - 2022
Event: Conference - 17th European Conference on Computer Vision - ECCV 2022 - Expo Tel Aviv / David Intercontinental Hotel, Tel Aviv, Israel
Duration: 23.10.2022 - 27.10.2022
Conference number: 17
https://eccv2022.ecva.net/

Bibliographical note

Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
