Hands in Focus: Sign Language Recognition Via Top-Down Attention
Publication: Contributions to collected works › Articles in conference proceedings › Research › peer-reviewed
Standard
2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings: Proceedings. Piscataway: IEEE Electromagnetic Compatibility Society, 2023. pp. 2555-2559 (Proceedings - International Conference on Image Processing, ICIP).
RIS
TY - CHAP
T1 - Hands in Focus: Sign Language Recognition Via Top-Down Attention
AU - Sarhan, Noha
AU - Wilms, Christian
AU - Closius, Vanessa
AU - Brefeld, Ulf
AU - Frintrop, Simone
N1 - Conference code: 30
PY - 2023/10/8
Y1 - 2023/10/8
N2 - In this paper, we propose a novel Sign Language Recognition (SLR) model that leverages the task-specific knowledge to incorporate Top-Down (TD) attention to focus the processing of the network on the most relevant parts of the input video sequence. For SLR, this includes information about the hands' shape, orientation and positions, and motion trajectory. Our model consists of three streams that process RGB, optical flow and TD attention data. For the TD attention, we generate pixel-precise attention maps focusing on both hands, thereby retaining valuable hand information, while eliminating distracting background information. Our proposed method outperforms state-of-the-art on a challenging large-scale dataset by over 2%, and achieves strong results with a much simpler architecture compared to other systems on the newly released AUTSL dataset [1].
AB - In this paper, we propose a novel Sign Language Recognition (SLR) model that leverages the task-specific knowledge to incorporate Top-Down (TD) attention to focus the processing of the network on the most relevant parts of the input video sequence. For SLR, this includes information about the hands' shape, orientation and positions, and motion trajectory. Our model consists of three streams that process RGB, optical flow and TD attention data. For the TD attention, we generate pixel-precise attention maps focusing on both hands, thereby retaining valuable hand information, while eliminating distracting background information. Our proposed method outperforms state-of-the-art on a challenging large-scale dataset by over 2%, and achieves strong results with a much simpler architecture compared to other systems on the newly released AUTSL dataset [1].
KW - Informatics
KW - sign language recognition
KW - top-down attention
KW - deep learning
UR - http://www.scopus.com/inward/record.url?scp=85180742060&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/6fa5f221-4e0f-376d-967b-385f6ae998c5/
U2 - 10.1109/icip49359.2023.10222729
DO - 10.1109/icip49359.2023.10222729
M3 - Article in conference proceedings
SN - 978-1-7281-9836-1
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 2555
EP - 2559
BT - 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
PB - IEEE Electromagnetic Compatibility Society
CY - Piscataway
T2 - 2023 IEEE International Conference on Image Processing
Y2 - 8 October 2023 through 11 October 2023
ER -