
Multimodal Knowledge Distillation for Egocentric Action Recognition Robust to Missing Modalities (ICRA 2026)


Multimodal Knowledge Distillation for Egocentric Action Recognition Robust to Missing Modalities

Maria Santos-Villafranca*¹, Dustin Carrión-Ojeda*²˒³, Alejandro Perez-Yus¹, Jesus Bermudez-Cameo¹, Jose J. Guerrero¹, Simone Schaub-Meyer²˒³

¹University of Zaragoza · ²TU Darmstadt · ³hessian.AI · *Equal contribution

Paper (arXiv) · Project Page


🚧 The code will be released soon. Stay tuned!

🚀 About

This repository contains the official implementation of the paper Multimodal Knowledge Distillation for Egocentric Action Recognition Robust to Missing Modalities.

Teaser

Existing methods for egocentric action recognition often rely solely on RGB videos, although additional modalities, e.g., audio, can improve accuracy in challenging scenarios. However, most multimodal approaches assume all modalities are available at inference, leading to significant accuracy drops, or even failure, when inputs are missing. To address this, we introduce KARMMA, a multimodal Knowledge distillation framework for egocentric Action Recognition robust to Missing ModAlities that requires no modality alignment across all samples during training or inference. KARMMA distills knowledge from a multimodal teacher into a multimodal student that benefits from all available modalities while remaining robust to missing ones, making it suitable for diverse scenarios without retraining. Our student uses approximately 50% fewer computational resources than our teacher, resulting in a lightweight and fast model. Experiments on EPIC-Kitchens and Something-Something show that our student achieves competitive accuracy while significantly reducing accuracy drops under missing modality conditions.
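Since the code is not yet released, the two ingredients the abstract describes can be sketched in a toy NumPy example: a temperature-scaled knowledge distillation loss (teacher-to-student) and random modality dropout to simulate missing inputs at training time. All function names, shapes, and the keep probability below are illustrative assumptions, not KARMMA's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits, t=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / t
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, t=2.0):
    """Standard distillation loss: KL(teacher || student) at temperature t,
    scaled by t^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, t)          # soft teacher targets
    log_q = np.log(softmax(student_logits, t))
    kl = np.sum(p * (np.log(p) - log_q), axis=-1)
    return float(np.mean(kl) * t * t)

def drop_modalities(features, keep_prob=0.7):
    """Randomly zero out entire modality features to simulate missing
    inputs (a common way to train for missing-modality robustness)."""
    mask = {m: rng.random() < keep_prob for m in features}
    if not any(mask.values()):              # always keep at least one modality
        mask[next(iter(features))] = True
    return {m: (f if mask[m] else np.zeros_like(f))
            for m, f in features.items()}
```

For example, during a training step the student would receive `drop_modalities({"rgb": rgb_feat, "audio": audio_feat})` while the teacher sees all modalities, and `kd_loss` would be combined with the usual classification loss; the exact weighting and fusion are details of the forthcoming code.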
