Research
Published work in multimodal machine learning, video understanding, and NLP.
CIKM 2025
When Words Can't Capture It All: Video-Based User Complaint Generation
Defined CoD-V, a novel multimodal task for automated technical support reporting. Fine-tuned VideoLLaMA2 with multimodal RAG on the curated ComVID benchmark (1,176 annotated videos).
EMNLP 2024
M3Hop-CoT: Misogynous Meme Identification
Proposed a Chain-of-Thought framework for detecting multimodal hate speech. Its three-hop prompting strategy (Emotion, Target, Context) significantly outperformed unimodal baselines.