Research

Published work in multimodal machine learning, video understanding, and NLP.

CIKM 2025

When Words Can't Capture It All: Video-Based User Complaint Generation

Defined CoD-V, a novel multimodal task for automated technical support reporting. Fine-tuned VideoLLaMA2 with multimodal retrieval-augmented generation (RAG) on the curated ComVID benchmark (1,176 annotated videos).

EMNLP 2024

M3Hop-CoT: Misogynous Meme Identification

Proposed a chain-of-thought framework for detecting multimodal hate speech. Its three-hop prompting strategy (emotion, target, context) significantly outperformed unimodal baselines.