mllm 6
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
- Lecture 14: Reasoning
- Lecture 11: High-Resolution, High-Performing LVLMs
- Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
- LLaVA: Large Language and Vision Assistant
- EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning