Blog 7
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
- Uni3D: Exploring Unified 3D Representation at Scale
- ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
- Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
- Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
- LLaVA: Large Language and Vision Assistant
- EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning