multimodal-understanding

Here are 4 public repositories matching this topic:

Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.

reinforcement-learning reasoning vlm llm multimodal-understanding deepseek-r1 grpo vlm-r1 multimodal-r1 r1v skywork-r1v

[CVPR 2025] 🔥 Official implementation of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".

[CVPR 2025] 🔥 Official implementation of "Audio-Visual Instance Segmentation".

Sample project for multimodal decision-making and image generation using DeepSeek Janus Pro 7B, with Real-ESRGAN upscaling.
