Multi DeepSeek R1: Learning to Reason with Multimodal Large Language Models via Step-wise GRPO
Discover AI via YouTube
Overview
This video walks through recent research on R1-style reasoning in multimodal large language models (MLLMs). It shows how the step-wise rewards in StepGRPO (Step-wise Group Relative Policy Optimization) encourage more reliable, structured, and logically sound reasoning. StepGRPO does not rely only on passive imitation of reasoning paths through supervised fine-tuning. It gives the model dense, step-level feedback on two aspects: the accuracy of its intermediate reasoning steps (Step-wise Reasoning Accuracy Reward, StepRAR) and the validity of its overall reasoning structure (Step-wise Reasoning Validity Reward, StepRVR). This lets the model improve incrementally, and the authors report strong results across multiple reasoning benchmarks. The presentation covers the research paper "R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization" by researchers from Nanyang Technological University and Tsinghua University.
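The following minimal Python sketch shows how step-wise rewards can feed a group-relative policy update. It is not the R1-VL implementation. The substring-based accuracy check, the binary validity check, summing the two rewards, and the function names `step_accuracy_reward`, `step_validity_reward`, and `group_relative_advantages` are all hypothetical simplifications. Only the advantage formula, (r - mean) / std computed over a group of responses sampled for the same prompt, follows the standard GRPO formulation.

```python
from statistics import mean, pstdev


def step_accuracy_reward(response_steps, key_steps):
    """HYPOTHETICAL stand-in for a step-wise accuracy reward:
    fraction of reference key steps found (by substring) in the response."""
    if not key_steps:
        return 0.0
    text = " ".join(response_steps).lower()
    matched = sum(1 for step in key_steps if step.lower() in text)
    return matched / len(key_steps)


def step_validity_reward(response_steps, has_final_answer):
    """HYPOTHETICAL stand-in for a step-wise validity reward:
    1.0 if the response contains reasoning steps and a final answer, else 0.0."""
    return 1.0 if response_steps and has_final_answer else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward by the group mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of three sampled responses to one question
# ("area of a 3 x 4 rectangle"); the data is made up for this sketch.
key_steps = ["area = 3 * 4", "12"]
group = [
    (["area = 3 * 4", "so the area is 12"], True),  # correct and complete
    (["area = 3 + 4"], True),                        # wrong step, has an answer
    ([], False),                                     # no reasoning, no answer
]

# ASSUMPTION: total reward is the plain sum of the two step-wise rewards.
rewards = [
    step_accuracy_reward(steps, key_steps) + step_validity_reward(steps, done)
    for steps, done in group
]
advs = group_relative_advantages(rewards)

print(rewards)  # [2.0, 1.0, 0.0]
print([round(a, 3) for a in advs])
```

Responses that score above their group's mean receive positive advantages and are reinforced. Responses below the mean are pushed down. GRPO uses this group comparison as its baseline, so it does not need a separately learned value model.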
Syllabus
Multi DeepSeek R1: STEP-GRPO RL MultiModal
Taught by
Discover AI