Multi DeepSeek R1: Learning to Reason with Multimodal Large Language Models via Step-wise GRPO

Discover AI via YouTube

Overview

This video covers the research paper "R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization" by researchers from Nanyang Technological University and Tsinghua University. It explains how StepGRPO's step-wise rewards give a multimodal large language model continuous, detailed feedback on both the accuracy and the validity of its reasoning, so the model improves incrementally instead of relying only on passive supervised imitation. According to the presentation, this yields more reliable, structured, and logically sound reasoning and stronger performance across multiple reasoning benchmarks.
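To make the idea of step-wise rewards concrete, here is a minimal sketch of how accuracy and validity signals could be folded into a group-relative advantage. It is not the paper's exact formulation: the function names, the substring matching of key steps, the validity check, and the weights `alpha` and `beta` are all hypothetical choices made for illustration. The only part taken from standard GRPO is normalizing each reward against the mean and standard deviation of its sampled group.

```python
from statistics import mean, pstdev


def step_accuracy(reasoning: str, key_steps: list[str]) -> float:
    """Hypothetical accuracy signal: fraction of reference key steps found in the reasoning."""
    if not key_steps:
        return 0.0
    text = reasoning.lower()
    hits = sum(1 for step in key_steps if step.lower() in text)
    return hits / len(key_steps)


def step_validity(reasoning: str, answer: str) -> float:
    """Hypothetical validity signal: 1.0 if non-empty reasoning comes with an answer, else 0.0."""
    return 1.0 if reasoning.strip() and answer.strip() else 0.0


def total_reward(reasoning, answer, key_steps, gold, alpha=0.5, beta=0.5):
    """Outcome reward plus weighted step-wise terms (alpha and beta are assumed weights)."""
    outcome = 1.0 if answer.strip() == gold.strip() else 0.0
    return (outcome
            + alpha * step_accuracy(reasoning, key_steps)
            + beta * step_validity(reasoning, answer))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization: compare each reward with its own group's statistics."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A toy group of sampled responses to one question (made-up data).
key_steps = ["angle sum is 180", "180 - 110"]
group = [
    ("angle sum is 180, so x = 180 - 110", "70"),  # correct, with full reasoning
    ("", "70"),                                     # correct, but no reasoning
    ("x looks like 60", "60"),                      # wrong answer, some reasoning
]
rewards = [total_reward(r, a, key_steps, gold="70") for r, a in group]
advantages = group_relative_advantages(rewards)
```

In this toy group the first two responses reach the same final answer, yet the one with explicit reasoning gets the larger reward and therefore the larger advantage. That is the kind of denser signal an outcome-only reward cannot provide.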

Syllabus

Multi DeepSeek R1: STEP-GRPO RL MultiModal

Taught by

Discover AI

