Overview
This video explores visual reasoning capabilities in AI systems, examining both the latest research algorithms and real-world applications. Dive into an analysis of "Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning," a paper published by researchers from Peking University, Beijing Academy of Artificial Intelligence, Chinese Academy of Sciences, and University of Chinese Academy of Sciences. Learn about the current limitations of visual reasoning in Vision Language Models (VLMs) through personal experiences with commercial AI systems. The 22-minute presentation provides insights into the gap between research claims and practical performance of visual AI reasoning technologies.
Syllabus
Failure of AI "Visual Reasoning" in VLMs
Taught by
Discover AI