
Overview

Syllabus
00:00 Introduction to Vision Language Models
00:55 Model Recommendations: Small vs Large
02:02 Exploring Moondream's Latest Features
03:00 Inference with Moondream
12:20 Fine-Tuning SmolVLM
12:55 Understanding SmolVLM Architecture
17:22 Fine-Tuning SmolVLM: Step-by-Step
32:54 Introducing Qwen 2.5 VL
37:48 Troubleshooting FlashAttention Installation
38:42 Updating Transformers and Restarting Kernel
39:50 Handling Token Limits and VRAM Issues
40:44 Evaluating Model Performance on Chess Pieces
42:48 Comparing Performance with Florence 2
44:46 Training Loop and Data Collator Setup
50:34 Addressing Memory Issues and Image Resolution
55:39 Final Training and Evaluation
01:04:22 Inference and Model Comparison
01:08:27 Conclusion and WebGPU Demo

Taught by
Trelis Research