Image Classification Using Vision Transformer (ViT) with Your Custom Dataset - Full Tutorial

Eran Feit via YouTube

Overview

This tutorial video walks through building a Vision Transformer (ViT) from scratch in PyTorch and training it to classify images from a custom dataset. It covers the full workflow: installing the dependencies, exploring the dataset, loading and transforming the images, splitting each image into patches and creating patch embeddings, implementing Multi-Head Self-Attention, assembling a Transformer Encoder, training and optimizing the model, and testing its predictions on real images. The video is divided into timestamped chapters, listed in the syllabus, for easy navigation from the introduction to final model testing. The complete code is available through the link provided with the video, and the creator's playlists and blog offer further tutorials on computer vision and visual language models.
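
To make the patch-embedding and encoder steps concrete, here is a minimal sketch in PyTorch. It is not the code from the video: the class names `PatchEmbedding` and `TinyViT`, the 224x224 input size, the 16x16 patch size, the embedding width, and the three output classes are all assumptions chosen for illustration. PyTorch's built-in `nn.TransformerEncoderLayer` stands in for a hand-written Multi-Head Self-Attention block.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and embed each one.

    Hypothetical helper; all default sizes are illustrative assumptions.
    """

    def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=192):
        super().__init__()
        if image_size % patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        self.num_patches = (image_size // patch_size) ** 2
        # A convolution whose kernel and stride both equal the patch size
        # applies the same linear projection to every patch.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # Learnable class token and position embeddings (one per token).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, images):
        x = self.proj(images)                  # (batch, dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)       # (batch, num_patches, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)         # prepend the class token
        return x + self.pos_embed


class TinyViT(nn.Module):
    """Small ViT-style classifier built from PyTorch's stock encoder layers."""

    def __init__(self, num_classes=3, embed_dim=192, depth=2, num_heads=3):
        super().__init__()
        self.embed = PatchEmbedding(embed_dim=embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=embed_dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, images):
        tokens = self.encoder(self.embed(images))
        # Classify from the class token, which sits at position 0.
        return self.head(self.norm(tokens[:, 0]))
```

With these assumed sizes, a 224x224 image yields (224 / 16)^2 = 196 patches, so the encoder sees 197 tokens per image once the class token is prepended. Passing a batch of shape `(batch, 3, 224, 224)` to `TinyViT()` returns one row of class scores per image.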

Syllabus

00:00 Introduction
00:55 Installation
04:15 Discover the dataset
06:46 How to load the dataset
15:46 How to split images to patches
30:40 Build and train the ViT model
46:10 Test the model's predictions

Taught by

Eran Feit
