Overview
This tutorial walks through fine-tuning a pre-trained ConvNeXT model for custom dog breed classification using Hugging Face Transformers and PyTorch. It covers the complete workflow: loading and preprocessing a custom image dataset with the datasets library and torchvision, transforming images with AutoImageProcessor so they match the input format ConvNeXT expects, fine-tuning the pre-trained model on the new dataset with a training loop that includes validation and early stopping, saving and loading the model for inference, and making predictions with the fine-tuned model. The 32-minute video is divided into chapters covering installation, dataset exploration, data transformation and loading, model building and training, and model testing. The complete code for the tutorial is available through the provided link, and the creator's playlists contain additional computer vision and visual language model tutorials.
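As a rough illustration of the early stopping idea mentioned above, the sketch below tracks validation loss across epochs and signals when it has stopped improving. It is not taken from the video: the names EarlyStopping, patience, min_delta, and run_with_early_stopping are hypothetical, and the validation losses are passed in as plain numbers rather than computed from a real model.

```python
class EarlyStopping:
    """Track validation loss and signal when training should stop.

    Hypothetical helper for illustration; not the tutorial's own code.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience          # epochs to wait without improvement
        self.min_delta = min_delta        # minimum drop that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember the new best and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_with_early_stopping(val_losses, patience=3):
    """Simulate a training run given a list of per-epoch validation losses.

    Returns (last_epoch_run, best_validation_loss). In a real loop, each
    loss would come from evaluating the model on the validation set.
    """
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch, stopper.best_loss
    return len(val_losses), stopper.best_loss
```

In an actual fine-tuning loop, the point where best_loss is updated is typically also where the model checkpoint would be saved, so that the best-performing weights can be reloaded for inference.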
Syllabus
00:00 Introduction
01:29 Installation
04:13 Discover the dataset
05:34 Transform and load the data
17:37 Build and train the vision transformer model
26:53 Test the model
Taught by
Eran Feit