Build Custom LLM Benchmarks for Your Application

Trelis Research via YouTube

Overview

Learn how to build custom benchmarks that evaluate large language models (LLMs) against your application's needs in this 47-minute tutorial from Trelis Research. Explore YourBench from HuggingFace for quick-start benchmarking, run benchmarks locally, and work through advanced data generation topics: PDF conversion, difficulty estimation, citations, chunking, multi-hop reasoning, and filtering. The tutorial also covers evaluating custom datasets with LightEval and demonstrates evaluation and data inspection with the Trelis ADVANCED-evals tools. The repository is available at Trelis.com/ADVANCED-evals for following along with the practical examples.
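
To give a rough sense of what chunking, filtering, and multi-hop preparation can look like when generating benchmark data, here is a minimal Python sketch. It is an illustration only and is not taken from YourBench, LightEval, or the Trelis repository; the function names (chunk_text, filter_chunks, multi_hop_pairs) and the default sizes are hypothetical choices made for this example.

```python
from itertools import combinations


def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (a simple stand-in for a real chunker)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def filter_chunks(chunks: list[str], min_words: int = 5) -> list[str]:
    """Drop chunks that are too short to support a meaningful question."""
    return [c for c in chunks if len(c.split()) >= min_words]


def multi_hop_pairs(chunks: list[str]) -> list[tuple[str, str]]:
    """Pair distinct chunks so a generated question can draw on both."""
    return list(combinations(chunks, 2))
```

In a pipeline of this shape, each chunk (or chunk pair, for multi-hop questions) would be passed to an LLM that writes a question and answer grounded in that text. The overlap keeps sentences that straddle a boundary available to at least one chunk.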

Syllabus

0:00 Creating a custom benchmarking dataset
0:31 Video Overview and Scripts https://trelis.com/ADVANCED-evals
1:06 Quick-start with YourBench from HuggingFace
7:47 Running YourBench locally to create a benchmark
20:59 Advanced data generation notes: PDF conversion, estimating difficulty, citations, chunking, multi-hop, filtering
29:23 Evaluating a custom dataset using LightEval
36:29 Evaluation and Data Inspection with Trelis ADVANCED-evals
46:01 Conclusion

Taught by

Trelis Research
