
Overview

Learn how to create and implement custom benchmarks for evaluating large language models (LLMs) tailored to your application's needs in this 47-minute tutorial from Trelis Research. Explore YourBench from HuggingFace for quick-start benchmarking, learn techniques for running benchmarks locally, and work through advanced data generation concepts including PDF conversion, difficulty estimation, citations, chunking, multi-hop reasoning, and filtering. The tutorial covers evaluating custom datasets using LightEval and demonstrates evaluation and data inspection with the Trelis ADVANCED-evals tools. Access the repository at Trelis.com/ADVANCED-evals to follow along with practical examples and implementation guidance.
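To give a flavor of two of the data generation concepts the tutorial covers, chunking and difficulty-based filtering, here is a minimal illustrative sketch. The function names, window sizes, and difficulty thresholds are hypothetical choices for this example, not code from the tutorial or from YourBench.

```python
# Illustrative sketch (not the tutorial's code): chunk a source document into
# overlapping windows, then filter generated benchmark items by an estimated
# difficulty score. All names and thresholds here are assumptions.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-window chunks."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def filter_by_difficulty(items: list[dict], low: float = 0.2, high: float = 0.9) -> list[dict]:
    """Keep items whose estimated difficulty (0 = trivial, 1 = unanswerable) is in range."""
    return [it for it in items if low <= it["difficulty"] <= high]

if __name__ == "__main__":
    doc = ("word " * 500).strip()
    chunks = chunk_text(doc)
    items = [{"question": "q1", "difficulty": 0.1},   # too easy, dropped
             {"question": "q2", "difficulty": 0.5},   # kept
             {"question": "q3", "difficulty": 0.95}]  # near-unanswerable, dropped
    kept = filter_by_difficulty(items)
    print(len(chunks), len(kept))
```

Overlapping chunks help multi-hop question generation, since adjacent chunks share context; the difficulty filter discards items that are trivially answerable or effectively unanswerable from the source.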
Syllabus
0:00 Creating a custom benchmarking dataset
0:31 Video Overview and Scripts https://trelis.com/ADVANCED-evals
1:06 Quick-start with YourBench from HuggingFace
7:47 Running YourBench locally to create a benchmark
20:59 Advanced data generation notes: PDF conversion, estimating difficulty, citations, chunking, multi-hop, filtering
29:23 Evaluating a custom dataset using LightEval
36:29 Evaluation and Data Inspection with Trelis ADVANCED-evals
46:01 Conclusion
Taught by
Trelis Research