
You Get an LLM, Everyone Gets an LLM, But Does It Work? - Evaluating LLM Performance

Conf42 via YouTube

Overview

Explore the intricacies of evaluating Large Language Models (LLMs) in this conference talk from Conf42 LLMs 2024. Delve into the characteristics of effective evaluation frameworks, comparing public benchmarks with golden datasets. Understand why well-defined use cases are crucial for LLM assessment and examine traditional metrics alongside innovative approaches like LLM-based evaluations. Learn about closing performance gaps, available evaluation frameworks, and the importance of creating custom test sets. Gain insights into the challenges and best practices for determining LLM effectiveness in real-world applications.
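The "traditional metrics against a golden dataset" idea the talk covers can be sketched as a minimal evaluation loop. This is an illustrative sketch only, not code from the talk: the names (`golden`, `exact_match`, `token_f1`, `evaluate`) and the two metrics are assumptions chosen to show the pattern of scoring model outputs against a custom test set.

```python
# Minimal sketch: score LLM outputs against a "golden" test set using
# two traditional metrics (exact match and token-overlap F1).
# All names are illustrative; the talk does not prescribe this API.

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings match exactly, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1, a classic QA-style metric."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = set(pred_tokens) & set(ref_tokens)
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def evaluate(outputs: list[str], references: list[str]) -> dict[str, float]:
    """Average each metric over (prediction, reference) pairs."""
    totals = {"exact_match": 0.0, "token_f1": 0.0}
    for pred, ref in zip(outputs, references):
        totals["exact_match"] += exact_match(pred, ref)
        totals["token_f1"] += token_f1(pred, ref)
    n = len(references)
    return {name: total / n for name, total in totals.items()}

# Hypothetical golden set for a narrow, well-defined use case
golden = ["Paris is the capital of France", "42"]
model_outputs = ["paris is the capital of france", "The answer is 42"]
print(evaluate(model_outputs, golden))
```

The same loop generalizes to the "LLM evaluates LLM" approach from the syllabus: replace the metric functions with a call to a judge model that returns a score, keeping the golden set and averaging logic unchanged.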

Syllabus

Intro
Preamble
Evaluations
What makes a good evaluation framework?
Public benchmarks vs. golden datasets
Your use case is likely well defined
Good ol' metrics
LLM evaluates LLM
Metrics evaluate LLM
Closing the gap
Available frameworks
All you need is your own test/eval set
Thank you!

Taught by

Conf42
