
Challenges and Considerations in Language Model Evaluation and Benchmarking

Open Data Science via YouTube

Overview

Watch a 31-minute technical talk in which EleutherAI researcher and incoming Carnegie Mellon University PhD student Lintang Sutawika explores the challenges of evaluating language models in NLP and AI. Learn about benchmarking methodologies, evaluation practices, and common assessment tasks, and how they shape progress in language model research. Dive into advanced concepts including zero-shot capabilities, training dynamics, and multilingual extensions, with direct connections to practical machine learning applications. Examine critical topics such as benchmark lifecycles, overfitting, and system-level evaluation, gaining practical insights for comparing models more reliably. To explore further, see the Pythia suite of open language models and EleutherAI's LM Evaluation Harness on GitHub.
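
For readers who want to experiment directly, the sketch below shows one way to score a small Pythia checkpoint with the harness's Python API. This is a minimal sketch, not the speaker's setup: it assumes a recent (0.4+) release of the harness installed via pip install lm-eval, and uses lambada_openai as an example of a built-in task.

    # Minimal sketch: evaluate a small Pythia model on one harness task.
    # Assumes: pip install lm-eval (EleutherAI's LM Evaluation Harness, v0.4+).
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",                                      # Hugging Face model backend
        model_args="pretrained=EleutherAI/pythia-160m",  # small Pythia checkpoint
        tasks=["lambada_openai"],                        # example built-in task
        batch_size=8,
    )

    # Per-task metrics (e.g. accuracy, perplexity) live under results["results"].
    print(results["results"]["lambada_openai"])

The same run is available from the command line via the harness's lm_eval entry point with equivalent --model, --model_args, and --tasks options.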

Syllabus

- Introduction
- A Key Challenge in LM Evaluation
- What do we want to evaluate?
- LM-Specific Complications
- Evaluating Models vs Systems
- Life of a Benchmark
- Overfitting
- Addressing Evaluation Pitfalls
- LM Evaluation is Challenging

Taught by

Open Data Science
