
Evaluating Language Models for Mathematics Through Interactive Problem-Solving

Harvard CMSA via YouTube

Overview

Watch a Harvard CMSA seminar in which Katherine Collins and Albert Jiang of the University of Cambridge present their research on evaluating large language models (LLMs) for mathematical problem-solving through interactive assessment. Explore the development of CheckMate, a prototype platform that enables humans to interact with and evaluate LLMs in mathematical contexts. Learn about their comparative study of InstructGPT, ChatGPT, and GPT-4 as mathematical proof assistants, with participants ranging from undergraduate students to mathematics professors. Discover key insights from their MathConverse dataset, including a taxonomy of human behaviors and the relationship between correctness and perceived helpfulness in LLM responses. Gain perspective on the practical applications and limitations of LLMs in mathematical reasoning, with particular attention to GPT-4's capabilities as analyzed in case studies with expert mathematicians. Understand key takeaways for both machine learning practitioners and mathematicians, including the benefits of models that effectively communicate uncertainty, respond well to corrections, and remain interpretable and concise.

Syllabus

Katherine Collins & Albert Jiang | Evaluating Language Models for Mathematics through Interactions

Taught by

Harvard CMSA
