
Benchmarking Language Model Creativity: A Case Study on Code Generation
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview

Explore a 15-minute conference talk from NAACL 2025, presented by Yining Lu of the Center for Language & Speech Processing (CLSP) at JHU, on measuring creativity in language models through code generation. Learn about a framework built around "denial prompting," a technique that pushes LLMs toward increasingly creative solutions by forbidding the techniques used in their previous solutions. Understand NeoGauge, a metric designed to quantify both convergent thinking (goal-oriented problem solving) and divergent thinking (adaptability to new constraints) in LLM responses. Discover experimental findings on Codeforces problems showing that even advanced models such as GPT-4 fall short of human-level creativity in coding tasks, and that various reasoning strategies yield no significant improvement. The talk is based on research that produced the NeoCoder dataset, which enables reproducible creativity benchmarking of future language models.
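The core loop of denial prompting, as described above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: generate stands in for an LLM call, and extract_technique for the step that identifies which technique to forbid in later rounds; both names are hypothetical.

def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call."""
    raise NotImplementedError

def extract_technique(solution: str) -> str:
    """Hypothetical stand-in: identify a technique used in a solution
    (e.g., 'sorting', 'recursion') so it can be banned next round."""
    raise NotImplementedError

def denial_prompting(problem: str, rounds: int = 5) -> list[str]:
    """Repeatedly solve `problem`, each time forbidding the techniques
    observed in earlier solutions, so later answers must be novel."""
    banned: list[str] = []
    solutions: list[str] = []
    for _ in range(rounds):
        prompt = f"Solve the following problem:\n{problem}\n"
        if banned:
            # Constraints accumulate across rounds, forcing divergence.
            prompt += "Do not use any of: " + "; ".join(banned) + "\n"
        solution = generate(prompt)
        solutions.append(solution)
        banned.append(extract_technique(solution))
    return solutions

A creativity score in the spirit of NeoGauge could then combine correctness checks on these solutions (convergent thinking) with compliance to the accumulated constraints (divergent thinking), though the paper's exact metric definition is not reproduced here.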
Syllabus
Benchmarking Language Model Creativity: A Case Study on Code Generation --- NAACL 2025 (Yining Lu)
Taught by
Center for Language & Speech Processing (CLSP), JHU