TurkingBench: A Challenge Benchmark for Web Agents - NAACL 2025

Center for Language & Speech Processing (CLSP), JHU via YouTube

Overview

This conference talk introduces TurkingBench, a benchmark for evaluating how well multimodal AI models perform complex web-based tasks. Discover how researchers from Johns Hopkins University's Center for Language & Speech Processing built the benchmark from natural HTML pages originally designed for crowdsourcing workers, rather than from artificially synthesized web pages. Learn about its composition of 32.2K instances across 158 tasks, and about the evaluation framework that connects chatbot responses to specific web page actions such as modifying text boxes and selecting radio buttons. Explore results for cutting-edge models including GPT-4 and InternVL: current models outperform random chance but leave significant room for improvement as web-based agents.

Syllabus

TurkingBench: A Challenge Benchmark for Web Agents - NAACL 2025

Taught by

Center for Language & Speech Processing (CLSP), JHU
