Overview
Explore groundbreaking research on autonomous software engineering agents in this MLOps.community conference talk. Delve into three key works that demonstrate the potential of using software engineering as a testing ground for next-generation language models. Learn about SWE-bench, a benchmark that evaluates a model's ability to resolve real GitHub issues across 2,294 tasks drawn from Python repositories, and discover SWE-agent, an autonomous system that achieves a 12.5% resolve rate on the SWE-bench test set. Examine SWE-bench Multimodal, whose 617 JavaScript repository tasks highlight the importance of generalizability in AI systems and reveal potential Python-specific biases in existing coding agents. Presented by Stanford University PhD student John Yang, whose research focuses on Language Agents, Language Model Evaluation, and Software Engineering.
Syllabus
Few Shot Code Generation to Autonomous Software Engineering Agents // John Yang
Taught by
MLOps.community