

LLM Inference Performance Projection

Open Compute Project via YouTube

Overview

This conference talk by Intel Fellow Mohan J Kumar and Principal Engineer Chuan Song introduces MESA, an online platform for evaluating LLM inference performance across different hardware configurations. Discover how this tool addresses the growing market need for AI inference performance prediction, allowing users to evaluate various models on hardware from multiple vendors. Learn how MESA breaks down inference latency by operation type (GEMM, MatMul) and by phase (prefill and autoregressive decode), providing detailed visual graphs for performance analysis. The presentation demonstrates how adjusting context length affects inference latency through an intuitive web UI. The speakers discuss their plans to contribute the tool to open source through the Open Compute Project, aiming to bring transparency and foster growth in the critical area of AI inference performance projection.
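To see why a projection tool would separate prefill from decode, a simple roofline-style estimate illustrates the idea. This is a hypothetical sketch, not MESA's actual methodology: the function names, parameters, and the 2·params·tokens FLOP approximation are illustrative assumptions.

```python
# Hypothetical roofline-style sketch of phase-level LLM latency projection
# (illustrative only; not MESA's actual model or API).

def projected_latency_s(n_params, context_len, batch,
                        peak_flops, mem_bw_bytes, bytes_per_param=2):
    """Return (prefill_seconds, decode_seconds_per_token) estimates."""
    # Prefill processes every context token in one pass, so its GEMMs are
    # large and typically compute-bound. Rough FLOP count: 2 * params * tokens.
    prefill_flops = 2 * n_params * context_len * batch
    prefill_s = prefill_flops / peak_flops

    # Decode emits one token per step, so each step must stream the full
    # weight set from memory; it is typically memory-bandwidth-bound.
    weight_bytes = n_params * bytes_per_param
    decode_per_token_s = max(
        (2 * n_params * batch) / peak_flops,  # compute-bound time
        weight_bytes / mem_bw_bytes,          # weight-streaming time
    )
    return prefill_s, decode_per_token_s


# Example: a 7B-parameter model in FP16 on hardware with 300 TFLOP/s
# peak compute and 1 TB/s memory bandwidth (assumed numbers).
prefill, per_token = projected_latency_s(7e9, 2048, 1, 300e12, 1e12)
```

Under these assumptions, doubling the context length roughly doubles projected prefill latency, while per-token decode latency stays pinned to memory bandwidth — the kind of phase-by-phase breakdown the talk describes MESA visualizing.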

Syllabus

LLM Inference Performance Projection

Taught by

Open Compute Project

