Overview
This 38-minute tutorial from Trelis Research covers the Qwen3 model family, including both dense and Mixture-of-Experts (MoE) architectures. It opens with an overview of the family and its performance, then moves to practical inference with vLLM and SGLang, including demonstrations of the "thinking" and "no thinking" modes. It then benchmarks inference performance with llmperf and builds MCP (Model Context Protocol) agents with Qwen3. The tutorial closes with a preview of an upcoming fine-tuning video and additional resources. The complete repository is available at Trelis.com/ADVANCED-inference, and links in the video description cover technical assistance, employment, grants, and email tutorials.
Syllabus
0:00 Qwen3 Dense and Mixture of Expert Models
0:29 Overview of the Qwen3 Family and Performance
6:52 Qwen3 inference with vLLM and SGLang, thinking and no thinking
17:39 Qwen3 inference benchmarking using llmperf
26:54 Qwen3 MCP Agents
37:26 Upcoming fine-tuning video & resources
Taught by
Trelis Research