Explore practical approaches to local LLM inference in modern Java in this 47-minute Devoxx conference talk. Learn how to run open-source models such as Meta's Llama 3 on standard CPUs, with no specialized hardware required. Discover techniques for building an efficient inference engine with Java 21+ features, designing a flexible framework that adapts to multiple LLM architectures, and maximizing CPU utilization without GPU dependencies. The presentation covers integration with LangChain4j for streamlined execution, performance optimization with the Java Vector API for accelerated matrix operations, and the use of GraalVM to reduce latency and memory consumption. Gain practical insights for implementing and optimizing local LLM inference in Java projects to build fast, efficient AI applications.
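To give a flavor of the Vector API technique the talk highlights, here is a minimal, hypothetical sketch (not code from the talk) of a SIMD dot product, the core operation behind the matrix multiplications in CPU-based LLM inference. It uses only the incubating jdk.incubator.vector API and must be compiled and run with --add-modules jdk.incubator.vector on Java 21+.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public final class DotProduct {
    // Pick the widest vector shape the host CPU supports (e.g. 256 or 512 bits).
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dot(float[] a, float[] b) {
        var acc = FloatVector.zero(SPECIES);
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        // Vectorized main loop: fused multiply-add across SPECIES.length() lanes at a time.
        for (; i < upper; i += SPECIES.length()) {
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc);
        }
        // Horizontal reduction of the lane-wise partial sums.
        float sum = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for lengths not divisible by the vector width.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

The same loop shape generalizes to the matrix-vector products that dominate transformer inference; the speedup over plain scalar Java comes from processing a full hardware vector register per iteration instead of one float.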
Practical LLM Inference in Modern Java - Alina Yurenko & Alfonso Peterssen
Taught by
Devoxx