Overview
Explore groundbreaking research on novel caching techniques for Generative AI in this technical talk from Adobe Research scientist Dr. Subrata Mitra. Dive deep into two innovative works that address the computational challenges in text-to-image generation and large language models. Learn about NIRVANA, an approximate-caching system that optimizes diffusion models by reusing intermediate noise states, significantly reducing GPU compute requirements and generation latency. Discover Cache-Craft, a system extending approximate caching to Retrieval-Augmented Generation (RAG) in LLMs, which improves efficiency by reusing precomputed key-value pairs for knowledge chunks. Understand how these approaches rethink traditional caching principles to meet the unique demands of generative AI workflows, with findings published in USENIX NSDI 2024 and forthcoming in ACM SIGMOD 2025.
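The core idea behind NIRVANA-style approximate caching can be illustrated with a minimal sketch: cache intermediate noise states keyed by a prompt embedding, and on a sufficiently similar new prompt, resume denoising from the cached state instead of starting from pure noise. Everything below (class name, the cosine-similarity threshold, the stored fields) is an illustrative assumption, not the talk's actual implementation.

```python
import math

class ApproximateNoiseCache:
    """Toy sketch of approximate caching for diffusion (hypothetical design).

    Stores intermediate noise states keyed by a prompt embedding. On a
    sufficiently similar new prompt, generation can resume from the cached
    state, skipping the earlier (and most expensive) denoising steps.
    """

    def __init__(self, threshold=0.85):
        self.threshold = threshold  # minimum cosine similarity for a cache hit
        self.entries = []           # list of (embedding, skipped_steps, noise_state)

    @staticmethod
    def _cosine(a, b):
        # Plain cosine similarity between two embedding vectors.
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def put(self, embedding, skipped_steps, noise_state):
        # Store an intermediate state captured after `skipped_steps` steps.
        self.entries.append((embedding, skipped_steps, noise_state))

    def lookup(self, embedding):
        # Return (skipped_steps, noise_state) of the closest entry above
        # the similarity threshold, or None on a cache miss.
        best, best_sim = None, self.threshold
        for emb, steps, state in self.entries:
            sim = self._cosine(embedding, emb)
            if sim >= best_sim:
                best, best_sim = (steps, state), sim
        return best


cache = ApproximateNoiseCache()
cache.put([1.0, 0.0], skipped_steps=10, noise_state="latent_after_step_10")
hit = cache.lookup([0.9, 0.1])   # similar prompt -> hit, skip 10 steps
miss = cache.lookup([0.0, 1.0])  # dissimilar prompt -> None, full generation
```

On a hit, the generator would begin denoising from the returned state rather than from step zero, which is the source of the compute and latency savings described in the talk.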
Syllabus
Redefining Caching for Generative AI | Dr. Subrata Mitra, Adobe Research
Taught by
Centre for Networked Intelligence, IISc