SEA-LION - Representing Diverse Southeast Asian Languages with Large Language Models

Databricks via YouTube

Overview

Explore the development of SEA-LION, an open-source large language model designed to represent the diverse languages and cultural contexts of Southeast Asia, in this 36-minute conference talk. Discover how AI Singapore collaborated with Databricks MosaicML to create a localized LLM capable of handling multiple languages, including Thai, Indonesian, and Tamil, as well as unique linguistic phenomena like code-switching between dialects. Learn about the design considerations, from customizing tokenizers for regional languages to ensuring cost-effectiveness for resource-constrained organizations. Gain insights into potential applications and the long-term vision for this innovative model that aims to bridge the gap in language representation for Southeast Asian communities.

Syllabus

SEA-LION: Representing the Diverse Languages of Southeast Asia with LLMs

Taught by

Databricks
