This course, Production Machine Learning Systems - Locales, is intended for non-English learners. If you want to take this course in English, please enroll in Production Machine Learning Systems. In this course, we dive into the components and best practices of building high-performing ML systems in production environments. We cover some of the most common considerations behind building these systems, such as static training, dynamic training, static inference, dynamic inference, distributed TensorFlow, and TPUs. The course is devoted to exploring the characteristics that make for a good ML system beyond its ability to make good predictions.
Production Machine Learning Systems - Locales
Google via Google Cloud Skills Boost
This course may be unavailable.
Syllabus
- Introduction to Advanced Machine Learning on Google Cloud
  - Advanced Machine Learning on Google Cloud
  - Welcome
- Architecting Production ML Systems
  - Architecting ML systems
  - Data extraction, analysis, and preparation
  - Model training, evaluation, and validation
  - Trained model, prediction service, and performance monitoring
  - Training design decisions
  - Serving design decisions
  - Designing from scratch
  - Using Vertex AI
  - Lab introduction: Structured data prediction
  - Structured data prediction using Vertex AI Platform
  - Quiz: Architecting production ML systems
  - Readings: Architecting production ML systems
- Designing Adaptable ML Systems
  - Introduction
  - Adapting to data
  - Changing distributions
  - Lab: Adapting to data
  - Right and wrong decisions
  - System failure
  - Concept drift
  - Actions to mitigate concept drift
  - TensorFlow data validation
  - Components of TensorFlow data validation
  - Lab Introduction: Introduction to TensorFlow Data Validation
  - Introduction to TensorFlow Data Validation
  - Lab Introduction: Advanced Visualizations with TensorFlow Data Validation
  - Advanced Visualizations with TensorFlow Data Validation
  - Mitigating training-serving skew through design
  - Vertex AI: Training and Serving a Custom Model
  - Diagnosing a production model
  - Quiz: Designing adaptable ML systems
  - Readings: Designing adaptable ML systems
- Designing High-Performance ML Systems
  - Introduction
  - Training
  - Predictions
  - Why distributed training is needed
  - Distributed training architectures
  - TensorFlow distributed training strategies
  - Mirrored strategy
  - Multi-worker mirrored strategy
  - TPU strategy
  - Parameter server strategy
  - Lab Introduction: Distributed Training with Keras
  - Distributed Training with Keras
  - Training on large datasets with tf.data API
  - Lab Introduction: TPU-speed Data Pipelines
  - TPU Speed Data Pipelines
  - Inference
  - Quiz: Designing high-performance ML systems
  - Readings: Designing high-performance ML systems
- Building Hybrid ML Systems
  - Introduction
  - Machine Learning on Hybrid Cloud
  - Kubeflow
  - Lab Introduction: Kubeflow Pipelines with AI Platform
  - Running Pipelines on Vertex AI 2.5
  - TensorFlow Lite
  - Optimizing TensorFlow for mobile
  - Summary
  - Quiz: Hybrid ML systems
  - Readings: Hybrid ML systems
- Summary
  - Course summary
  - Production Machine learning systems - readings
  - All quiz questions and answers
- Course Resources
  - Architecting Production ML Systems Course Resources