Distributed TensorFlow - TensorFlow at O'Reilly AI Conference, San Francisco '18
TensorFlow via YouTube
Overview
This course teaches learners how to perform distributed TensorFlow training using the Keras high-level APIs. It covers TensorFlow's distributed architecture, setting up a distributed cluster with Kubeflow and Kubernetes, and distributing models built in Keras. Topics include data parallelism, mirrored variables, ring all-reduce, synchronous training, multi-GPU performance, setting up multi-node environments, deploying Kubernetes clusters, hierarchical all-reduce, and automatically distributing model code. The material is presented as a live demonstration by the TensorFlow team, and the course is intended for anyone interested in distributed TensorFlow training with the Keras high-level APIs.
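As a concrete illustration of what the course covers, the sketch below shows synchronous, data-parallel training of a Keras model on a single multi-GPU machine with tf.distribute.MirroredStrategy, which mirrors every variable onto each device and combines gradients with an all-reduce. This is a minimal sketch rather than code from the talk: it is written against the current tf.distribute API (the 2018 talk used the contrib-era equivalent), and the model, data, and hyperparameters are placeholder choices.

    import numpy as np
    import tensorflow as tf

    # MirroredStrategy implements synchronous data parallelism on one machine:
    # each GPU holds a mirrored copy of every variable, and gradients are
    # aggregated across devices with an all-reduce before each update.
    strategy = tf.distribute.MirroredStrategy()
    print("Number of devices:", strategy.num_replicas_in_sync)

    # Variables must be created inside the strategy's scope so they are
    # mirrored onto every participating device.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    # Toy in-memory data; the global batch is split across the replicas.
    x = np.random.random((1024, 10)).astype("float32")
    y = np.random.random((1024, 1)).astype("float32")

    # model.fit runs the distributed training step automatically; the model
    # code itself is unchanged, which is the point the course emphasizes.
    model.fit(x, y, batch_size=64, epochs=2)

Note that none of the model code mentions devices; switching between one GPU and many is handled entirely by the strategy.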
Syllabus
Training Can Take a Long Time
Data Parallelism
Mirrored Variables
Ring All-Reduce
Synchronous Training
Performance on Multi-GPU
Setting Up a Multi-Node Environment
Deploying Your Kubernetes Cluster
Hierarchical All-Reduce
Model Code Is Automatically Distributed
Configuring the Cluster
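The multi-node portion of the syllabus (setting up a multi-node environment, deploying a Kubernetes cluster, configuring the cluster) follows the same pattern scaled out across machines. The sketch below, again an illustration rather than code from the talk, shows the general shape: each worker reads a TF_CONFIG environment variable describing the cluster and its own role, and tf.distribute.MultiWorkerMirroredStrategy runs synchronous all-reduce training across the nodes. The host names and port are hypothetical; on Kubernetes, Kubeflow's TFJob controller typically injects TF_CONFIG into each pod.

    import json
    import os

    import numpy as np
    import tensorflow as tf

    # Every worker learns the cluster layout and its own role from TF_CONFIG.
    # The host names below are hypothetical placeholders; on Kubernetes a
    # controller such as Kubeflow's TFJob normally sets this per pod.
    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {
            "worker": ["worker-0.example.com:2222", "worker-1.example.com:2222"],
        },
        "task": {"type": "worker", "index": 0},  # this copy is worker 0
    })

    # MultiWorkerMirroredStrategy extends mirrored variables and all-reduce
    # across machines while keeping training synchronous. Each worker must
    # run this same script (with its own task index) for training to start.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()

    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
        model.compile(optimizer="sgd", loss="mse")

    # Toy data; Keras splits the global batch across the workers.
    x = np.random.random((1024, 10)).astype("float32")
    y = np.random.random((1024, 1)).astype("float32")
    model.fit(x, y, batch_size=64, epochs=2)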
Taught by
TensorFlow