The course covers the basics of conventional CPU architectures, their extensions for single instruction, multiple data (SIMD) processing, and finally the generalization of this concept in the form of single instruction, multiple thread (SIMT) processing as implemented in modern GPUs. We cover GPU architecture basics in terms of functional units and then dive into the popular CUDA programming model commonly used for GPU programming. In this context, architecture-specific details such as memory access coalescing, shared memory usage, and GPU thread scheduling, which primarily affect program performance, are also covered in detail. We next switch to OpenCL, a different parallel programming framework that can be used to program both CPUs and GPUs in a generic manner. Throughout the course we present architecture-aware optimization techniques relevant to both CUDA and OpenCL. Finally, we provide the students with detailed application development examples in two well-known GPU computing scenarios.
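As a flavour of the CUDA programming model and the memory access coalescing issue mentioned above, a minimal vector-addition kernel might look like the following sketch (the names and launch configuration are illustrative only, not taken from the course materials):

```cuda
#include <cuda_runtime.h>

// Each thread computes one output element. Consecutive threads in a
// warp access consecutive addresses, so global-memory loads and
// stores coalesce into a small number of wide transactions.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard: grid may be larger than n
        c[i] = a[i] + b[i];
}

// Host-side launch: one thread per element, 256 threads per block.
// vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
```

Had each thread instead read elements with a large stride, accesses within a warp would scatter across memory and the same kernel would run markedly slower, which is why coalescing is treated as a first-class optimization topic.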
INTENDED AUDIENCE : Computer Science, Electronics, and Electrical Engineering students
PREREQUISITES : Programming and Data Structures, Digital Logic, Computer Architecture
INDUSTRY SUPPORT : NVIDIA, AMD, Google, Amazon, and most big-data companies
Week 1 : Review of Traditional Computer Architecture – Basic Five-Stage RISC Pipeline, Cache Memory, Register File, SIMD Instructions
Week 2 : GPU Architectures – Streaming Multiprocessors, Cache Hierarchy, The Graphics Pipeline
Week 3 : Introduction to CUDA Programming
Week 4 : Multi-dimensional Mapping of Dataspace, Synchronization
Week 5 : Warp Scheduling, Divergence
Week 6 : Memory Access Coalescing
Week 7 : Optimization Examples : Optimizing Reduction Kernels
Week 8 : Optimization Examples : Kernel Fusion, Thread and Block
Week 9 : OpenCL Basics
Week 10 : OpenCL for Heterogeneous Computing
Week 11-12 : Application Design : Efficient Neural Network Training/Inferencing
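To indicate the level of detail in the optimization weeks, a shared-memory tree reduction of the kind studied in Week 7 can be sketched as follows (a hypothetical example for illustration; kernel and parameter names are not from the course):

```cuda
#include <cuda_runtime.h>

// Each block reduces one tile of the input in on-chip shared memory,
// then writes a single partial sum; a second pass (or a host-side
// sum) combines the per-block results.
__global__ void reduceSum(const float *in, float *out, int n) {
    extern __shared__ float sdata[];        // sized at launch time
    unsigned tid = threadIdx.x;
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = (i < n) ? in[i] : 0.0f;    // load tile, pad with zeros
    __syncthreads();

    // Tree reduction: the stride halves each step. Keeping the active
    // threads contiguous (tid < s) avoids warp divergence and shared
    // memory bank conflicts, two of the pitfalls covered in Weeks 5-7.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}

// Launch sketch: blockSize threads per block, shared memory sized to
// one float per thread.
// reduceSum<<<numBlocks, blockSize, blockSize * sizeof(float)>>>(d_in, d_out, n);
```

The naive alternative, in which every thread atomically adds into a single global accumulator, serializes on memory traffic; restructuring it into this hierarchical form is representative of the optimization exercises in the course.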