
Indian Institute of Technology, Kharagpur

GPU Architectures and Programming

Indian Institute of Technology, Kharagpur and NPTEL via Swayam


The course covers the basics of conventional CPU architectures, their extensions for single instruction, multiple data (SIMD) processing, and finally the generalization of this concept into single instruction, multiple thread (SIMT) processing, as implemented in modern GPUs. We cover GPU architecture basics in terms of functional units and then dive into the popular CUDA programming model, commonly used for GPU programming. In this context, architecture-specific details that primarily affect program performance, such as memory access coalescing, shared memory usage and GPU thread scheduling, are also covered in detail. We next switch to OpenCL, a different data-parallel programming framework that can be used to program both CPUs and GPUs in a generic manner. Throughout the course we present architecture-aware optimization techniques relevant to both CUDA and OpenCL. Finally, we give students detailed application development examples in two well-known GPU computing scenarios.

INTENDED AUDIENCE: Computer Science, Electronics and Electrical Engineering students
PREREQUISITES: Programming and Data Structures, Digital Logic, Computer Architecture
INDUSTRY SUPPORT: NVIDIA, AMD, Google, Amazon and most big-data companies



Week 1: Review of Traditional Computer Architecture - Basic five-stage RISC Pipeline, Cache Memory, Register File, SIMD instructions
Week 2: GPU architectures - Streaming Multiprocessors, Cache Hierarchy, The Graphics Pipeline
Week 3: Introduction to CUDA programming
Week 4: Multi-dimensional mapping of dataspace, Synchronization
Week 5: Warp Scheduling, Divergence
Week 6: Memory Access Coalescing
Week 7: Optimization examples: optimizing Reduction Kernels
Week 8: Optimization examples: Kernel Fusion, Thread and Block
Week 9: OpenCL basics
Week 10: OpenCL for Heterogeneous Computing
Week 11-12: Application Design: Efficient Neural Network Training/Inference

Taught by

Prof. Soumyajit Dey
