3-Day Course: An Introduction to Parallel Programming for HPC Systems
Conducted by Vivek Sarkar, E.D. Butcher Professor in Engineering,
Professor of Computer Science, Professor of Electrical & Computer Engineering, Rice University.
Day 1: Introduction to OpenMP 3.0 with hands-on lab
Summary: Modern HPC systems have nodes containing multi-core processors. Getting the most performance out of such a system often requires a hybrid programming model: MPI across nodes and a shared-memory programming model within each node. We will introduce shared-memory computing with the OpenMP 3.0 programming model and discuss its use with MPI as part of a hybrid parallel programming solution. The hands-on session will cover single-node shared-memory parallelism as well as adding shared-memory parallelism to a multi-node MPI code for a scientific computational example.
Day 2: Accelerated computing (CUDA) with hands-on lab
Summary: We will give an overview of accelerated computing, with particular emphasis on using general-purpose graphics processing units (GPGPUs) to accelerate floating-point-intensive tasks. We will show how it is possible to extract teraflop performance from a single node using GPGPUs. Specific attention will be paid to the CUDA compilers and toolkit from NVIDIA.
Day 3: Introduction to MPI with hands-on lab
Summary: We will introduce MPI, the Message Passing Interface, and discuss how to use communicators, topologies, collective communication, and blocking and non-blocking communication. Getting the most out of MPI requires understanding how communication patterns, load imbalance, and serialization affect parallel performance. The hands-on lab will include development and analysis of an MPI code for a scientific computational example.
Day 3: The Concurrent Collections Programming Model (Additional Presentation)
The course examples and exercises are formulated as extensions to C. Those who wish can later translate the lessons learned into a Fortran environment.
Acknowledgment: This course material is derived from lectures given at the HPC Summer Institute held at Rice University in May 2010 (http://k2i.rice.edu/events/HPC2010Institute).
| Attachment | Size |
|---|---|
| Vivek Sarkar_seminar_.pdf | 160.08 KB |
| Sarkar-CSIRO-Dec-2011-v2.pdf | 14.18 MB |
| Sarkar-Day1-Threads-OpenMP-lab-handout-v2.pdf | 111.58 KB |
| Sarkar-Day2-CUDA-v2-handout.pdf | 8.56 MB |
| Sarkar-Day1-Threads-OpenMP-lecture-handout-v2.pdf | 3.38 MB |
| Sarkar-Day2-CUDA-lab-handout.pdf | 80.72 KB |
| Sarkar-Day3-MPI-handout.pdf | 3.33 MB |
| Sarkar-Day3-MPI-lab-handout.pdf | 298.55 KB |
| Sarkar-CnC-Python-Dec-2011-v1.pdf | 8 MB |
Seminar: Towards a Portable Execution Model for Extreme Scale Multicore Systems
The computer industry is at a major inflection point in its hardware roadmap due to the end of a decades-long trend of exponentially increasing clock frequencies. Computer systems anticipated in the 2015 to 2020 timeframe are referred to as Extreme Scale because they will be built using homogeneous and heterogeneous many-core processors with hundreds of cores per chip. These systems pose critical new challenges for software in the areas of concurrency, energy efficiency, and resiliency. Unlike previous generations of hardware evolution, this shift towards many-core computing will have a profound impact on software. These software challenges are further compounded by the need to enable parallelism in workloads and application domains that have traditionally not had to deal with multiprocessor parallelism.