Optimising OpenMP and MPI Programs on Multi-Core Architectures
Homogeneous and heterogeneous multi-core processors are the building blocks of current supercomputers and of at least the next generation. This project therefore brings together our current work on using multi-core processors for scientific computing (Munich M-Core Initiative, Periscope, IGSSE: Hardware-Aware Simulation and Computing) as well as for embedded systems (AutoVision, FlexPath, CAR@TUM).
We propose to investigate the performance of OpenMP and MPI programs on available and future HPC multi-core processors. To perform efficient design space exploration and performance evaluation, we will develop appropriate extensions to existing multi-core simulation environments for shared and distributed address spaces. Based on this analysis, we will devise optimisation techniques to improve the execution of OpenMP/MPI programs. In addition, we will develop programming tools that automatically or semi-automatically optimise existing programs, which will improve programming productivity considerably.
Arndt Bode and Michael Gerndt will focus on program optimisation and programming tools for multi-core processors, especially with respect to cache and memory access (2 positions). Andreas Herkersdorf will concentrate on simulation environments and architectural improvements of multi-core architectures based on his work in the area of embedded processors (1 position). Main application partners will be Notker Rösch (1 position) and Hans-Peter Bunge. Additional applications come from DC MAT1 and DC BIO1.
Quantum chemistry programs based on density functional theory (DFT) contain large parts that decompose naturally into parallelizable tasks. Parallelism has always been an important aspect of the quantum chemistry program ParaGauss. Until now, its parallelizable parts were handled by static distribution or by a master-slave scheme. As the target moves towards larger processor counts, however, it became desirable to have a dynamic load balancing (DLB) strategy at hand as well. The master-slave scheme turns into a bottleneck once the master has to supply too many slaves with tasks. In this project, a DLB framework was implemented using a simple work-stealing algorithm.
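The work-stealing idea mentioned above can be illustrated with a minimal shared-memory sketch. It is not the ParaGauss DLB framework, whose design is not described here. It only assumes the textbook scheme: each worker owns a task queue, takes work from its own end, and, once idle, steals a task from the opposite end of a randomly chosen victim's queue. The names `run_work_stealing`, `take_own` and `steal` are hypothetical, and Python threads stand in for the processes of a real HPC code.

```python
import random
import threading
from collections import deque


def run_work_stealing(tasks, num_workers=4, seed=0):
    """Process every task exactly once; return the tasks handled per worker."""
    queues = [deque() for _ in range(num_workers)]
    locks = [threading.Lock() for _ in range(num_workers)]
    # Assumption for illustration: a deliberately unbalanced start,
    # with every task initially placed on worker 0.
    queues[0].extend(tasks)
    done = [[] for _ in range(num_workers)]

    def take_own(me):
        # The owner takes work from the tail of its own queue.
        with locks[me]:
            return queues[me].pop() if queues[me] else None

    def steal(me, rng):
        # An idle worker probes the other queues in random order and
        # takes one task from the head of the first non-empty one.
        victims = [v for v in range(num_workers) if v != me]
        rng.shuffle(victims)
        for v in victims:
            with locks[v]:
                if queues[v]:
                    return queues[v].popleft()
        return None

    def worker(me):
        rng = random.Random(seed + me)
        while True:
            task = take_own(me)
            if task is None:
                task = steal(me, rng)
            if task is None:
                # Tasks never create new tasks in this sketch, so empty
                # queues everywhere mean there is nothing left to do.
                return
            done[me].append(task)  # stand-in for the real computation

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    per_worker = run_work_stealing(list(range(1000)), num_workers=4)
    print([len(tasks) for tasks in per_worker])
```

In contrast to the master-slave scheme, no single participant hands out all tasks: idle workers find work themselves, so requests are spread over all queues instead of converging on one master. How many tasks each worker ends up processing depends on thread scheduling and differs from run to run.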
|Prof. Dr. Michael Gerndt (coordination)||Computer Organisation; Parallel Computer Architecture|
|Prof. Dr. Arndt Bode||Computer Organisation; Parallel Computer Architecture|
|Prof. Dr. Hans-Joachim Bungartz||Scientific Computing in Computer Science|
|Prof. Dr. Hans-Peter Bunge||Geophysics|
|Prof. Dr. sc.techn. Andreas Herkersdorf||Integrated Systems|
|Prof. Dr. Dr. h.c. Notker Rösch||Theoretical Chemistry|
|David Büttner (from Apr 2010)|
|Markus Gerstel (May 2009 - Mar 2010)|
|Prof. Dr. Michael Gerndt|
|MAPCO - Multicore Architecture and Programming Model Co-Optimization|
Poster (PDF) - Astrid Nikodem