Abstract
With the increased popularity of multi-GPU nodes in modern HPC clusters, it is imperative to develop matching programming paradigms for their efficient utilization. In order to take advantage of the local GPUs and the low-latency high-throughput interconnects that link them, programmers need to meticulously adapt parallel applications with respect to load balancing, boundary conditions and device synchronization. This paper presents MAPS-Multi, an automatic multi-GPU partitioning framework that distributes the workload based on the underlying memory access patterns. The framework consists of host- and device-level APIs that allow programs to efficiently run on a variety of GPU and multi-GPU architectures. The framework implements several layers of code optimization, device abstraction, and automatic inference of inter-GPU memory exchanges. The paper demonstrates that the performance of MAPS-Multi achieves near-linear scaling on fundamental computational operations, as well as real-world applications in deep learning and multivariate analysis.
Original language | English |
---|---|
Title of host publication | Proceedings of SC 2015 |
Subtitle of host publication | The International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | IEEE Computer Society |
ISBN (Electronic) | 9781450337236 |
DOIs | |
State | Published - 15 Nov 2015 |
Event | International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015 - Austin, United States Duration: 15 Nov 2015 → 20 Nov 2015 |
Publication series
Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
---|---|
Volume | 15-20-November-2015 |
ISSN (Print) | 2167-4329 |
ISSN (Electronic) | 2167-4337 |
Conference
Conference | International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015 |
---|---|
Country/Territory | United States |
City | Austin |
Period | 15/11/15 → 20/11/15 |
Bibliographical note
Publisher Copyright:© 2015 ACM.
Keywords
- memory access patterns
- multi-GPU programming