MAPS: Optimizing massively parallel applications using device-level memory abstraction

Eri Rubin, Ely Levy, Amnon Barak, Tal Ben-Nun*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

GPUs play an increasingly important role in high-performance computing. While developing naive code is straightforward, optimizing massively parallel applications requires a deep understanding of the underlying architecture: developers must contend with complex index calculations and manual memory transfers. This article classifies the memory access patterns used in most parallel algorithms, based on Berkeley's Parallel "Dwarfs." It then proposes the MAPS framework, a device-level memory abstraction that facilitates memory access on GPUs, alleviating complex indexing through on-device containers and iterators. The article presents an implementation of MAPS and shows that its performance is comparable to carefully optimized implementations of real-world applications.
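To give a feel for the kind of abstraction the abstract describes, the sketch below shows a "window" container that hides neighborhood index arithmetic in a stencil computation. The names and the CPU-side setting are hypothetical, chosen only to illustrate the container/iterator idea; this is not the actual MAPS on-device API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "window" container: exposes the neighborhood of one output
// element so the stencil body iterates over it rather than computing offsets.
// Illustrative sketch only -- not the MAPS API.
struct Window {
    const float* center;
    int radius;
    const float* begin() const { return center - radius; }
    const float* end() const { return center + radius + 1; }
};

// A 3-point mean filter written against the window abstraction:
// the kernel body contains no explicit index arithmetic.
std::vector<float> meanFilter(const std::vector<float>& in, int radius) {
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = radius; i + radius < in.size(); ++i) {
        Window w{in.data() + i, radius};
        float sum = 0.0f;
        for (float v : w) sum += v;  // iterate over the neighborhood
        out[i] = sum / (2 * radius + 1);
    }
    return out;
}
```

On a GPU, an analogous container would additionally manage staging the window through shared memory, which is the kind of optimization the paper's framework automates.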

Original language: English
Article number: 44
Journal: ACM Transactions on Architecture and Code Optimization
Volume: 11
Issue number: 4
DOIs
State: Published - 1 Dec 2014

Keywords

  • GPGPU
  • Heterogeneous computing architectures
  • Memory abstraction
  • Memory access patterns

