Abstract
General-purpose graphics computing on processing units (GPGPUs) face significant performance limitations due to memory access latencies, particularly when traditional memory hierarchies and thread-switching mechanisms prove insufficient for complex access patterns in data-intensive applications such as machine learning (ML) and scientific computing. This paper presents a novel hardware design for a memory prefetching subsystem targeted at DDR (Double Data Rate) memory in GPGPU architectures. The proposed prefetching subsystem features a modular architecture comprising multiple parallel prefetching engines, each handling distinct memory address ranges with dedicated data buffers and adaptive stride detection algorithms that dynamically identify recurring memory access patterns. The design incorporates robust system integration features, including context flushing, watchdog timers, and flexible configuration interfaces, for runtime optimization. Comprehensive experimental validation using real-world workloads examined critical design parameters, including block sizes, prefetch outstanding limits, and throttling rates, across diverse memory access patterns. Results demonstrate significant performance improvements with average memory access latency reductions of up to 82% compared to no-prefetch baselines, and speedups in the range of 1.240–1.794. The proposed prefetching subsystem successfully enhances memory hierarchy efficiency and provides practical design guidelines for deployment in production GPGPU systems, establishing clear parameter optimization strategies for different workload characteristics.
| Original language | English |
|---|---|
| Article number | 455 |
| Journal | Technologies |
| Volume | 13 |
| Issue number | 10 |
| DOIs | |
| State | Published - Oct 2025 |
Bibliographical note
Publisher Copyright:© 2025 by the authors.
Keywords
- DDR
- DRAM memory
- GPGPU
- hardware prefetching
- high-performance computing (HPC)
- memory system
- parallel computing
Fingerprint
Dive into the research topics of 'Hardware Design of DRAM Memory Prefetching Engine for General-Purpose GPUs'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver