In-depth analyses of unified virtual memory system for GPU accelerated computing | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

research-article

Public Access


Authors: Tyler Allen and Rong Ge

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2021

Article No.: 64, Pages 1 - 15

Published: 13 November 2021

Related Artifact: Implementation of the article "In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated Computing", November 2021. Software: https://doi.org/10.5281/zenodo.5148930

Metrics

Total Citations: 22
Total Downloads: 2,270
Downloads (Last 12 Months): 1,151
Downloads (Last 6 Weeks): 125


    Abstract

    The abstraction of a shared memory space over separate CPU and GPU memory domains has eased the burden of portability for many HPC codebases. However, users pay for the ease of use provided by system-managed memory with a moderate-to-high performance overhead. NVIDIA Unified Virtual Memory (UVM) is presently the primary real-world implementation of such an abstraction, and it offers a functionally equivalent testbed for a novel in-depth performance study of both UVM and future Linux Heterogeneous Memory Management (HMM)-compatible systems. The continued advocacy for UVM and HMM motivates improvement of the underlying system. We focus on a UVM-based system and investigate the root causes of UVM overhead, a non-trivial task due to the complex interactions of multiple hardware and software constituents and the need for a targeted analysis methodology.

    In this paper, we take a deep dive into the UVM system architecture and the internal behaviors of page fault generation and servicing. We reveal specific GPU hardware limitations using targeted benchmarks, and characterize the driver as a real-time system processing the resulting fault workload. We further provide a quantitative evaluation of fault handling for various applications under different scenarios, including prefetching and oversubscription. We find that the driver workload depends on the interactions among application access patterns, GPU hardware constraints, and host OS components. We determine that the cost of host OS components is significant and present across implementations, warranting close attention. This study serves as a proxy for future shared memory systems, such as those that interface with HMM.
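    For context, the UVM mechanism the paper analyzes is the one exercised by CUDA managed memory: first-touch accesses on the GPU raise page faults that the driver services by migrating pages, while an explicit prefetch moves the pages up front and bypasses that fault-handling path. The following is a minimal sketch of the two modes using the standard CUDA runtime API; the kernel, names, and sizes are illustrative, not taken from the paper's benchmarks.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel that touches every page of the managed allocation.
__global__ void scale(float *x, size_t n, float a) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const size_t n = 1 << 24;  // 64 MiB of floats
    float *x;
    // One allocation addressable from both CPU and GPU; pages migrate
    // between memory domains on demand (the fault path studied here).
    cudaMallocManaged(&x, n * sizeof(float));
    for (size_t i = 0; i < n; i++) x[i] = 1.0f;  // pages resident on host

    int dev;
    cudaGetDevice(&dev);
    // Optional: prefetch the range to the GPU before the kernel runs,
    // avoiding per-page fault servicing on first touch.
    cudaMemPrefetchAsync(x, n * sizeof(float), dev);

    scale<<<(unsigned)((n + 255) / 256), 256>>>(x, n, 2.0f);
    cudaDeviceSynchronize();  // after this, the host may read x again
    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```

    Commenting out the `cudaMemPrefetchAsync` call forces pure demand paging, which is the scenario whose driver-side cost the paper dissects; oversubscription arises when `n * sizeof(float)` exceeds device memory and the driver must also evict pages.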

    Supplementary Material

    MP4 File (In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated Computing.mp4)

    Presentation video

    • Download
    • 218.89 MB



    Index Terms

    • Computer systems organization
    • General and reference → Cross-computing tools and techniques → Performance
    • Hardware → Integrated circuits → Semiconductor memory → Dynamic memory
    • Software and its engineering → Software organization and properties → Contextual software domains

    Index terms have been assigned to the content through auto-classification.

    Recommendations

    • On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
      SAAHPC '11: Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing
      The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers ...

    • Architecture-Aware Mapping and Optimization on a 1600-Core GPU
      ICPADS '11: Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems
      The graphics processing unit (GPU) continues to make in-roads as a computational accelerator for high-performance computing (HPC). However, despite its increasing popularity, mapping and optimizing GPU code remains a difficult task; it is a multi-...

    • GPU virtualization for high performance general purpose computing on the ESX hypervisor
      HPC '14: Proceedings of the High Performance Computing Symposium
      Graphics Processing Units (GPU) have become important components in high performance computing (HPC) systems for their massively parallel computing capability and energy efficiency. Virtualization technologies are increasingly applied to HPC to reduce ...

            Comments

            Information & Contributors

            Information

            Published In


            SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

            November 2021

            1493 pages

            ISBN:9781450384421

            DOI:10.1145/3458817

            • General Chair:
            • Bronis R. de Supinski,
            • Program Chairs:
            • Mary Hall,
            • Todd Gamblin

            Copyright © 2021 ACM.

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Sponsors

            • SIGHPC: ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing

            In-Cooperation

            • IEEE CS

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Author Tags

            1. GPGPU
            2. GPU
            3. HMM
            4. NVIDIA
            5. UVM
            6. virtual memory

            Qualifiers

            • Research-article


            Conference

            SC '21

            Sponsor:

            • SIGHPC

            Acceptance Rates

            Overall Acceptance Rate 1,516 of 6,373 submissions, 24%


            Cited By

            • Cooper B., Scogland T., Ge R. (2024). Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications. In Proceedings of the 38th ACM International Conference on Supercomputing, 26–37. https://doi.org/10.1145/3650200.3656608

            • Wagley B., Markthub P., Crea J., Wu B., Belviranli M. (2024). Exploring Page-based RDMA for Irregular GPU Workloads: A Case Study on NVMe-backed GNN Execution. In Proceedings of the 16th Workshop on General Purpose Processing Using GPU, 7–12. https://doi.org/10.1145/3649411.3649413

            • Elis B., Pearce O., Boehme D., Burmark J., Schulz M. (2024). Non-Blocking GPU-CPU Notifications to Enable More GPU-CPU Parallelism. In Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 1–11. https://doi.org/10.1145/3635035.3635036

            • Choi J., Jung S., Yeom H., Hong J., Park J. (2024). GPU Memory Reallocation Techniques in Fully Homomorphic Encryption Workloads. In Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, 1525–1532. https://doi.org/10.1145/3605098.3636037

            • Kang P. (2023). Programming for High-Performance Computing on Edge Accelerators. Mathematics 11(4), 1055. https://doi.org/10.3390/math11041055

            • Allen T., Cooper B., Ge R. (2023). Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory. ACM Transactions on Architecture and Code Optimization 21(1), 1–24. https://doi.org/10.1145/3632953

            • Iwata S., Arpaci-Dusseau R., Kasagi A. (2023). An Analysis of Graph Neural Network Memory Access Patterns. In Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 915–921. https://doi.org/10.1145/3624062.3624168

            • Zhang H., Zhou Y., Xue Y., Liu Y., Huang J. (2023). G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 395–410. https://doi.org/10.1145/3613424.3614309

            • Li B., Guo Y., Wang Y., Jaleel A., Yang J., Tang X. (2023). IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 1163–1177. https://doi.org/10.1145/3613424.3614269

            • Huang W., Du Y., Liu M. (2023). GPU Performance Acceleration via Intra-Group Sharing TLB. In Proceedings of the 52nd International Conference on Parallel Processing, 705–714. https://doi.org/10.1145/3605573.3605593

