Enhancing GPU Performance via Neighboring Directory Table Based Inter-TLB Sharing

Published in 40th IEEE International Conference on Computer Design (ICCD2022), 2022

Recommended citation: Enhancing GPU Performance via Neighboring Directory Table Based Inter-TLB Sharing. Yajuan Du, Mingyang Liu, Yuqi Yang, Mingzhe Zhang and Xulong Tang. 40th IEEE International Conference on Computer Design. ICCD 2022.

Abstract

Modern discrete GPUs support Unified Virtual Memory (UVM), simplifying GPU programming. However, UVM entails address translation on each memory access, which introduces expensive performance overhead during address translation. In this work, we select various workloads and conduct experiments on GPU performance. Our investigation shows that many workloads have low L1 TLB hit ratios of less than 40% on average. Even for a particular workload, the hit ratio is as low as 15%, which leads to significant performance degradation. Through further analysis, we find that a lot of common entries exist between neighboring private L1 TLBs, showing clear inter-TLB sharing behavior. To leverage the sharing, we propose a Neighboring Directory table based hardware scheme, named NeiDty. In NeiDty, L1 TLBs can probe physical addresses from neighboring L1 TLBs through a lightweight interconnect network. And NeiDty uses neighboring directory tables to keep track of the shared entries among neighboring L1-TLBs. In addition, we find it better to update address translation after two consecutive neighboring TLB hits than one hit. We run eight typical workloads with Gem5-GPU, and the results show that NeiDty increases the average hit ratio of L1 TLB TLB by 14% and improves the average performance by 10%.