Publications (Google Scholar, DBLP)


Conference

2026

[C45] A Bitwidth-Flexible Modular Multiplier with Shift-Free Accumulation for Efficient NTT Acceleration in FHE
Zihao Yang, Dian Jiao, Shengyu Fan, Xianglong Deng, Rui Hou, Mingzhe Zhang. 2026 IEEE International Symposium on Circuits and Systems. ISCAS 2026

[C44] A Framework for Developing and Optimizing Fully Homomorphic Encryption Programs on GPUs
Zhuoran Ji, Jianyu Zhao, Guang Fan, Mingzhe Zhang, Shoumeng Yan, Xueyu Wu, Lei Ju. 31st International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS 2026

[C43] Libra: Pattern-Scheduling Co-Optimization for Cross-Scheme FHE Code Generation over GPGPU
Song Bian, Yintai Sun, Zian Zhao, Haowen Pan, Mingzhe Zhang, Zhenyu Guan. USENIX Security '26. Security 2026

[C42] Xerxes: Extensive Exploration of Scalable Hardware Systems with CXL-Based Simulation Framework
Yuda An, Shushu Yi, Bo Mao, Qiao Li, Mingzhe Zhang, Diyu Zhou, Ke Zhou, Nong Xiao, Guangyu Sun, Yingwei Luo, Jie Zhang. 24th USENIX Conference on File and Storage Technologies. FAST 2026

[C41] Falcon: Algorithm-Hardware Co-Design for Efficient Fully Homomorphic Encryption Accelerator
Liang Kong, Shengyu Fan, Xianglong Deng, Lei Chen, Guang Fan, Yilan Zhu, Geng Yang, Yisong Chang, Shoumeng Yan, Mingzhe Zhang. 31st International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS 2026

[C40] An Efficient and Scalable Hardware Architecture for Number Theoretic Transform on FPGA with Design Automation
Yilan Zhu, Geng Yang, Xingyu Tian, Dilshan Kumarathunga, Liang Kong, Xianglong Deng, Shengyu Fan, Guang Fan, Guiming Shi, Lei Chen, Bo Zhang, Yisong Chang, Shoumeng Yan, Zhenman Fang, Mingzhe Zhang. 32nd International Symposium on High-Performance Computer Architecture. HPCA 2026

[C39] FHEFusion: Enabling Operator Fusion in FHE Compilers for Depth-Efficient DNN Inference
Tianxiang Sui, Jianxin Lai, Long Li, Peng Yuan, Yan Liu, Qing Zhu, Xiaojing Zhang, Linjie Xiao, Mingzhe Zhang, Jingling Xue. 24th IEEE/ACM International Symposium on Code Generation and Optimization. CGO 2026

2025

[C38] Analysis of Bit-Flip Attacks on Encrypted Neural Networks
Zihao Yang, Yilan Zhu, Rui Hou, Dan Meng, Shengyu Fan, Mingzhe Zhang. 31st IEEE International Conference on Parallel and Distributed Systems. ICPADS 2025

[C37] WPC: Weight Plaintext Compression for CNN Inference based on RNS-CKKS
Guiming Shi, Yuchen Wei, Shengyu Fan, Xianglong Deng, Liang Kong, Xianbin Li, Jingwei Cai, Shuwen Deng, Mingzhe Zhang, Kaisheng Ma. 32nd ACM Conference on Computer and Communications Security. CCS 2025

[C36] On Optimizing Intra- and Inter-chiplet Interconnection Networks in Multi-chiplet Systems for Accelerating FHE Encrypted Neural Network Applications
Zewei Lai, Jinhui Ye, Xiaohang Wang, Zheang Fu, Amit Kumar Singh, Yingtao Jiang, Kui Ren, Mei Yang, Sihai Qiu, Xiaodong Li, Xin Tang, Jie Song, Mingzhe Zhang. The 14th International Conference on Compilers, Architectures, and Synthesis for Embedded Systems. CASES 2025

[C35] LEGOSim: A Unified Parallel Simulation Framework for Multi-chiplet Heterogeneous Integration
Tiantian Lin, Cheng Qiu, Xiaohang Wang, Ling Wang, Zhulin Zheng, Yingtao Jiang, Amit Kumar Singh, Jieming Yin, Sihai Qiu, Xiaodong Li, Xin Tang, Jie Song, Mingzhe Zhang, Kui Ren. 58th IEEE/ACM International Symposium on Microarchitecture. MICRO 2025

[C34] HAWK: Fully Homomorphic Encryption Accelerator with Fixed-Word Key Decomposition Switching
Liang Kong, Shengyu Fan, Xianglong Deng, Guang Fan, Lei Chen, Guiming Shi, Yilan Zhu, Geng Yang, Shoumeng Yan, Mingzhe Zhang. 58th IEEE/ACM International Symposium on Microarchitecture. MICRO 2025

[C33] The Bleak Future of Fully Homomorphic Encryption: from a Storage I/O Perspective
Lei Chen, Erci Xu, Yiming Sun, Shengyu Fan, Xianglong Deng, Guiming Shi, Guang Fan, Liang Kong, Yilan Zhu, Shoumeng Yan, Mingzhe Zhang. 16th International Symposium on Advanced Parallel Processing Technologies. APPT 2025

[C32] XHarvest: Rethinking High-Performance and Cost-Efficient SSD Architecture with CXL-Driven Harvesting
Li Peng, Wenbo Wu, Shushu Yi, Xianzhang Chen, Chenxi Wang, Shengwen Liang, Zhe Wang, Nong Xiao, Qiao Li, Mingzhe Zhang, Jie Zhang. 52nd ACM/IEEE Annual International Symposium on Computer Architecture. ISCA 2025

[C31] Neo: Towards Efficient Fully Homomorphic Encryption Acceleration using Tensor Core
Dian Jiao, Xianglong Deng, Zhiwei Wang, Shengyu Fan, Yi Chen, Rui Hou, Mingzhe Zhang. 52nd ACM/IEEE Annual International Symposium on Computer Architecture. ISCA 2025

[C30] FAST: An FHE Accelerator for Scalable-parallelism with Tunable-bit
Shengyu Fan, Xianglong Deng, Liang Kong, Guiming Shi, Guang Fan, Rui Hou, Dan Meng, Mingzhe Zhang. 52nd ACM/IEEE Annual International Symposium on Computer Architecture. ISCA 2025

[C29] ALLMod: Exploring Area-Efficiency of LUT-based Large Number Modular Reduction via Hybrid Workloads
Fangxin Liu, Haomin Li, Zongwu Wang, Bo Zhang, Mingzhe Zhang, Shoumeng Yan, Li Jiang. 62nd ACM/IEEE Design Automation Conference. DAC 2025

[C28] CipherPrune: Efficient and Scalable Private Transformer Inference
Yancheng Zhang, Jiaqi Xue, Mengxin Zheng, Mimi Xie, Mingzhe Zhang, Lei Jiang, Qian Lou. 13th International Conference on Learning Representations. ICLR 2025

[C27] WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores
Guang Fan, Mingzhe Zhang, Fangyu Zheng, Shengyu Fan, Tian Zhou, Xianglong Deng, Wenxu Tang, Liang Kong, Yixuan Song, Shoumeng Yan. 31st International Symposium on High-Performance Computer Architecture. HPCA 2025

2024

[C26] Trinity: A General Purpose FHE Accelerator
Xianglong Deng, Shengyu Fan, Zhicheng Hu, Zhuoyu Tian, Zihao Yang, Jiangrui Yu, Dingyuan Cao, Dan Meng, Rui Hou, Meng Li, Qian Lou, Mingzhe Zhang. 57th IEEE/ACM International Symposium on Microarchitecture. MICRO 2024

[C25] Flagger: Cooperative Acceleration for Large-Scale Cross-Silo Federated Learning Aggregation
Xiurui Pan, Yuda An, Shengwen Liang, Bo Mao, Mingzhe Zhang, Qiao Li, Myoungsoo Jung, Jie Zhang. 51st ACM/IEEE Annual International Symposium on Computer Architecture. ISCA 2024

[C24] Alchemist: A Unified Accelerator Architecture for Cross-Scheme Fully Homomorphic Encryption
Jianan Mu, Husheng Han, Shangyi Shi, Jing Ye, Zizhen Liu, Shengwen Liang, Meng Li, Mingzhe Zhang, Song Bian, Xing Hu, Huawei Li, Xiaowei Li. Proceedings of the 61st Annual Design Automation Conference. DAC 2024

2023

[C23] Poseidon: Practical Homomorphic Encryption Accelerator
Yinghao Yang, Huaizhi Zhang, Shengyu Fan, Hang Lu, Mingzhe Zhang, Xiaowei Li. The 29th IEEE International Symposium on High-Performance Computer Architecture. HPCA 2023

[C22] TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU
Shengyu Fan, Zhiwei Wang, Weizhi Xu, Rui Hou, Dan Meng, Mingzhe Zhang. The 29th IEEE International Symposium on High-Performance Computer Architecture. HPCA 2023

2022

[C21] Enhancing GPU Performance via Neighboring Directory Table Based Inter-TLB Sharing
Yajuan Du, Mingyang Liu, Yuqi Yang, Mingzhe Zhang, Xulong Tang. The 40th IEEE International Conference on Computer Design. ICCD 2022

2021

[C20] Distilling Bit-level Sparsity Parallelism for General Purpose Deep Learning Acceleration
Hang Lu, Liang Chang, Chenglong Li, Zixuan Zhu, Shengjian Lu, Yanhuan Liu, Mingzhe Zhang. The 54th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 2021

[C19] BitX: Empower Versatile Inference with Hardware Runtime Pruning
Hongyan Li, Hang Lu, Jiawen Huang, Wenxu Wang, Mingzhe Zhang, Wei Chen, Liang Chang, Xiaowei Li. 50th International Conference on Parallel Processing. ICPP 2021

[C18] CoPIM: A Concurrency-aware PIM Workload Offloading Architecture for Graph Applications
Liang Yan, Mingzhe Zhang, Rujia Wang, Xiaoming Chen, Xingqi Zou, Xiaoyang Lu, Yinhe Han, Xian-He Sun. 2021 IEEE/ACM International Symposium on Low Power Electronics and Design. ISLPED 2021

[C17] Streamline Ring ORAM Accesses through Spatial and Temporal Optimization
Dingyuan Cao, Mingzhe Zhang, Hang Lu, Xiaochun Ye, Dongrui Fan, Yuezhi Che, Rujia Wang. The 27th IEEE International Symposium on High-Performance Computer Architecture. HPCA 2021.

2019

[C16] Self-adaptive Address Mapping Mechanism for Access Pattern Awareness on DRAM
Chundian Li, Mingzhe Zhang, Zhiwei Xu, Xianhe Sun. 17th IEEE International Symposium on Parallel and Distributed Processing with Applications. ISPA 2019.

[C15] When Deep Learning Meets the Edge: Auto-Masking Deep Neural Networks for Efficient Machine Learning on Edge Devices
Ning Lin, Hang Lu, Jingliang Gao, Mingzhe Zhang, Xiaowei Li. 37th IEEE International Conference on Computer Design. ICCD 2019.

[C14] Balancing Performance and Energy Efficiency of ONoC by Using Adaptive Bandwidth
Mingzhe Zhang, Lunkai Zhang, Frederic T. Chong, Zhiyong Liu. 37th IEEE International Conference on Computer Design. ICCD 2019.

[C13] FindeR: Accelerating FM-Index-based Exact Pattern Matching in Genomic Sequences through ReRAM technology
Farzaneh Zokaee, Mingzhe Zhang, Lei Jiang. 28th International Conference on Parallel Architectures and Compilation. PACT 2019.

[C12] C-MAP: Improving the Effectiveness of Mapping Method for CGRA by Reducing NoC Congestion
Shuqian An, Mingzhe Zhang, Xiaochun Ye, Da Wang, Hao Zhang, Dongrui Fan, Zhimin Tang. 21st IEEE International Conference on High Performance Computing and Communications. HPCC 2019.

[C11] Magma: A Monolithic 3D Vertical Heterogeneous ReRAM-based Main Memory Architecture
Farzaneh Zokaee, Mingzhe Zhang, Xiaochun Ye, Dongrui Fan, Lei Jiang. 2019 Proceedings of the 56th Annual Design Automation Conference. DAC 2019.

2018

[C10] Mmalloc: A Dynamic Memory Management on Many-core Coprocessor for the Acceleration of Storage-intensive Bioinformatics Application
Zihao Wang, Mingzhe Zhang, Jingrong Zhang, Rui Yan, Xiaohua Wan, Zhiyong Liu, Fa Zhang, Xuefeng Cui. 2018 IEEE International Conference on Bioinformatics and Biomedicine. BIBM 2018.

2017

[C09] Quick-and-Dirty: Improving Performance of MLC PCM by Using Temporary Short Writes
Mingzhe Zhang, Lunkai Zhang, Lei Jiang, Frederic T Chong, Zhiyong Liu. 35th IEEE International Conference on Computer Design. ICCD 2017.

[C08] Balancing performance and lifetime of MLC PCM by using a region retention monitor
Mingzhe Zhang, Lunkai Zhang, Lei Jiang, Zhiyong Liu, Frederic T Chong. 2017 IEEE International Symposium on High Performance Computer Architecture. HPCA 2017.

2016

[C07] COMRANCE: A rapid method for Network-on-Chip design space exploration
Mingzhe Zhang, Yangguang Shi, Fa Zhang, Zhiyong Liu. 2016 The 7th International Green and Sustainable Computing Conference. IGSC 2016.

2014

[C06] SpongeDirectory: Flexible sparse directories utilizing multi-level memristors
Lunkai Zhang, Dmitri Strukov, Hebatallah Saadeldeen, Dongrui Fan, Mingzhe Zhang, Diana Franklin. 2014 The 23rd International Conference on Parallel Architecture and Compilation Techniques. PACT 2014.

2013

[C05] SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture
Xiaochun Ye, Dongrui Fan, Ninghui Sun, Shibin Tang, Mingzhe Zhang, Hao Zhang. Proceedings of the 2013 International Symposium on Low Power Electronics and Design. ISLPED 2013.

[C04] Spontaneous reload cache: Mimicking a larger cache with minimal hardware requirement
Lunkai Zhang, Mingzhe Zhang, Lingjun Fan, Da Wang, Paolo Ienne. 2013 IEEE 8th International Conference on Networking, Architecture and Storage. NAS 2013.

[C03] Energy-Performance Modeling and Optimization of Parallel Computing in On-Chip Networks
Shuai Zhang, Zhiyong Liu, Dongrui Fan, Fonglong Song, Mingzhe Zhang. 2013 12th IEEE International Symposium on Parallel and Distributed Processing with Applications. ISPA 2013.

[C02] A Path-Adaptive Opto-electronic Hybrid NoC for Chip Multi-processor
Mingzhe Zhang, Da Wang, Xiaochun Ye, Liqiang He, Dongrui Fan, Zhiyong Liu. 2013 12th IEEE International Symposium on Parallel and Distributed Processing with Applications. ISPA 2013.

2012

[C01] Self-Correction Trace Model: A Full-System Simulator for Optical Network-on-Chip
Mingzhe Zhang, Liqiang He, Dongrui Fan. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum. IPDPSW 2012.

Journals & Transactions

2026

[J17] Binary Subscript Representation and Efficient Pipelined Number Theoretic Transform
Bo Zhang, Mingzhe Zhang, Shoumeng Yan.

2025

[J16] TensorFHE+: Fully Homomorphic Encryption Acceleration based on Linear Algebra
Yintai Sun, Shengyu Fan, Zhenhua Yin, Xinkai Song, Xing Hu, Zidong Du, Qi Guo, Weizhi Xu, Rui Hou, Dan Meng, Song Bian, Mingzhe Zhang. IEEE Transactions on Computers, early access.

[J15] Corrosion Hammer: A Self-Activated Bit-Flip Attack to the Processing-In-Memory Accelerator
Zihao Yang, Mengxin Zheng, Shengyu Fan, Qian Lou, Rui Hou, Dan Meng, Mingzhe Zhang. CyberSecurity.

[J14] Understanding and Boosting Fully Homomorphic Encryption Applications on GPU
Shengyu Fan, Xianglong Deng, Xulong Tang, Weizhi Xu, Mingzhe Zhang. CyberSecurity.

[J13] A Survey on Fully Homomorphic Encryption Compilers
Zhuoyu Tian, Shengyu Fan, Xianglong Deng, Rui Hou, Dan Meng, Mingzhe Zhang. CyberSecurity.

[J12] Exploration of Karatsuba Algorithm for Efficient Barrett Modular Multiplication
Bo Zhang, Mingzhe Zhang, Shoumeng Yan. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).

[J11] RAC-NAF: A Reconfigurable Analog Circuitry for Nonlinear Activation Function Computation in Computing-in-Memory
Chenjia Xie, Zhuang Shao, Mingzhe Zhang, Yuan Du, Li Du. IEEE Journal of Solid-State Circuits (JSSC), Vol 60, Issue 10, pp. 3738-3748 (2025).

[J10] An Efficient Delta Compression Framework Seamlessly Integrated into Inline Deduplication
Yucheng Zhang, Wenbin Zeng, Hong Jiang, Dan Feng, Zichen Xu, Shuibing He, Mingzhe Zhang, Dan Wu. ACM Transactions on Storage (TOS), Vol 21, Issue 4, pp. 1-30 (2025).

[J09] LP-HENN: Fully Homomorphic Encryption Accelerator with High Energy Efficiency
Zhuoyu Tian, Lei Chen, Shengyu Fan, Xianglong Deng, Rui Hou, Dan Meng, Mingzhe Zhang. CyberSecurity, Vol 8, Article No. 98 (2025).

2024

[J08] Skyway: Accelerate Graph Applications with a Dual-Path Architecture and Fine-Grained Data Management
Mo Zou, Mingzhe Zhang, Rujia Wang, Xian-He Sun, Xiaochun Ye, Dong-Rui Fan, Zhimin Tang. Journal of Computer Science and Technology, 39(4): 871-894 (2024).

2022

[J07] VNet: a versatile network to train real-time semantic segmentation models on a single GPU
Wenxing Li, Ning Lin, Mingzhe Zhang, Hang Lu, Xiaoming Chen, Xiaowei Li. Science China Information Sciences, Vol. 64, Issue 3, pp. 1-2.

[J06] Accelerating Graph Processing with Lightweight Learning-Based Data Reordering
Mo Zou, Mingzhe Zhang, Rujia Wang, Xian-He Sun, Xiaochun Ye, Dongrui Fan, Zhimin Tang. IEEE Computer Architecture Letters, Vol. 21, Issue 1, pp. 5-8.

[J05] Application-Oriented Data Migration to Accelerate In-Memory Database on Hybrid Memory
Wenze Zhao, Yajuan Du, Mingzhe Zhang, Mingyang Liu, Kailun Jin, Rachata Ausavarungnirun. Micromachines, Vol. 13, Issue 1, pp. 52-60.

2020

[J04] Architecting Effectual Computation for Machine Learning Accelerators
Hang Lu, Mingzhe Zhang, Yinhe Han, Huawei Li, Xiaowei Li. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Vol. 39, Issue 10, pp. 2654-2667.

2019

[J03] A Survey on Architecture Research of Non-Volatile Memory based on Dynamical Trade-off (in Chinese)
Mingzhe Zhang, Fa Zhang, Zhiyong Liu. Journal of Computer Research and Development, Vol. 56, Issue 4, pp. 677-691.

[J02] Quick-and-Dirty: An Architecture for High-Performance Temporary Short Writes in MLC PCM
Mingzhe Zhang, Lunkai Zhang, Lei Jiang, Frederic T Chong, Zhiyong Liu. IEEE Transactions on Computers (TC), Vol. 68, Issue 9, pp. 1365-1375.

2015

[J01] FreeRider: Non-local adaptive network-on-chip routing with packet-carried propagation of congestion information
Shaoli Liu, Tianshi Chen, Ling Li, Xi Li, Mingzhe Zhang, Chao Wang, Haibo Meng, Xuehai Zhou, Yunji Chen. IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 26, Issue 8, pp. 2272-2285.