Poseidon: Practical Homomorphic Encryption Accelerator
Published in The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA2023), 2023
Recommended citation: Poseidon: Practical Homomorphic Encryption Accelerator. Yinghao Yang, Huaizhi Zhang, Shengyu Fan, Hang Lu, Mingzhe Zhang, Xiaowei Li. The 29th IEEE International Symposium on High-Performance Computer Architecture. HPCA2023.
Abstract
With the development of the important solution for privacy computing—Fully Homomorphic Encryption (FHE)—the explosion of data size and computing intensity in FHE applications has brought enormous challenges to hardware design. In this paper, we propose a novel HBM+FPGA architecture acceleration scheme named “Poseidon,” which focuses on improving the efficiency of the hardware resource and the bandwidth. To implement a practical and efficient accelerator that supports complex FHE applications that require Bootstrapping using limited FPGA resources, we refine the FHE application into inseparable computational streams, implement these lowest-level computational units (CUs), and combine them into upper-layer FHE operators through multiplexing to further complete the entire FHE application. To utilize resources more efficiently and improve parallelism, we adopt the radix-based NTT algorithm and propose HFAuto, an automorphism computing unit highly paralleled and suitable for FPGA. Then, we design the accelerator based on the optimized CUs and HBM to maximize data and computation parallelism with the limited hardware resources. Additionally, we evaluate Poseidon with four domain-specific FHE applications on the Alveo U280, which is a practical HBM+FPGA device. The empirical studies show that the efficient reuse of computational unit and on-chip storage enable Poseidon to be vastly superior to the state-of-the-art FPGA-based accelerator and to obtain performance close to F1, an ASIC-based accelerator: (1) up to 370× speedup over CPU for all operators of FHE; (2) up to 1300×/52× higher speedup over CPU and the best FPGA solution for NTT; (3) up to 10.6×/8.7× higher speedup over SOTA GPU implementation and ASIC-based F1 for homomorphic logistic regression.