IO-Optimized Design-Time Configurable Negacyclic Seven-Step NTT Architecture for FHE Applications

dc.contributor.authorKocer, Emre
dc.contributor.authorKirbiyik, Selim
dc.contributor.authorTosun, Tolun
dc.contributor.authorAlaybeyoğlu, Ersin
dc.contributor.authorSavaş, Erkay
dc.date.accessioned2025-10-18T09:16:41Z
dc.date.created2025
dc.date.issued2025
dc.departmentFaculties, Faculty of Engineering, Architecture and Design, Department of Electrical and Electronics Engineering
dc.description35th Edition of the Great Lakes Symposium on VLSI 2025, GLSVLSI 2025 -- New Orleans; LA; Jung Hotel New Orleans -- 212711
dc.descriptionACM SIGDA; SanDisk; Texas Instruments
dc.description.abstractFully Homomorphic Encryption (FHE) enables computations on encrypted data, making it an essential building block for privacy-preserving applications. However, it involves computationally demanding operations such as polynomial multiplication, for which the Number Theoretic Transform (NTT) is the state-of-the-art solution. Considering that most FHE schemes operate over the negacyclic ring of polynomials, we introduce a novel formulation of the hierarchical Four-Step NTT approach for the negacyclic ring, eliminating the need for the pre- and post-processing steps found in existing methods. Field-Programmable Gate Array (FPGA) devices offer flexible and powerful computing platforms for accelerating NTT operations. We propose an FPGA-based, high-speed, parametric and fully pipelined architecture that implements the improved Seven-Step NTT algorithm, which builds upon the four-step algorithm. Our design supports a wide range of parameters, including ring sizes up to 2^16 and modulus sizes up to 64 bits. We focus on achieving configurable throughput, as constrained by the bandwidth of High-Bandwidth Memory (HBM), an additional in-package memory common in high-end FPGA devices such as the Alveo U280. We aim to maximize throughput through an IO-parametric design on the Alveo U280 FPGA. The implementation results demonstrate that the average latency of our design for batch NTT operation is 8.32 µs for ring size 2^16 and 64-bit width, a speed-up of 7.96× compared to the current state-of-the-art designs. © 2025 Elsevier B.V., All rights reserved.
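For readers unfamiliar with the terminology in the abstract, the following is a minimal Python sketch of the conventional way to multiply polynomials in the negacyclic ring Z_q[x]/(x^n + 1): twist the inputs by powers of a 2n-th root of unity (pre-processing), apply a cyclic NTT, multiply pointwise, invert, and untwist (post-processing). This is the baseline that the abstract says the proposed formulation avoids; it is not the paper's Four-Step or Seven-Step algorithm. The function names, the toy parameters (q = 17, n = 4), and the naive O(n^2) transform are all illustrative assumptions.

```python
def find_psi(n, q):
    """Brute-force a primitive 2n-th root of unity mod prime q (n a power of two)."""
    if (q - 1) % (2 * n) != 0:
        raise ValueError("2n must divide q - 1")
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n = -1, so psi has order exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def naive_ntt(a, omega, q):
    """Cyclic NTT by direct evaluation; O(n^2), for illustration only."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]


def negacyclic_mul_ntt(a, b, q):
    """Multiply a and b in Z_q[x]/(x^n + 1) using twist + cyclic NTT."""
    n = len(a)
    psi = find_psi(n, q)
    omega = psi * psi % q            # primitive n-th root of unity
    psi_inv = pow(psi, -1, q)        # modular inverse (Python 3.8+)
    n_inv = pow(n, -1, q)

    # Pre-processing: twist coefficients by psi^i.
    ta = [a[i] * pow(psi, i, q) % q for i in range(n)]
    tb = [b[i] * pow(psi, i, q) % q for i in range(n)]

    # Forward transforms and pointwise product.
    fc = [x * y % q for x, y in zip(naive_ntt(ta, omega, q),
                                    naive_ntt(tb, omega, q))]

    # Inverse transform, then post-processing: untwist by psi^-i.
    tc = naive_ntt(fc, pow(omega, -1, q), q)
    return [tc[i] * n_inv * pow(psi_inv, i, q) % q for i in range(n)]


def negacyclic_mul_schoolbook(a, b, q):
    """Reference result: schoolbook product reduced modulo x^n + 1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]  # x^n wraps around as -1
    return [x % q for x in c]
```

Comparing `negacyclic_mul_ntt(a, b, 17)` with `negacyclic_mul_schoolbook(a, b, 17)` on length-4 inputs is a quick sanity check that the twist reproduces the sign flip caused by x^n = -1.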
dc.identifier.doi10.1145/3716368.3735514
dc.identifier.endpage21
dc.identifier.isbn9798400714962
dc.identifier.scopus2-s2.0-105017760898
dc.identifier.scopusqualityN/A
dc.identifier.startpage14
dc.identifier.urihttps://doi.org/10.1145/3716368.3735514
dc.identifier.urihttps://hdl.handle.net/11772/19360
dc.identifier.wosWOS:001596652300003
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherAssociation for Computing Machinery
dc.relation.publicationcategoryConference Item - International - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzScopus_20251016
dc.subjectFHE
dc.subjectFour-Step
dc.subjectFPGA
dc.subjectFully-Pipelined
dc.subjectHardware Acceleration
dc.subjectNegacyclic
dc.subjectNTT
dc.subjectSeven-Step
dc.titleIO-Optimized Design-Time Configurable Negacyclic Seven-Step NTT Architecture for FHE Applications
dc.typeConference Object
dspace.entity.typePublication
relation.isAuthorOfPublication2125e712-2c55-4f12-be22-eb1fc0fa7a1f
relation.isAuthorOfPublication.latestForDiscovery2125e712-2c55-4f12-be22-eb1fc0fa7a1f
