IO-Optimized Design-Time Configurable Negacyclic Seven-Step NTT Architecture for FHE Applications

Kocer, Emre; Kirbiyik, Selim; Tosun, Tolun; Alaybeyoğlu, Ersin; Savaş, Erkay

doi:10.1145/3716368.3735514

dc.contributor.author	Kocer, Emre
dc.contributor.author	Kirbiyik, Selim
dc.contributor.author	Tosun, Tolun
dc.contributor.author	Alaybeyoğlu, Ersin
dc.contributor.author	Savaş, Erkay
dc.date.accessioned	2025-10-18T09:16:41Z
dc.date.created	2025
dc.date.issued	2025
dc.department	Faculties, Faculty of Engineering, Architecture and Design, Department of Electrical and Electronics Engineering
dc.description	35th Edition of the Great Lakes Symposium on VLSI 2025, GLSVLSI 2025 -- New Orleans; LA; Jung Hotel New Orleans -- 212711
dc.description	ACM SIGDA; SanDisk; Texas Instruments
dc.description.abstract	Fully Homomorphic Encryption (FHE) enables computation on encrypted data, making it an essential building block for privacy-preserving applications. However, it involves computationally demanding operations such as polynomial multiplication, for which the Number Theoretic Transform (NTT) is the state-of-the-art solution. Since most FHE schemes operate over the negacyclic ring of polynomials, we introduce a novel formulation of the hierarchical Four-Step NTT approach for the negacyclic ring, eliminating the pre- and post-processing steps found in existing methods. Field-Programmable Gate Array (FPGA) devices offer flexible and powerful computing platforms for accelerating NTT operations. We propose an FPGA-based, high-speed, parametric, and fully pipelined architecture that implements the improved Seven-Step NTT algorithm, which builds upon the Four-Step algorithm. Our design supports a wide range of parameters, including ring sizes up to 2^16 and modulus sizes up to 64 bits. We focus on achieving configurable throughput, as constrained by the bandwidth of High-Bandwidth Memory (HBM), an additional in-package memory common in high-end FPGA devices such as the Alveo U280. We aim to maximize throughput through an IO-parametric design on the Alveo U280 FPGA. The implementation results demonstrate that the average latency of our design for batch NTT operation is 8.32 µs for ring size 2^16 and 64-bit width, a speed-up of 7.96× compared to the current state-of-the-art designs. © 2025 Elsevier B.V., All rights reserved.
dc.identifier.doi	10.1145/3716368.3735514
dc.identifier.endpage	21
dc.identifier.isbn	9798400714962
dc.identifier.scopus	2-s2.0-105017760898
dc.identifier.scopusquality	N/A
dc.identifier.startpage	14
dc.identifier.uri	https://doi.org/10.1145/3716368.3735514
dc.identifier.uri	https://hdl.handle.net/11772/19360
dc.identifier.wos	WOS:001596652300003
dc.indekslendigikaynak	Web of Science
dc.indekslendigikaynak	Scopus
dc.language.iso	en
dc.publisher	Association for Computing Machinery
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member
dc.rights	info:eu-repo/semantics/openAccess
dc.snmz	Scopus_20251016
dc.subject	FHE
dc.subject	Four-Step
dc.subject	FPGA
dc.subject	Fully-Pipelined
dc.subject	Hardware Acceleration
dc.subject	Negacyclic
dc.subject	NTT
dc.subject	Seven-Step
dc.title	IO-Optimized Design-Time Configurable Negacyclic Seven-Step NTT Architecture for FHE Applications
dc.type	Conference Object
dspace.entity.type	Publication
relation.isAuthorOfPublication	2125e712-2c55-4f12-be22-eb1fc0fa7a1f
relation.isAuthorOfPublication.latestForDiscovery	2125e712-2c55-4f12-be22-eb1fc0fa7a1f
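
As background to the abstract, the following is a minimal sketch of negacyclic polynomial multiplication in Z_q[x]/(x^n + 1) using the conventional approach, in which inputs are twisted by powers of a 2n-th root of unity before a cyclic NTT and untwisted afterwards. These are the pre- and post-processing steps that the abstract says the proposed formulation eliminates; this is not the paper's Four-Step or Seven-Step algorithm. The toy parameters (n = 8, q = 17, psi = 3), the O(n^2) transform, and all function names are assumptions chosen for illustration only.

```python
def naive_ntt(a, omega, q):
    """O(n^2) cyclic NTT: A[i] = sum_j a[j] * omega^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]


def negacyclic_mul_ntt(a, b, psi, q):
    """Multiply a and b modulo (x^n + 1, q) with the twist-based method.

    psi must be a primitive 2n-th root of unity mod q, so psi^n = -1.
    """
    n = len(a)
    omega = pow(psi, 2, q)          # primitive n-th root of unity
    psi_inv = pow(psi, -1, q)       # modular inverse (Python 3.8+)
    n_inv = pow(n, -1, q)

    # Pre-processing: scale coefficient i by psi^i.
    a_t = [(x * pow(psi, i, q)) % q for i, x in enumerate(a)]
    b_t = [(x * pow(psi, i, q)) % q for i, x in enumerate(b)]

    # Pointwise product in the NTT domain, then inverse transform.
    prod = [(x * y) % q for x, y in zip(naive_ntt(a_t, omega, q),
                                        naive_ntt(b_t, omega, q))]
    c_t = naive_ntt(prod, pow(omega, -1, q), q)

    # Post-processing: scale by n^-1 and psi^-i.
    return [(x * n_inv * pow(psi_inv, i, q)) % q for i, x in enumerate(c_t)]


def negacyclic_mul_schoolbook(a, b, q):
    """Reference O(n^2) product with wrap-around sign flip (x^n = -1)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    q, psi = 17, 3                  # toy parameters: 3^8 = -1 mod 17
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [8, 7, 6, 5, 4, 3, 2, 1]
    assert negacyclic_mul_ntt(a, b, psi, q) == negacyclic_mul_schoolbook(a, b, q)
```

The schoolbook routine serves only as a reference to check the transform-based result; a practical design would use a fast O(n log n) transform and far larger parameters, such as the ring size 2^16 and 64-bit moduli mentioned in the abstract.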

Collection

Scopus-Indexed Publications Collection
WoS-Indexed Publications Collection
