ML with HEIR

HEIR’s ML frontend, compilation pipeline, hardware integrations, and active research directions.

HEIR: Fully Homomorphic Machine Learning with a Universal Compiler

An FHE compiler toolchain and development platform that does not sacrifice generality or extensibility.

HEIR project · asraa@google.com · arXiv:2508.11095


HEIR provides an MLIR-based path from ML frontends to scheme-level IRs, library backends, and lower-level arithmetic intended for hardware integration.

ML Frontend

PyTorch, TensorFlow, and ONNX converge into HEIR's linalg entry level through torch-mlir, onnx-mlir, and StableHLO.

Linalg Entry Level

Torch models are converted with torch-mlir to linalg on tensors (with tensor and arith dialects) as HEIR input.

The linalg dialect is the funnel dialect for HEIR's MLIR frontend: its abstraction level is high enough to match on ML kernel operations for optimization. Canonicalization patterns at this level simplify the IR, reducing memory-shuffling and non-linear operations.

Compilation Configuration

  • Backend and scheme selection
  • Secret input data selected with annotations
  • Non-linear activation approximation degree
  • Range bounds from model metadata
  • User-controlled kernel selection
module attributes {backend.openfhe, scheme.ckks} {
  func.func @mnist(%input: tensor<784xf32> {secret.secret}) -> tensor<10xf32> {
    %matrix1 = arith.constant dense<...> : tensor<512x784xf32>
    %bias1 = arith.constant dense<...> : tensor<512xf32>
    %matrix2 = arith.constant dense<...> : tensor<10x512xf32>
    %bias2 = arith.constant dense<...> : tensor<10xf32>
    %cst = arith.constant dense<0.0> : tensor<512xf32>
    %0 = linalg.matvec ins(%matrix1, %input) {kernel = "diagonal"}
    %1 = arith.addf %0, %bias1
    %2 = arith.maximumf %1, %cst {degree = 6, lower = -15.0, upper = 12.0}
    %3 = linalg.matvec ins(%matrix2, %2)
    %4 = arith.addf %3, %bias2
    return %4
  }
}
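For intuition, the module above corresponds to a two-layer MLP in the clear, where `arith.maximumf` (ReLU) is replaced by a low-degree polynomial as the `degree`/`lower`/`upper` annotations direct. A minimal Python sketch; the weights and polynomial coefficients are placeholders, not values HEIR produces:

```python
# Cleartext sketch of the @mnist module: matvec + bias, a polynomial
# activation standing in for ReLU, then a second matvec + bias.

def matvec(matrix, vec):
    """Row-major matrix-vector product (linalg.matvec in the IR)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def poly_activation(x, coeffs):
    """Horner evaluation of the ReLU-approximating polynomial;
    coeffs[i] multiplies x**i. Coefficients are placeholders."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def mnist(inp, w1, b1, w2, b2, act_coeffs):
    hidden = [h + b for h, b in zip(matvec(w1, inp), b1)]
    hidden = [poly_activation(h, act_coeffs) for h in hidden]
    return [o + b for o, b in zip(matvec(w2, hidden), b2)]
```

Under CKKS the same dataflow runs on packed ciphertexts; the approximation degree trades accuracy against multiplicative depth over the annotated input range.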

ML Compilation Pipeline

Relation-based Ciphertext Layouts

A layout is a partial function from the index set of a cleartext tensor to the index set of a list of ciphertext slots, expressed using Presburger relations and quasi-affine formulas.

  • Fully general layout annotations describing plaintext-ciphertext relation
  • Polyhedral optimization with the Integer Set Library (ISL) analyzes and manipulates layouts, e.g. to compute kernel simplifications or slot utilization for batching

One useful example maps an (i, j) index in an 8 x 8 tensor to eight ciphertexts with 1024 slots:

(i, j) ↦ (ct, slot), where
  (i − j + ct) mod 8 = 0
  (i − slot) mod 1024 = 0
  0 ≤ i, j, ct < 8,  0 ≤ slot < 1024

Mapping (i, j) of an 8×8 tensor to 8 ciphertexts with 1024 slots
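The relation above is small enough to check by brute-force enumeration. The sketch below materializes the partial map and confirms it is a function: entry (i, j) lands in ciphertext (j − i) mod 8 at slot i, a diagonal layout that leaves the remaining slots of each ciphertext unused.

```python
# Brute-force materialization of the layout relation: map each (i, j)
# of an 8x8 tensor into (ct, slot) over 8 ciphertexts of 1024 slots,
# keeping the pairs that satisfy both congruences.
N, CTS, SLOTS = 8, 8, 1024

layout = {}
for i in range(N):
    for j in range(N):
        targets = [
            (ct, slot)
            for ct in range(CTS)
            for slot in range(SLOTS)
            if (i - j + ct) % CTS == 0 and (i - slot) % SLOTS == 0
        ]
        # The relation is a partial *function*: exactly one image point.
        assert len(targets) == 1
        layout[(i, j)] = targets[0]
```

In HEIR the same membership and image computations are done symbolically with ISL rather than by enumeration.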

Orion convolution data layout showing input-filter convolution and resulting diagonal layout matrix

Diagram modified from Fig 3 of Orion: A Fully Homomorphic Encryption Framework for Deep Learning

m_r = (id_r + P)·F + id_c + P
m_c = W_d·id_r + id_c + W_d·if_r + if_c

Layout Optimization Flow

Propagate

A forward analysis annotates the IR with default layouts and kernels.

Optimize

Cost models select kernels that minimize computation cost and layout conversions.

Simplify

A backward traversal hoists layout conversions into the initial encodings.
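A toy version of the optimize phase: each op has candidate (layout, kernel-cost) pairs, switching layouts between consecutive ops incurs a conversion cost, and a small dynamic program picks the cheapest assignment. All names, layouts, and costs below are illustrative assumptions, not HEIR's actual cost model.

```python
# Toy layout/kernel selection over a straight-line program.
CONV = 5  # assumed uniform cost of one layout conversion

def min_cost(ops):
    """ops: one list per op of (output_layout, kernel_cost) candidates.
    Returns the minimum total cost of kernels plus conversions."""
    best = {layout: cost for layout, cost in ops[0]}
    for candidates in ops[1:]:
        best = {
            layout: cost + min(
                prev_cost + (0 if prev == layout else CONV)
                for prev, prev_cost in best.items()
            )
            for layout, cost in candidates
        }
    return min(best.values())

# matvec is cheap with a diagonal layout; a later op prefers rows, so
# the optimizer weighs one conversion against the kernel-cost gap.
program = [
    [("row", 10), ("diag", 4)],   # matvec kernel candidates
    [("row", 1), ("diag", 1)],    # elementwise add, layout-agnostic
    [("row", 6), ("diag", 12)],   # op that is cheaper in row layout
]
```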

New Layout Integrations

HEIR integrates bicyclic [8] and tricyclic [9] layouts and kernels to compute batched matrix multiplication for parallelized multi-head self-attention with optimal multiplicative depth.

Supported layouts and kernels are easily extended with ISL utilities and a testable MLIR-agnostic kernel library.

Optimization Variety Pack

HEIR's ML pipeline utilizes a number of generally applicable optimization patterns:

  • Sparse matrix product simplification
  • Baby-step giant-step for general reductions
  • Minimal-depth polynomial evaluation with Paterson-Stockmeyer
  • Fast (hoisted) rotation rewrites
  • Minimized extended key basis switching
  • High-level program vectorization
  • Shift networks for layout conversions
  • Loop support with HALO optimizations
  • Multiplexed data packing for slot utilization
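As an example of the Paterson-Stockmeyer item above, here is a baby-step/giant-step polynomial evaluation on plain floats (ciphertext operations replaced by ordinary arithmetic; block size k = ⌊√n⌋ is one common choice, not necessarily the one HEIR uses):

```python
import math

def ps_eval(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) in Paterson-Stockmeyer style.

    Baby steps precompute x^0..x^k; giant steps combine coefficient
    blocks with Horner's rule in x^k. This needs roughly 2*sqrt(deg)
    multiplications instead of deg, which matters under encryption
    where each multiplication consumes depth."""
    n = len(coeffs)
    k = max(1, math.isqrt(n))
    powers = [1.0]
    for _ in range(k):
        powers.append(powers[-1] * x)  # baby steps: x^0 .. x^k
    xk = powers[k]
    acc = 0.0
    for start in reversed(range(0, n, k)):  # giant steps over blocks
        block = sum(c * powers[i]
                    for i, c in enumerate(coeffs[start:start + k]))
        acc = acc * xk + block
    return acc
```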
Pipeline stages, operating from the linalg dialect down to secret arithmetic: Model Transforms → Arithmetization → Vectorization → Layout Pipeline → Noise Management → Parameter Selection → Plaintext Execution → Scheme IR.

Make It Easy

HEIR simplifies the developer and debugging experience with:

  • Tracking and debugging utilities from MLIR
  • Plaintext execution mode with custom debug handlers
  • Client helpers for encoding and encryption/decryption
  • Human-readable output code to support inspection and modification
  • Hoisting of cleartext computations into separate functions for precomputation
  • Scheme-specific parameter selection
// preprocessing functions
PlaintextT matvec__preprocessing(CryptoContextT cc) {
  ...
  const auto& pt2 = cc->MakeCKKSPackedPlaintext(c0);
  return pt2;
}

// main workload
CiphertextT matvec(CryptoContextT cc, CiphertextT ct) {
  ...
  const auto& ct5 = cc->EvalMult(ct4, pt2);
  const auto& ct6 = cc->EvalRotate(ct, 3);
  ...
  const auto& ct47 = cc->EvalAdd(ct38, ct46);
  const auto& ct48 = cc->EvalMultNoRelin(ct47, ct47);
  const auto& ct49 = cc->Relinearize(ct48);
  ...
}

// client functions
CiphertextT matvec__encrypt__arg0(
    CryptoContextT cc, std::vector<float> v0, PublicKeyT pk);
std::vector<float> matvec__decrypt__result0(
    CryptoContextT cc, CiphertextT ct, PrivateKeyT sk);
CryptoContextT matvec__generate_crypto_context();
CryptoContextT matvec__configure_crypto_context(
    CryptoContextT cc, PrivateKeyT sk);

Hardware Integrations

Frontends: Python, Torch, TensorFlow Lite
Standard MLIR: func, linalg, tensor, arith, affine, ...
Secret arithmetic: secret, tensor_ext, mgmt, polynomial, comb
Scheme APIs: lwe, bgv, ckks, cggi
Scheme implementation: polynomial, rns, mod_arith
Hardware dialects: llvm, scifr, ...
Library APIs: lattigo, tfhe_rust, jaxite, openfhe

Exit Dialects

Support for multiple backends (CPU, GPU, FPGA, ASICs, and photonics) allows comprehensive testing and benchmarking. After HEIR's high-level program analysis and compilation, data layouts, kernels, schemes, and parameters have been selected and the IR consists of scheme-level operations. The scheme-level IR is lowered in one of two ways to exit HEIR:

  1. Library dialects (e.g. Lattigo, OpenFHE, tfhe-rs) mirror library APIs and are translated to code via HEIR's emitters. This allows fast prototyping and easy integration, but limits fusion and other cross-operation optimizations.
  2. Low-level IRs: scheme operations are implemented using polynomial and modular-arithmetic dialects, and hardware-specific toolchains handle further optimization, scheduling, and assembly (e.g. the LLVM toolchain compiles the MLIR for CPU). This path is suitable for longer-term, robust integrations.

Optalysys utilizes photonic computing technology to perform modular arithmetic operations over the Polynomial Modular Number System (PMNS). Integration with HEIR's generated low level NTT and mod arith code will allow running FHE workloads on Optalysys' optical processing chips.

optalysys.com/resource/optalysys-partners-with-google-heir


Belfort integrates their FPGA-based accelerator with HEIR through the CGGI boolean and shortint APIs. They utilize vectorization strategies in HEIR and software optimizations in their custom tfhe-rs library for performance.

S. Kim et al., "BTS: An Accelerator for Bootstrappable Fully Homomorphic Encryption," in Proceedings of the 49th Annual International Symposium on Computer Architecture, ACM, 2022.


Cornami's MX2 systolic array is integrated as a backend to HEIR's MLIR pipeline for CGGI and CKKS schemes. HEIR exits to Cornami's Secure Computing Interface Framework (SCIFR) with custom optimizations.

Custozimov, Denis et al. "Resource-Sensitive Integration of CGGI and CKKS schemes on the Cornami Computing Target," ArXiv, 2025.

CROSS

CROSS is a TPU-native CKKS implementation in JAX with state-of-the-art performance relative to GPUs (20 ms bootstrap). HEIR integration uses the CKKS dialect to lower to the CROSS API exit dialect.

Fang, Jiangteng et al. "Leveraging ASIC AI Chips for Homomorphic Encryption," in IEEE International Symposium on High-Performance Computer Architecture, 2025.


HEIR tracks progress on the polynomial intermediate representation (IR) developed by the FHE Technical Consortium for Hardware (FHETCH). The IR aims to provide a standardized set of hardware-level operations for interoperable platform integration. HEIR's polynomial dialect aligns with the evolving standard.

The global FHE hardware consortium (www.fhetch.org)

More backends are in progress (e.g. the FIDESlib GPU backend) or under NDA.

C. Aguilo-Domingo et al., "FIDESlib: A Fully-Fledged Open-Source FHE Library for Efficient CKKS on GPUs," in 2025 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Ghent, Belgium, 2025, pp. 1-3.

Community

HEIR's open-source framework supports the major homomorphic encryption schemes, enabling efficient research and benchmarking. Its architecture facilitates the integration of state-of-the-art and emerging methodologies, as evidenced by the various projects built with or incorporated into HEIR.

Call for Contributions

Connect with us to explore potential research directions and integrations, including:

  • Integrating the Gentry-Lee FHE scheme from "Fully Homomorphic Encryption for Matrix Arithmetic"
  • A layout optimizer that exploits the structure of Presburger relations, and/or the general joint layout-and-kernel selection problem
  • New FHE scheme implementations (e.g. GBFV) and optimizations
  • Incorporating memory constraints into cost models for kernel compilation
  • Profile-guided optimizations for parameter selection & scale management

Fhelipe Layout Hoisting

HEIR uses Fhelipe's hoisting heuristic to minimize layout conversions between operations.

Average-Case Noise Analysis

HEIR was used to experimentally demonstrate that average-case noise analyses can underestimate noise growth.

HALO Compiler Loop Support

HEIR adopts transforms from the HALO compiler for loop-aware bootstrapping placement.

ROTOM: Autovectorizing HE

ROTOM's tensor vectorization strategy is integrated as an option for layout optimization.

Orion Compiler Kernels

HEIR incorporates Orion's convolution data layout and kernel with double-hoisting and BSGS.

KeyMemRT Memory Scalability

KeyMemRT's key-memory minimization strategies are incorporated into HEIR.

Tricycle: Private Transformers

HEIR supports tricyclic layouts to enable ciphertext matrix multiplications for self-attention.

Vos-Vos-Erkin Shift Networks

Efficient shift network implementation of layout conversions using graph coloring.
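A simplified sketch of the shift-network idea (grouping only; the paper's graph coloring further reduces the rotation count): slots that need the same rotation offset can share one masked rotation, so the number of homomorphic rotations is at most the number of distinct offsets.

```python
def convert_layout(ct, mapping):
    """Apply a slot permutation to a simulated ciphertext (a list).

    mapping[src] = dst. Slots are grouped by rotation offset
    (dst - src) mod n; each group becomes one rotate + mask + add,
    the primitive shift-network step."""
    n = len(ct)
    groups = {}
    for src, dst in mapping.items():
        groups.setdefault((dst - src) % n, []).append(src)
    out = [0] * n
    for offset, sources in groups.items():
        # one homomorphic rotation per distinct offset
        rotated = [ct[(i - offset) % n] for i in range(n)]
        for src in sources:  # mask: keep only this group's slots
            out[(src + offset) % n] = rotated[(src + offset) % n]
    return out
```

A pure rotation needs a single group (one rotation); an arbitrary permutation of n slots needs at most n, which the graph-coloring construction then improves on.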

References

  1. E. Chen et al., Bridging Usability and Performance: A Tensor Compiler for Autovectorizing Homomorphic Encryption, Cryptology ePrint Archive, 2025/1319.
  2. Z. Zhou et al., Orbit: Optimizing Rescale and Bootstrap Placement with Integer Linear Programming Techniques for Secure Inference, Cryptology ePrint Archive, 2026/213.
  3. E. Ünay et al., KeyMemRT Compiler and Runtime: Unlocking Memory-Scalable FHE, arXiv:2601.18445, 2026.
  4. M. Gao and H. Zheng, A Critique on Average-Case Noise Analysis in RLWE-Based Homomorphic Encryption, Proceedings of the 13th Workshop on Encrypted Computing & Applied Homomorphic Computing, 2025.
  5. A. Krastev et al., A Tensor Compiler with Automatic Data Packing for Simple and Efficient Fully Homomorphic Encryption, Proceedings of the ACM on Programming Languages, vol. 8, no. PLDI, June 2024, pp. 126–50.
  6. S. Cheon et al., HALO: Loop-Aware Bootstrapping Management for Fully Homomorphic Encryption, Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 2025, pp. 572–85.
  7. A. Ebel et al., Orion: A Fully Homomorphic Encryption Framework for Deep Learning, Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2025, pp. 734–49.
  8. J. Chen, L. Yang, W. Wu, Y. Liu, and Y. Feng, Homomorphic Matrix Operations Under Bicyclic Encoding, IEEE Transactions on Information Forensics and Security, vol. 20, 2025, pp. 1390–404.
  9. L. Lim et al., Tricycle: Private Transformer Inference with Tricyclic Encodings, Cryptology ePrint Archive, 2025/1200.
  10. J. Vos et al., Efficient Circuits for Permuting and Mapping Packed Values Across Leveled Homomorphic Ciphertexts, Computer Security – ESORICS 2022, Springer, 2022, pp. 408–23.