ML with HEIR

HEIR’s ML frontend, compilation pipeline, hardware integrations, and active research directions.

HEIR: Fully Homomorphic Machine Learning with a Universal Compiler

An FHE compiler toolchain and development platform that does not sacrifice generality or extensibility.

HEIR project · asraa@google.com · arXiv:2508.11095


HEIR provides an MLIR-based path from ML frontends to scheme-level IRs, library backends, and lower-level arithmetic intended for hardware integration.

ML Frontend

PyTorch, TensorFlow, and ONNX converge into HEIR's linalg entry level through torch-mlir, onnx-mlir, and StableHLO.

Linalg Entry Level

Torch models are converted with torch-mlir to linalg on tensors (with tensor and arith dialects) as HEIR input.

The linalg dialect is the funnel dialect for HEIR's MLIR frontend: its abstraction level is high enough to match on ML kernel operations for optimization. Canonicalization patterns at this level simplify the IR, reducing memory-shuffling and non-linear operations.

Compilation Configuration

  • Backend and scheme selection
  • Secret input data selected with annotations
  • Non-linear activation approximation degree
  • Range bounds from model metadata
  • User-controlled kernel selection
module attributes {backend.openfhe, scheme.ckks} {
  func.func @mnist(%input: tensor<784xf32> {secret.secret}) -> tensor<10xf32> {
    %matrix1 = arith.constant dense<...> : tensor<512x784xf32>
    %bias1 = arith.constant dense<...> : tensor<512xf32>
    %matrix2 = arith.constant dense<...> : tensor<10x512xf32>
    %bias2 = arith.constant dense<...> : tensor<10xf32>
    %cst = arith.constant dense<0.0> : tensor<512xf32>
    %0 = linalg.matvec ins(%matrix1, %input) {kernel = "diagonal"}
    %1 = arith.addf %0, %bias1
    %2 = arith.maximumf %1, %cst {degree = 6, lower = -15.0, upper = 12.0}
    %3 = linalg.matvec ins(%matrix2, %2)
    %4 = arith.addf %3, %bias2
    return %4
  }
}
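For intuition, the module above corresponds to a two-layer MLP in the clear, where `arith.maximumf` (ReLU) is replaced by a low-degree polynomial as the `degree`/`lower`/`upper` annotations direct. A minimal Python sketch; the weights and polynomial coefficients are placeholders, not values HEIR produces:

```python
# Cleartext sketch of the @mnist module: matvec + bias, a polynomial
# activation standing in for ReLU, then a second matvec + bias.

def matvec(matrix, vec):
    """Row-major matrix-vector product (linalg.matvec in the IR)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def poly_activation(x, coeffs):
    """Horner evaluation of the ReLU-approximating polynomial;
    coeffs[i] multiplies x**i. Coefficients are placeholders."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def mnist(inp, w1, b1, w2, b2, act_coeffs):
    hidden = [h + b for h, b in zip(matvec(w1, inp), b1)]
    hidden = [poly_activation(h, act_coeffs) for h in hidden]
    return [o + b for o, b in zip(matvec(w2, hidden), b2)]
```

Under CKKS the same dataflow runs on packed ciphertexts; the approximation degree trades accuracy against multiplicative depth over the annotated input range.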

ML Compilation Pipeline

Relation-based Ciphertext Layouts

A layout is a partial function from the index set of a cleartext tensor to the index set of a list of ciphertext slots, expressed using Presburger relations and quasi-affine formulas.

  • Fully general layout annotations describing plaintext-ciphertext relation
  • Polyhedral optimization with the Integer Set Library (ISL) analyzes and manipulates layouts, e.g. to compute kernel simplifications or slot utilization for batching

One useful example maps an (i, j) index in an 8 x 8 tensor to eight ciphertexts with 1024 slots:

(i, j) ↦ (ct, slot), where
  (i − j + ct) mod 8 = 0
  (i − slot) mod 1024 = 0
  0 ≤ i, j, ct < 8,  0 ≤ slot < 1024

Mapping (i, j) of an 8×8 tensor to 8 ciphertexts with 1024 slots
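The relation above is small enough to check by brute-force enumeration. The sketch below materializes the partial map and confirms it is a function: entry (i, j) lands in ciphertext (j − i) mod 8 at slot i, a diagonal layout that leaves the remaining slots of each ciphertext unused.

```python
# Brute-force materialization of the layout relation: map each (i, j)
# of an 8x8 tensor into (ct, slot) over 8 ciphertexts of 1024 slots,
# keeping the pairs that satisfy both congruences.
N, CTS, SLOTS = 8, 8, 1024

layout = {}
for i in range(N):
    for j in range(N):
        targets = [
            (ct, slot)
            for ct in range(CTS)
            for slot in range(SLOTS)
            if (i - j + ct) % CTS == 0 and (i - slot) % SLOTS == 0
        ]
        # The relation is a partial *function*: exactly one image point.
        assert len(targets) == 1
        layout[(i, j)] = targets[0]
```

In HEIR the same membership and image computations are done symbolically with ISL rather than by enumeration.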

Orion convolution data layout showing input-filter convolution and resulting diagonal layout matrix

Diagram modified from Fig 3 of Orion: A Fully Homomorphic Encryption Framework for Deep Learning

m_r = (id_r + P)·F + id_c + P
m_c = W_d·id_r + id_c + W_d·if_r + if_c

Layout Optimization Flow

Propagate

A forward analysis annotates the IR with default layouts and kernels.

Optimize

Cost models select kernels that minimize computation cost and layout conversions.

Simplify

A backward traversal hoists layout conversions into the initial encodings.
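A toy version of the optimize phase: each op has candidate (layout, kernel-cost) pairs, switching layouts between consecutive ops incurs a conversion cost, and a small dynamic program picks the cheapest assignment. All names, layouts, and costs below are illustrative assumptions, not HEIR's actual cost model.

```python
# Toy layout/kernel selection over a straight-line program.
CONV = 5  # assumed uniform cost of one layout conversion

def min_cost(ops):
    """ops: one list per op of (output_layout, kernel_cost) candidates.
    Returns the minimum total cost of kernels plus conversions."""
    best = {layout: cost for layout, cost in ops[0]}
    for candidates in ops[1:]:
        best = {
            layout: cost + min(
                prev_cost + (0 if prev == layout else CONV)
                for prev, prev_cost in best.items()
            )
            for layout, cost in candidates
        }
    return min(best.values())

# matvec is cheap with a diagonal layout; a later op prefers rows, so
# the optimizer weighs one conversion against the kernel-cost gap.
program = [
    [("row", 10), ("diag", 4)],   # matvec kernel candidates
    [("row", 1), ("diag", 1)],    # elementwise add, layout-agnostic
    [("row", 6), ("diag", 12)],   # op that is cheaper in row layout
]
```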

New Layout Integrations

HEIR integrates bicyclic [8] and tricyclic [9] layouts and kernels to compute batched matrix multiplication for parallelized multi-head self-attention with optimal multiplicative depth.

Supported layouts and kernels are easily extended with ISL utilities and a testable MLIR-agnostic kernel library.

Optimization Variety Pack

HEIR's ML pipeline utilizes a number of generally applicable optimization patterns:

  • Sparse matrix product simplification
  • Baby-step giant-step for general reductions
  • Minimal-depth polynomial evaluation with Paterson-Stockmeyer
  • Fast (hoisted) rotation rewrites
  • Minimized extended key basis switching
  • High-level program vectorization
  • Shift networks for layout conversions
  • Loop support with HALO optimizations
  • Multiplexed data packing for slot utilization
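As an example of the Paterson-Stockmeyer item above, here is a baby-step/giant-step polynomial evaluation on plain floats (ciphertext operations replaced by ordinary arithmetic; block size k = ⌊√n⌋ is one common choice, not necessarily the one HEIR uses):

```python
import math

def ps_eval(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) in Paterson-Stockmeyer style.

    Baby steps precompute x^0..x^k; giant steps combine coefficient
    blocks with Horner's rule in x^k. This needs roughly 2*sqrt(deg)
    multiplications instead of deg, which matters under encryption
    where each multiplication consumes depth."""
    n = len(coeffs)
    k = max(1, math.isqrt(n))
    powers = [1.0]
    for _ in range(k):
        powers.append(powers[-1] * x)  # baby steps: x^0 .. x^k
    xk = powers[k]
    acc = 0.0
    for start in reversed(range(0, n, k)):  # giant steps over blocks
        block = sum(c * powers[i]
                    for i, c in enumerate(coeffs[start:start + k]))
        acc = acc * xk + block
    return acc
```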
Pipeline stages, operating from the linalg dialect down to secret arithmetic: Model Transforms → Arithmetization → Vectorization → Layout Pipeline → Noise Management → Parameter Selection → Plaintext Execution → Scheme IR.

Make It Easy

HEIR simplifies the developer and debugging experience with:

  • Tracking and debugging utilities from MLIR
  • Plaintext execution mode with custom debug handlers
  • Client helpers for encoding and encryption/decryption
  • Human-readable output code to support inspection and modification
  • Hoisting of cleartext computations into separate functions for precomputation
  • Scheme-specific parameter selection
// preprocessing functions
PlaintextT matvec__preprocessing(CryptoContextT cc) {
  ...
  const auto& pt2 = cc->MakeCKKSPackedPlaintext(c0);
  return pt2;
}

// main workload
CiphertextT matvec(CryptoContextT cc, CiphertextT ct) {
  ...
  const auto& ct5 = cc->EvalMult(ct4, pt2);
  const auto& ct6 = cc->EvalRotate(ct, 3);
  ...
  const auto& ct47 = cc->EvalAdd(ct38, ct46);
  const auto& ct48 = cc->EvalMultNoRelin(ct47, ct47);
  const auto& ct49 = cc->Relinearize(ct48);
  ...
}

// client functions
CiphertextT matvec__encrypt__arg0(
    CryptoContextT cc, std::vector<float> v0, PublicKeyT pk);
std::vector<float> matvec__decrypt__result0(
    CryptoContextT cc, CiphertextT ct, PrivateKeyT sk);
CryptoContextT matvec__generate_crypto_context();
CryptoContextT matvec__configure_crypto_context(
    CryptoContextT cc, PrivateKeyT sk);

Hardware Integrations

Frontends: Python, Torch, TensorFlow Lite
Standard MLIR: func, linalg, tensor, arith, affine, ...
Secret arithmetic: secret, tensor_ext, mgmt, polynomial, comb
Scheme APIs: lwe, bgv, ckks, cggi
Scheme implementation: polynomial, rns, mod_arith
Hardware dialects: llvm, scifr, ...
Library APIs: lattigo, tfhe_rust, jaxite, openfhe

Exit Dialects

Support for multiple backends (CPU, GPU, FPGA, ASICs, and photonics) allows comprehensive testing and benchmarking. After HEIR's high-level program analysis and compilation, data layouts, kernels, schemes, and parameters have been selected and the IR consists of scheme-level operations. The scheme-level IR is lowered in one of two ways to exit HEIR:

  1. Library dialects (e.g. Lattigo, OpenFHE, tfhe-rs) mirror library APIs and are translated to code via HEIR's emitters. This allows fast prototyping and easy integration, but limits fusion and other cross-operation optimizations.
  2. Low-level IRs: scheme operations are implemented using polynomial and modular-arithmetic dialects, and hardware-specific toolchains handle further optimization, scheduling, and assembly (e.g. the LLVM toolchain compiles the MLIR for CPU). This path is suitable for longer-term, robust integrations.

Optalysys utilizes photonic computing technology to perform modular arithmetic operations over the Polynomial Modular Number System (PMNS). Integration with HEIR's generated low level NTT and mod arith code will allow running FHE workloads on Optalysys' optical processing chips.

optalysys.com/resource/optalysys-partners-with-google-heir


Belfort integrates their FPGA-based accelerator with HEIR through the CGGI boolean and shortint APIs. They utilize vectorization strategies in HEIR and software optimizations in their custom tfhe-rs library for performance.

S. Kim et al., "BTS: An Accelerator for Bootstrappable Fully Homomorphic Encryption," in Proceedings of the 49th Annual International Symposium on Computer Architecture, ACM, 2022.


Cornami's MX2 systolic array is integrated as a backend to HEIR's MLIR pipeline for CGGI and CKKS schemes. HEIR exits to Cornami's Secure Computing Interface Framework (SCIFR) with custom optimizations.

Custozimov, Denis et al. "Resource-Sensitive Integration of CGGI and CKKS schemes on the Cornami Computing Target," ArXiv, 2025.

CROSS

CROSS is a TPU-native CKKS implementation in JAX with state-of-the-art performance relative to GPUs (20 ms bootstrap). HEIR integration uses the CKKS dialect to lower to the CROSS API exit dialect.

Fang, Jiangteng et al. "Leveraging ASIC AI Chips for Homomorphic Encryption," in IEEE International Symposium on High-Performance Computer Architecture, 2025.


HEIR tracks progress on the polynomial intermediate representation (IR) developed by the FHE Technical Consortium for Hardware (FHETCH). The IR aims to provide a standardized set of hardware-level operations for interoperable platform integration. HEIR's polynomial dialect aligns with the evolving standard.

The global FHE hardware consortium (www.fhetch.org)

More backends are in progress (e.g. the FIDESlib GPU backend) or under NDA.

C. Aguilo-Domingo et al., "FIDESlib: A Fully-Fledged Open-Source FHE Library for Efficient CKKS on GPUs," in 2025 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Ghent, Belgium, 2025, pp. 1-3.

Community

HEIR's open-source framework supports the major homomorphic encryption schemes, enabling efficient research and benchmarking. Its architecture facilitates the integration of state-of-the-art and emerging methodologies, as evidenced by the various projects built with or incorporated into HEIR.

Call for Contributions

Connect with us to explore potential research directions and integrations, including:

  • Integrating the Gentry-Lee FHE scheme from "Fully Homomorphic Encryption for Matrix Arithmetic"
  • A layout optimizer that exploits the structure of Presburger relations, and/or the general joint layout-and-kernel selection problem
  • New FHE scheme implementations (e.g. GBFV) and optimizations
  • Incorporating memory constraints into cost models for kernel compilation
  • Profile-guided optimizations for parameter selection & scale management

Fhelipe Layout Hoisting

HEIR uses Fhelipe's hoisting heuristic to minimize layout conversions between operations.

Average-Case Noise Analysis

HEIR was used to experimentally demonstrate that average-case noise analyses can underestimate noise growth.

HALO Compiler Loop Support

HEIR adopts transforms from the HALO compiler for loop-aware bootstrapping placement.

ROTOM: Autovectorizing HE

ROTOM's tensor vectorization strategy is integrated as an option for layout optimization.

Orion Compiler Kernels

HEIR incorporates Orion's convolution data layout and kernel with double-hoisting and BSGS.

KeyMemRT Memory Scalability

KeyMemRT's key-memory minimization strategies are incorporated into HEIR.

Tricycle: Private Transformers

HEIR supports tricyclic layouts to enable ciphertext matrix multiplications for self-attention.

Vos-Vos-Erkin Shift Networks

Efficient shift network implementation of layout conversions using graph coloring.
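A simplified sketch of the shift-network idea (grouping only; the paper's graph coloring further reduces the rotation count): slots that need the same rotation offset can share one masked rotation, so the number of homomorphic rotations is at most the number of distinct offsets.

```python
def convert_layout(ct, mapping):
    """Apply a slot permutation to a simulated ciphertext (a list).

    mapping[src] = dst. Slots are grouped by rotation offset
    (dst - src) mod n; each group becomes one rotate + mask + add,
    the primitive shift-network step."""
    n = len(ct)
    groups = {}
    for src, dst in mapping.items():
        groups.setdefault((dst - src) % n, []).append(src)
    out = [0] * n
    for offset, sources in groups.items():
        # one homomorphic rotation per distinct offset
        rotated = [ct[(i - offset) % n] for i in range(n)]
        for src in sources:  # mask: keep only this group's slots
            out[(src + offset) % n] = rotated[(src + offset) % n]
    return out
```

A pure rotation needs a single group (one rotation); an arbitrary permutation of n slots needs at most n, which the graph-coloring construction then improves on.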

References

  1. E. Chen et al., Bridging Usability and Performance: A Tensor Compiler for Autovectorizing Homomorphic Encryption, Cryptology ePrint Archive, 2025/1319.
  2. Z. Zhou et al., Orbit: Optimizing Rescale and Bootstrap Placement with Integer Linear Programming Techniques for Secure Inference, Cryptology ePrint Archive, 2026/213.
  3. E. Ünay et al., KeyMemRT Compiler and Runtime: Unlocking Memory-Scalable FHE, arXiv:2601.18445, 2026.
  4. M. Gao and H. Zheng, A Critique on Average-Case Noise Analysis in RLWE-Based Homomorphic Encryption, Proceedings of the 13th Workshop on Encrypted Computing & Applied Homomorphic Computing, 2025.
  5. A. Krastev et al., A Tensor Compiler with Automatic Data Packing for Simple and Efficient Fully Homomorphic Encryption, Proceedings of the ACM on Programming Languages, vol. 8, no. PLDI, June 2024, pp. 126–50.
  6. S. Cheon et al., HALO: Loop-Aware Bootstrapping Management for Fully Homomorphic Encryption, Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 2025, pp. 572–85.
  7. A. Ebel et al., Orion: A Fully Homomorphic Encryption Framework for Deep Learning, Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2025, pp. 734–49.
  8. J. Chen, L. Yang, W. Wu, Y. Liu, and Y. Feng, Homomorphic Matrix Operations Under Bicyclic Encoding, IEEE Transactions on Information Forensics and Security, vol. 20, 2025, pp. 1390–404.
  9. L. Lim et al., Tricycle: Private Transformer Inference with Tricyclic Encodings, Cryptology ePrint Archive, 2025/1200.
  10. J. Vos et al., Efficient Circuits for Permuting and Mapping Packed Values Across Leveled Homomorphic Ciphertexts, Computer Security – ESORICS 2022, Springer, 2022, pp. 408–23.