This section contains the passes defined by HEIR.
Passes
- 1: ApplyFoldersPasses
- 2: BGVPasses
- 3: BGVToOpenfhe
- 4: BGVToPolynomial
- 5: CGGIPasses
- 6: CGGIToTfheRust
- 7: CGGIToTfheRustBool
- 8: CombToCGGI
- 9: ElementwiseToAffinePasses
- 10: ForwardStoreToLoadPasses
- 11: FullLoopUnrollPasses
- 12: LWEPasses
- 13: MemrefToArith
- 14: PolynomialToStandard
- 15: SecretizePasses
- 16: SecretPasses
- 17: SecretToBGV
- 18: StraightLineVectorizerPasses
- 19: TensorExtPasses
- 20: UnusedMemRefPasses
- 21: YosysOptimizerPasses
1 - ApplyFoldersPasses
-apply-folders
Apply all folding patterns from canonicalize
This pass applies all registered folding patterns greedily to the input IR. This is useful when running a full canonicalize is too slow, but applying folders before canonicalize is sufficient to simplify the IR for later passes, or even sufficient to then subsequently run a full canonicalize pass.
This is used to prepare an IR for insert-rotate after fully unrolling loops.
2 - BGVPasses
-bgv-add-client-interface
Add client interfaces to BGV encrypted functions
This pass adds encrypt and decrypt functions for each compiled function in the IR. These functions maintain the same interface as the original function, while the compiled function may lose some of this information by the lowerings to ciphertext types (e.g., a scalar ciphertext, when lowered through BGV, must be encoded as a tensor).
Example:
For an input function with signature
#encoding = ...
#params = ...
!in_ty = !lwe.rlwe_ciphertext<encoding = #encoding, rlwe_params = #params, underlying_type = tensor<32xi16>>
!out_ty = !lwe.rlwe_ciphertext<encoding = #encoding, rlwe_params = #params, underlying_type = i16>
func.func @my_func(%arg0: !in_ty) -> !out_ty {
...
}
The pass will generate two new functions with signatures
func.func @my_func__encrypt(
%arg0: tensor<32xi16>,
%sk: !lwe.rlwe_secret_key<...>
) -> !in_ty
func.func @my_func__decrypt(
%arg0: !out_ty,
%sk: !lwe.rlwe_secret_key<...>
) -> i16
The my_func__encrypt function has the same order of operands as my_func, and uses their underlying_type as the corresponding input type. The last operand is the encryption key. The same holds for my_func__decrypt, but the inputs are the return types of my_func and the results are the underlying types of the return types of my_func.
If use-public-key is set to true, the encrypt function uses lwe.rlwe_public_key for encryption.
If one-value-per-helper-fn is set to true, the encryption helpers are split into separate functions, one for each SSA value being converted. For example, using the same !in_ty and !out_ty as above, this function signature
func.func @my_func(%arg0: !in_ty, %arg1: !in_ty) -> (!out_ty, !out_ty)
generates the following four helpers.
func.func @my_func__encrypt__arg0(%arg0: tensor<32xi16>, %sk: !lwe.rlwe_secret_key<...>) -> !in_ty
func.func @my_func__encrypt__arg1(%arg1: tensor<32xi16>, %sk: !lwe.rlwe_secret_key<...>) -> !in_ty
func.func @my_func__decrypt__result0(%arg0: !out_ty, %sk: !lwe.rlwe_secret_key<...>) -> i16
func.func @my_func__decrypt__result1(%arg1: !out_ty, %sk: !lwe.rlwe_secret_key<...>) -> i16
The suffix __argN indicates the SSA value being encrypted is the N-th argument of my_func, and similarly for __resultN.
Options
-use-public-key : If true, generate a client interface that uses a public key for encryption.
-one-value-per-helper-fn : If true, split encryption helpers into separate functions for each SSA value.
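The naming scheme described above can be summarized with a small Python sketch. The function helper_names and its parameters are hypothetical and exist only for illustration. They are not part of HEIR, and the sketch only reproduces the helper names, not the generated function bodies.

```python
def helper_names(func_name, num_args, num_results, one_value_per_helper_fn=False):
    """Return the client helper names the text above describes.

    Hypothetical illustration only: this mirrors the documented naming
    convention and does not call into HEIR.
    """
    if not one_value_per_helper_fn:
        # One encrypt helper and one decrypt helper per compiled function.
        return [f"{func_name}__encrypt", f"{func_name}__decrypt"]
    # One helper per SSA value: arguments are encrypted, results are decrypted.
    encrypt = [f"{func_name}__encrypt__arg{i}" for i in range(num_args)]
    decrypt = [f"{func_name}__decrypt__result{i}" for i in range(num_results)]
    return encrypt + decrypt


default_helpers = helper_names("my_func", 2, 2)
split_helpers = helper_names("my_func", 2, 2, one_value_per_helper_fn=True)
```

For the two-argument, two-result my_func above, the split variant yields the four helper names listed in the example.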
3 - BGVToOpenfhe
-bgv-to-openfhe
Lower bgv to openfhe dialect.
This pass lowers the bgv dialect to the openfhe dialect.
4 - BGVToPolynomial
-bgv-to-polynomial
Lower bgv to polynomial dialect.
This pass lowers the bgv dialect to the polynomial dialect.
5 - CGGIPasses
-cggi-set-default-parameters
Set default parameters for CGGI ops
This pass adds default parameters to all CGGI ops as cggi_params named attributes, overriding any existing attribute set with that name.
This pass is primarily for testing purposes, and as a parameter provider before a proper parameter selection mechanism is added. This pass should not be used in production.
The specific parameters are hard-coded in lib/Dialect/CGGI/Transforms/SetDefaultParameters.cpp.
6 - CGGIToTfheRust
-cggi-to-tfhe-rust
Lower cggi to tfhe_rust dialect.
7 - CGGIToTfheRustBool
-cggi-to-tfhe-rust-bool
Lower cggi to tfhe_rust_bool dialect.
8 - CombToCGGI
-comb-to-cggi
Lower comb to cggi dialect.
This pass lowers the comb dialect to the cggi dialect.
9 - ElementwiseToAffinePasses
-convert-elementwise-to-affine
This pass lowers ElementwiseMappable operations to Affine loops.
This pass lowers ElementwiseMappable operations over tensors to affine loop nests that instead apply the operation to the underlying scalar values.
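As a rough mental model, the lowering turns one whole-tensor operation into a loop nest that applies the scalar operation at each index. The Python sketch below shows this for a rank-2 addition. The function name elementwise_add_2d is made up for illustration, and nested lists stand in for tensors.

```python
def elementwise_add_2d(lhs, rhs):
    """Apply a scalar add at every index of two equally shaped 2-D lists.

    Illustrative stand-in for lowering a tensor-level add to a two-level
    loop nest over the underlying scalar values.
    """
    rows, cols = len(lhs), len(lhs[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):        # outer loop over dimension 0
        for j in range(cols):    # inner loop over dimension 1
            out[i][j] = lhs[i][j] + rhs[i][j]  # the scalar operation
    return out


summed = elementwise_add_2d([[1, 2], [3, 4]], [[10, 20], [30, 40]])
```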
10 - ForwardStoreToLoadPasses
-forward-store-to-load
Forward stores to loads within a single block
This pass is a simplified version of mem2reg and similar passes. It analyzes an operation, finding all basic blocks within that op that have memrefs whose stores can be forwarded to loads.
This pass does not support complex control flow within a block, nor ops with arbitrary subregions.
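The core idea can be sketched in Python on a toy straight-line block. The tuple encoding of ops and the function forward_stores_to_loads are assumptions made for this illustration. The real pass works on MLIR operations and performs additional safety checks that are not modeled here.

```python
def forward_stores_to_loads(block):
    """Forward the most recent store at a (memref, index) to later loads.

    Toy encoding (an assumption of this sketch, not HEIR's data model):
      ("store", value, memref, index)
      ("load", result, memref, index)
      (op_name, result, operand, ...)
    """
    last_store = {}    # (memref, index) -> name of the last stored value
    replacements = {}  # load result -> forwarded value
    remaining = []
    for op in block:
        if op[0] == "store":
            _, value, memref, index = op
            value = replacements.get(value, value)
            last_store[(memref, index)] = value
            remaining.append(("store", value, memref, index))
        elif op[0] == "load":
            _, result, memref, index = op
            if (memref, index) in last_store:
                # The load is redundant: reuse the stored value instead.
                replacements[result] = last_store[(memref, index)]
            else:
                remaining.append(op)
        else:
            name, result, *operands = op
            operands = [replacements.get(o, o) for o in operands]
            remaining.append((name, result, *operands))
    return remaining


block = [
    ("store", "%x", "%m", (0,)),
    ("load", "%y", "%m", (0,)),
    ("addi", "%z", "%y", "%y"),
    ("load", "%w", "%m", (1,)),
]
forwarded = forward_stores_to_loads(block)
```

In this toy block the load of %y disappears and its user reads %x directly, while the load at index 1 stays because no earlier store in the block writes that location.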
11 - FullLoopUnrollPasses
-full-loop-unroll
Fully unroll all loops
Scan the IR for affine.for loops and unroll them all.
12 - LWEPasses
-lwe-set-default-parameters
Set default parameters for LWE ops
This pass adds default parameters to all lwe types as the lwe_params attribute, and for lwe ops as the params attribute, overriding any existing attributes set with those names.
This pass is primarily for testing purposes, and as a parameter provider before a proper parameter selection mechanism is added. This pass should not be used in production.
The specific parameters are hard-coded in lib/Dialect/LWE/Transforms/SetDefaultParameters.cpp.
13 - MemrefToArith
-expand-copy
Expands memref.copy ops to explicit affine loads and stores
This pass removes memref copy operations by expanding them to affine loads and stores. Because it introduces affine loops over the dimensions of the MemRef, it must be run prior to any affine loop unrolling in a pipeline.
Input
module {
func.func @memref_copy() {
%alloc = memref.alloc() : memref<2x3xi32>
%alloc_0 = memref.alloc() : memref<2x3xi32>
memref.copy %alloc, %alloc_0 : memref<2x3xi32> to memref<2x3xi32>
}
}
Output
module {
func.func @memref_copy() {
%alloc = memref.alloc() : memref<2x3xi32>
%alloc_0 = memref.alloc() : memref<2x3xi32>
affine.for %arg0 = 0 to 2 {
affine.for %arg1 = 0 to 3 {
%1 = affine.load %alloc[%arg0, %arg1] : memref<2x3xi32>
affine.store %1, %alloc_0[%arg0, %arg1] : memref<2x3xi32>
}
}
}
}
When --disable-affine-loop=true is set, the output becomes
module {
func.func @memref_copy() {
%alloc = memref.alloc() : memref<2x3xi32>
%alloc_0 = memref.alloc() : memref<2x3xi32>
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
%0 = affine.load %alloc[%c0, %c0] : memref<2x3xi32>
affine.store %0, %alloc_0[%c0, %c0] : memref<2x3xi32>
%1 = affine.load %alloc[%c0, %c1] : memref<2x3xi32>
affine.store %1, %alloc_0[%c0, %c1] : memref<2x3xi32>
%2 = affine.load %alloc[%c0, %c2] : memref<2x3xi32>
affine.store %2, %alloc_0[%c0, %c2] : memref<2x3xi32>
[...]
}
}
Options
-disable-affine-loop : Use this to disable the use of affine loops, emitting explicit loads and stores instead
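The following Python sketch models what the expansion does for a rank-2 copy: it visits every index in row-major order and performs one load and one store per element. The helper expand_copy is hypothetical, and nested lists stand in for memrefs.

```python
from itertools import product


def expand_copy(src, dst, shape):
    """Copy src into dst element by element and return the visit order.

    Illustrative only: this sketch assumes a rank-2 shape and uses nested
    Python lists in place of memrefs.
    """
    visited = []
    for i, j in product(*(range(dim) for dim in shape)):
        dst[i][j] = src[i][j]  # one load followed by one store
        visited.append((i, j))
    return visited


src = [[1, 2, 3], [4, 5, 6]]
dst = [[0, 0, 0], [0, 0, 0]]
order = expand_copy(src, dst, (2, 3))
```

The returned order starts with (0, 0), (0, 1), (0, 2), which matches the sequence of constant indices in the unrolled output above.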
-extract-loop-body
Extracts the logic of loop bodies into functions.
This pass extracts logic in the inner body of for loops into functions.
This pass requires that tensors are lowered to memref. It expects that a loop body contains a number of affine.load statements used as inputs to the extracted function, and a single affine.store used as the extracted function’s output.
Input
module {
func.func @loop_body() {
%c-128_i8 = arith.constant -128 : i8
%c127_i8 = arith.constant 127 : i8
%alloc_7 = memref.alloc() {alignment = 64 : i64} : memref<25x20x8xi8>
affine.for %arg1 = 0 to 25 {
affine.for %arg2 = 0 to 20 {
affine.for %arg3 = 0 to 8 {
%98 = affine.load %alloc_6[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
%99 = arith.cmpi slt, %98, %c-128_i8 : i8
%100 = arith.select %99, %c-128_i8, %98 : i8
%101 = arith.cmpi sgt, %98, %c127_i8 : i8
%102 = arith.select %101, %c127_i8, %100 : i8
affine.store %102, %alloc_7[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
}
}
}
}
}
Output
module {
func.func @loop_body() {
%alloc_7 = memref.alloc() {alignment = 64 : i64} : memref<25x20x8xi8>
affine.for %arg1 = 0 to 25 {
affine.for %arg2 = 0 to 20 {
affine.for %arg3 = 0 to 8 {
%98 = affine.load %alloc_6[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
%102 = func.call @__for_loop(%98) : (i8) -> i8
affine.store %102, %alloc_7[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
}
}
}
}
func.func private @__for_loop(%arg0: i8) -> i8 {
%c-128_i8 = arith.constant -128 : i8
%c127_i8 = arith.constant 127 : i8
%99 = arith.cmpi slt, %arg0, %c-128_i8 : i8
%100 = arith.select %99, %c-128_i8, %arg0 : i8
%101 = arith.cmpi sgt, %arg0, %c127_i8 : i8
%102 = arith.select %101, %c127_i8, %100 : i8
return %102 : i8
}
}
Options
-min-loop-size : Use this to control the minimum loop size to apply this pass
-min-body-size : Use this to control the minimum loop body size to apply this pass
-memref-global-replace
MemrefGlobalReplacePass forwards global memref accessors to arithmetic values
This pass forwards constant global MemRef values to referencing affine loads. This pass requires that the MemRef global values are initialized as constants and that the affine load access indices are constants (i.e. not variadic). Unroll affine loops prior to running this pass.
MemRef removal is required to remove any memory allocations from the input model (for example, TensorFlow models contain global memory holding model weights) to support FHE transpilation.
Input
module {
memref.global "private" constant @__constant_8xi16 : memref<2x4xi16> = dense<[[-10, 20, 3, 4], [5, 6, 7, 8]]>
func.func @main() -> i16 {
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
%0 = memref.get_global @__constant_8xi16 : memref<2x4xi16>
%1 = affine.load %0[%c1, %c1 + %c2] : memref<2x4xi16>
return %1 : i16
}
}
Output
module {
func.func @main() -> i16 {
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
%c8_i16 = arith.constant 8 : i16
return %c8_i16 : i16
}
}
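The example can be checked by hand with a short Python sketch that indexes the constant data at the statically known position. The function fold_global_load is a hypothetical stand-in for the folding step and is not HEIR code.

```python
def fold_global_load(global_value, indices):
    """Look up a constant element of a nested-list 'global' at fixed indices.

    Illustrative only: models replacing a load with constant indices by the
    constant it reads.
    """
    value = global_value
    for index in indices:
        value = value[index]
    return value


# Contents of @__constant_8xi16 from the example above.
constant_8xi16 = [[-10, 20, 3, 4], [5, 6, 7, 8]]
c1, c2 = 1, 2
folded = fold_global_load(constant_8xi16, (c1, c1 + c2))
```

Here the access indices are (1, 3), so the load folds to the constant 8, matching %c8_i16 in the output.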
-unroll-and-forward
Loop unrolls and forwards stores to loads.
This pass processes the first function in a given module, and, starting from the first loop, iteratively does the following:
- Fully unroll the loop.
- Scan for load ops. For each load op with a statically-inferrable access index:
  - Backtrack to the original memref alloc.
  - Find all store ops at the corresponding index (possibly transitively through renames/subviews of the underlying alloc).
  - Find the last store that occurs and forward it to the load.
  - If the original memref is an input memref, then forward through any renames to make the target load load directly from the argument memref (instead of any subviews, say).
- Apply the same logic to any remaining loads not inside any for loop.
This pass requires that tensors are lowered to memref, and only supports affine loops with affine.load/store ops.
Memrefs that result from memref.get_global ops are excluded from forwarding, even if they are loaded with a static index, and are instead handled by memref-global-replace, which should be run after this pass.
14 - PolynomialToStandard
-polynomial-to-standard
Lower polynomial to standard MLIR dialects.
This pass lowers the polynomial dialect to standard MLIR, a mixture of affine, tensor, and arith.
15 - SecretizePasses
-secretize
Adds secret argument attributes to entry function
Adds a secret.secret argument attribute to each argument in the entry function of an MLIR module. By default, the function is main. This may be overridden with the option -entry-function=top_level_func.
Options
-entry-function : entry function of the module
-wrap-generic
Wraps regions using secret args in secret.generic bodies
This pass wraps function regions of func.func that use secret arguments in secret.generic bodies.
Secret arguments are annotated using a secret.secret argument attribute. This pass converts these to secret types and then inserts a secret.generic body to hold the function’s region. The output type is also converted to a secret.
Example input:
func.func @main(%arg0: i32 {secret.secret}) -> i32 {
%0 = arith.constant 100 : i32
%1 = arith.addi %0, %arg0 : i32
return %1 : i32
}
Output:
func.func @main(%arg0: !secret.secret<i32>) -> !secret.secret<i32> {
%0 = secret.generic ins(%arg0 : !secret.secret<i32>) {
^bb0(%arg1: i32):
%1 = arith.constant 100 : i32
%2 = arith.addi %1, %arg1 : i32
secret.yield %2 : i32
} -> !secret.secret<i32>
return %0 : !secret.secret<i32>
}
16 - SecretPasses
-secret-capture-generic-ambient-scope
Capture the ambient scope used in a secret.generic
For each value used in the body of a secret.generic op, which is defined in the ambient scope outside the generic, add it to the argument list of the generic.
-secret-distribute-generic
Distribute generic ops through their bodies.
Converts generic ops whose region contains many ops into smaller sequences of generic ops whose regions contain a single op, dropping the generic part from any resulting generic ops that have no secret.secret inputs. If the op has associated regions, and the operands are not secret, then the generic is distributed recursively through the op’s regions as well.
This pass is intended to be used as part of a front-end pipeline, where a program that operates on a secret type annotates the input to a region as secret, and then wraps the contents of the region in a single large secret.generic, then uses this pass to simplify it.
The distribute-through option allows one to specify a comma-separated list of op names (e.g., distribute-through="affine.for,scf.if"), which limits the distribution to only pass through those ops. If unset, all ops are distributed through when possible.
Options
-distribute-through : comma-separated list of ops that should be distributed through
-secret-extract-generic-body
Extract the bodies of all generic ops into functions
This pass extracts the body of all generic ops into functions, and replaces the generic bodies with call ops. Used as a sub-operation in some passes, and extracted into its own pass for testing purposes.
This pass works best when --secret-generic-absorb-constants is run before it so that the extracted function contains any constants used in the generic op’s body.
-secret-forget-secrets
Convert secret types to standard types
Drop the secret<...> type from the IR, replacing it with the contained type and the corresponding cleartext computation.
-secret-generic-absorb-constants
Copy constants into a secret.generic body
For each constant value used in the body of a secret.generic op, which is defined in the ambient scope outside the generic, add its definition into the generic body.
-secret-merge-adjacent-generics
Merge two adjacent generics into a single generic
This pass merges two immediately sequential generics into a single generic. Useful as a sub-operation in some passes, and extracted into its own pass for testing purposes.
17 - SecretToBGV
-secret-to-bgv
Lower secret to bgv dialect.
This pass lowers an IR with secret.generic blocks containing arithmetic operations to operations on ciphertexts with the BGV dialect.
The pass assumes that the secret.generic regions have been distributed through arithmetic operations so that only one ciphertext operation appears per generic block. It also requires that canonicalize was run so that non-secret values used are removed from the secret.generic’s block arguments.
The pass requires that all types are tensors of a uniform shape matching the dimension of the ciphertext space specified by poly-mod-degree.
Options
-poly-mod-degree : Default degree of the cyclotomic polynomial modulus to use for ciphertext space.
-coefficient-mod-bits : Default number of bits of the prime coefficient modulus to use for the ciphertext space.
18 - StraightLineVectorizerPasses
-straight-line-vectorize
A vectorizer for straight line programs.
This pass ignores control flow and only vectorizes straight-line programs within a given region.
Options
-dialect : Use this to restrict the dialect whose ops should be vectorized.
19 - TensorExtPasses
-collapse-insertion-chains
Collapse chains of extract/insert ops into rotate ops when possible
This pass is a cleanup pass for insert-rotate. That pass sometimes leaves behind a chain of insertion operations like this:
%extracted = tensor.extract %14[%c5] : tensor<16xi16>
%inserted = tensor.insert %extracted into %dest[%c0] : tensor<16xi16>
%extracted_0 = tensor.extract %14[%c6] : tensor<16xi16>
%inserted_1 = tensor.insert %extracted_0 into %inserted[%c1] : tensor<16xi16>
%extracted_2 = tensor.extract %14[%c7] : tensor<16xi16>
%inserted_3 = tensor.insert %extracted_2 into %inserted_1[%c2] : tensor<16xi16>
...
%extracted_28 = tensor.extract %14[%c4] : tensor<16xi16>
%inserted_29 = tensor.insert %extracted_28 into %inserted_27[%c15] : tensor<16xi16>
yield %inserted_29 : tensor<16xi16>
In many cases, this chain will insert into every index of the dest tensor, and the extracted values all come from consistently aligned indices of the same source tensor. In this case, the chain can be collapsed into a single rotate.
Each index used for insertion or extraction must be constant; this may require running --canonicalize or --sccp before this pass to apply folding rules (use --sccp if you need to fold constants through control flow).
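The collapsing condition can be sketched in Python. The helper collapse_chain and the pair encoding are assumptions of this illustration, as is the convention that a rotation by k places source slot (i + k) mod n into destination slot i. The sketch models the condition described above and is not HEIR's implementation.

```python
def rotate(tensor, shift):
    """Cyclic rotation under the assumed convention: out[i] = tensor[(i + shift) % n]."""
    n = len(tensor)
    return [tensor[(i + shift) % n] for i in range(n)]


def collapse_chain(pairs, size):
    """Return the rotation amount equivalent to an extract/insert chain, or None.

    pairs is a list of (extract_index, insert_index) constants, one per
    extract/insert step of the chain (a hypothetical encoding).
    """
    # Every destination slot must be written exactly once.
    if sorted(dst for _, dst in pairs) != list(range(size)):
        return None
    # All steps must agree on a single cyclic offset.
    shifts = {(src - dst) % size for src, dst in pairs}
    return shifts.pop() if len(shifts) == 1 else None


# The chain from the example: source index 5 goes to 0, 6 to 1, ..., 4 to 15.
chain = [((i + 5) % 16, i) for i in range(16)]
shift = collapse_chain(chain, 16)

# Replay the chain by hand to compare against a single rotation.
source = list(range(100, 116))
dest = [None] * 16
for src_index, dst_index in chain:
    dest[dst_index] = source[src_index]
```

For the chain in the example, every step has offset 5, so the whole chain is equivalent to one rotation by 5 under the stated convention.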
-insert-rotate
Vectorize arithmetic FHE operations using HECO-style heuristics
This pass implements the SIMD-vectorization passes from the HECO paper.
The pass operates by identifying arithmetic operations that can be suitably combined into a combination of cyclic rotations and vectorized operations on tensors. It further identifies a suitable “slot target” for each operation and heuristically aligns the operations to reduce unnecessary rotations.
This pass by itself does not eliminate any operations, but instead inserts well-chosen rotations so that, for well-structured code (like unrolled affine loops), a subsequent --cse and --canonicalize pass will dramatically reduce the IR. As such, the pass is designed to be paired with the canonicalization patterns in tensor_ext, as well as the collapse-insertion-chains pass, which cleans up remaining insertion and extraction ops after the main simplifications are applied.
Unlike HECO, this pass operates on plaintext types and tensors, along with the HEIR-specific tensor_ext dialect for its cyclic rotate op. It is intended to be run before lowering to a scheme dialect like bgv.
-rotate-and-reduce
Use a logarithmic number of rotations to reduce a tensor.
This pass identifies when a commutative, associative binary operation is used to reduce all of the entries of a tensor to a single value, and optimizes the operations by using a logarithmic number of reduction operations.
In particular, this pass identifies an unrolled set of operations of the form (the binary ops may come in any order):
%0 = tensor.extract %t[0] : tensor<8xi32>
%1 = tensor.extract %t[1] : tensor<8xi32>
%2 = tensor.extract %t[2] : tensor<8xi32>
%3 = tensor.extract %t[3] : tensor<8xi32>
%4 = tensor.extract %t[4] : tensor<8xi32>
%5 = tensor.extract %t[5] : tensor<8xi32>
%6 = tensor.extract %t[6] : tensor<8xi32>
%7 = tensor.extract %t[7] : tensor<8xi32>
%8 = arith.addi %0, %1 : i32
%9 = arith.addi %8, %2 : i32
%10 = arith.addi %9, %3 : i32
%11 = arith.addi %10, %4 : i32
%12 = arith.addi %11, %5 : i32
%13 = arith.addi %12, %6 : i32
%14 = arith.addi %13, %7 : i32
and replaces it with a logarithmic number of rotate and addi operations:
%0 = tensor_ext.rotate %t, 4 : tensor<8xi32>
%1 = arith.addi %t, %0 : tensor<8xi32>
%2 = tensor_ext.rotate %1, 2 : tensor<8xi32>
%3 = arith.addi %1, %2 : tensor<8xi32>
%4 = tensor_ext.rotate %3, 1 : tensor<8xi32>
%5 = arith.addi %3, %4 : tensor<8xi32>
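To see why three rotate/add steps suffice for eight entries, the rewritten sequence can be simulated in Python. The functions rotate and rotate_and_reduce are illustrative names. The sketch assumes a power-of-two length and a cyclic rotation with out[i] = t[(i + k) mod n], though the final result is the same for either rotation direction.

```python
def rotate(tensor, shift):
    """Cyclic rotation (assumed convention): out[i] = tensor[(i + shift) % n]."""
    n = len(tensor)
    return [tensor[(i + shift) % n] for i in range(n)]


def rotate_and_reduce(tensor):
    """Sum all entries using log2(n) rotate-and-add steps.

    Sketch only: assumes len(tensor) is a power of two. After the last step
    every slot holds the full sum.
    """
    n = len(tensor)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    acc = list(tensor)
    steps = 0
    shift = n // 2
    while shift >= 1:
        rotated = rotate(acc, shift)
        acc = [a + b for a, b in zip(acc, rotated)]  # elementwise add
        shift //= 2
        steps += 1
    return acc, steps


values = [3, 1, 4, 1, 5, 9, 2, 6]
reduced, num_steps = rotate_and_reduce(values)
```

With eight entries the loop runs for shifts 4, 2, and 1, compared with the seven scalar additions in the unrolled form.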
20 - UnusedMemRefPasses
-remove-unused-memref
Cleanup any unused memrefs
Scan the IR for unused memrefs and remove them.
This pass looks for locally allocated memrefs that are never used and deletes them. This pass can be used as a cleanup pass from other IR simplifications that forward stores to loads.
21 - YosysOptimizerPasses
-yosys-optimizer
Invoke Yosys to perform circuit optimization.
This pass invokes Yosys to convert an arithmetic circuit to an optimized boolean circuit that uses the arith and comb dialects.
Note that booleanization changes the function signature: multi-bit integers are transformed to a tensor of booleans, for example, an i8 is converted to tensor<8xi1>.
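The signature change can be pictured with a small Python sketch that splits an 8-bit value into eight booleans and reassembles it. The helpers to_bits and from_bits are hypothetical, and the least-significant-bit-first ordering is an assumption of this sketch. The text above does not state which bit order the pass uses.

```python
def to_bits(value, width=8):
    """Split an integer into `width` booleans, least significant bit first.

    The bit order is an assumption made for illustration.
    """
    return [bool((value >> i) & 1) for i in range(width)]


def from_bits(bits):
    """Reassemble an unsigned integer from least-significant-bit-first booleans."""
    return sum(int(bit) << i for i, bit in enumerate(bits))


bits = to_bits(5)
roundtrip = from_bits(to_bits(200))
```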
The optimizer will be applied to each secret.generic op containing arithmetic ops that can be optimized.
Optional parameters:
- abc-fast: Run the abc optimizer in “fast” mode, getting faster compile time at the expense of a possibly larger output circuit.
- unroll-factor: Before optimizing the circuit, unroll loops by a given factor. If unset, this pass will not unroll any loops.
- print-stats: Prints statistics about the optimized circuits.
- mode={Boolean,LUT}: Map gates to boolean gates or lookup table gates.
Statistics
total circuit size : The total circuit size for all optimized circuits, after optimization is done.