Passes
-align-tensor-sizes
Resize tensors into tensors with a fixed-size final dimension
This pass resizes input tensors of arbitrary size into tensors whose final dimension has a fixed size. All input tensors are required to be one-dimensional. The --size option specifies the size of the final dimension of the output tensors, and is required to be a power of two.
To align the tensors in the input IR, the pass first zero-pads the input to the smallest power of two that is at least the input size. It then replicates or splits the padded input into the output shape determined by size. The resulting transformation is described in a SIMDPackingAttr encoding attribute on the final tensor.
For example, with size=16, a tensor with 7 elements will be zero-padded to 8 elements, and then replicated twice to fill a tensor of size 16. The SIMDPackingAttr encodes the input shape, the number of zero-padded elements, and the output shape.
Input:
%0 = tensor.empty() : tensor<7xi32>
Output:
%0 = tensor.empty() : tensor<16xi32, #tensor_ext.simd_packing<in = [7], padding = [1], out = [16]>>
A tensor with 30 elements will be zero-padded with 2 elements and split into two rows of size 16, producing a 2x16 tensor.
Input:
%0 = tensor.empty() : tensor<30xi32>
Output:
%0 = tensor.empty() : tensor<2x16xi32, #tensor_ext.simd_packing<in = [30], padding = [2], out = [16]>>
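The padding and layout arithmetic described above can be sketched in Python. This is a minimal model and not the pass itself. It assumes a one-dimensional input and a power-of-two size. The helper names `next_pow2` and `align` are hypothetical. The sketch only computes the padding and the packed rows that the SIMDPackingAttr describes; it does not rewrite any IR.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def align(values, size):
    """Hypothetical model of the layout: zero-pad `values` to a power of two,
    then replicate or split it so every row has exactly `size` slots.
    Returns (number of padded zeros, list of rows)."""
    assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
    padded_len = next_pow2(len(values))
    padding = padded_len - len(values)
    padded = list(values) + [0] * padding
    if padded_len <= size:
        # Replicate the padded vector until one row of `size` slots is full.
        rows = [padded * (size // padded_len)]
    else:
        # Split the padded vector into padded_len // size rows.
        rows = [padded[i:i + size] for i in range(0, padded_len, size)]
    return padding, rows
```

With size=16, a 7-element input yields padding 1 and one row holding two copies of the padded vector. A 30-element input yields padding 2 and two rows, which matches the 2x16 result above.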
Note that this pass does not insert any new operations like reshape, but rather transforms the IR to use tensors with a fixed final dimension. This pass may be used to align the sizes of tensors that represent plaintexts and ciphertexts in RLWE schemes that support SIMD slots and operations.
Options
-size : Power-of-two size of the final dimension of the output tensors.
-annotate-mgmt
Annotate MgmtAttr for secret SSA values in the IR
This pass runs the secretness/level/dimension analysis and annotates the IR with the results, saving them into each op’s attribute dictionary as mgmt.mgmt.
-annotate-secretness
Annotate secret SSA values in the IR
This pass runs the secretness analysis and annotates the IR with the results, saving them into each op’s attribute dictionary as secret.
-apply-folders
Apply all folding patterns from canonicalize
This pass applies all registered folding patterns greedily to the input IR. This is useful when running a full canonicalize is too slow, but applying folders before canonicalize is sufficient to simplify the IR for later passes, or even sufficient to then subsequently run a full canonicalize pass.
This is used to prepare an IR for insert-rotate after fully unrolling loops.
-arith-to-cggi-quart
Lower arith to cggi dialect and divide each operation into smaller parts.
This pass converts high-precision arithmetic operations, i.e., operations on 32-bit integers, into a sequence of lower-precision operations, i.e., 8-bit operations.
Currently, the pass splits each 32-bit integer into four 8-bit integers, using the tensor dialect.
These smaller integers are stored in 16-bit integers, so that the carry information is not lost.
This pass converts the arith dialect to the cggi dialect.
Based on the arith-emulate-wide-int pass from the MLIR arith dialect.
General assumption: the first element in the tensor is the least significant (LSB) element.
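Below is a plaintext Python sketch of the limb representation and a carrying addition. It assumes unsigned wrap-around semantics, like arith.addi on i32. The helper names are hypothetical, and the exact sequence of cggi operations the pass emits may differ. The sketch shows why a 16-bit container per 8-bit limb is enough: the partial sum of two limbs plus a carry never exceeds 9 bits.

```python
MASK8 = 0xFF


def split_quart(x):
    """Split a 32-bit unsigned value into four 8-bit limbs, LSB limb first."""
    return [(x >> (8 * i)) & MASK8 for i in range(4)]


def join_quart(limbs):
    """Reassemble the 32-bit value from LSB-first 8-bit limbs."""
    return sum((limb & MASK8) << (8 * i) for i, limb in enumerate(limbs))


def add_quart(a, b):
    """Limb-wise addition with carry propagation (hypothetical helper).
    Each partial sum fits in a 16-bit container; its upper bits are the carry."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry       # at most 255 + 255 + 1 = 511, fits in 16 bits
        out.append(s & MASK8)   # keep the low 8 bits in this limb
        carry = s >> 8          # carry moves into the next, more significant limb
    return out                  # the final carry is dropped: wrap-around mod 2^32
```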
-arith-to-cggi
Lower arith to cggi dialect.
-bgv-to-lattigo
Lower bgv to lattigo dialect.
This pass lowers the bgv dialect to the Lattigo dialect.
-bgv-to-lwe
Lower bgv to lwe dialect.
This pass lowers the bgv dialect to the lwe dialect.
Note that some scheme-specific ops (e.g., modswitch) that have no direct analogue in the lwe dialect are left unchanged.
TODO (#1193): support both “common” and “full” lwe lowering
-cggi-boolean-vectorize
Group different logic gates with the packed API
This pass groups independent logic gates into a single call to the packed operations. The pass is based on the straight-line vectorizer, but is fundamentally different: it combines boolean gates of any type and is not restricted to combining gates of the same type.
The pass is intended for the FPT tfhe-rs API, whose packed_gates function takes the boolean gates as a vector of strings, together with a left and a right vector of ciphertexts. Each boolean gate specified in gates is then applied elementwise:
let outputs_ct = fpga_key.packed_gates(&gates, &ref_to_ct_lefts, &ref_to_ct_rights);
Options
-parallelism : Parallelism factor for batching. 0 means unlimited parallelism.
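The grouping idea can be sketched in Python. This is a hedged model and not the pass's actual heuristic. It assumes that gates at the same dependency depth are independent and can share one packed call, and that `parallelism` caps the batch size, with 0 meaning no cap. The name `group_gates` and the gate-dictionary format are hypothetical. Each returned batch corresponds to the gates that could go into a single packed_gates call.

```python
from collections import defaultdict


def group_gates(gates, parallelism=0):
    """gates: dict name -> (op, lhs, rhs); lhs/rhs are gate names or primary inputs.
    Assumption: batch gates level by level, mixing gate types freely."""
    level = {}

    def depth(v):
        if v not in gates:       # a primary input is available at depth 0
            return 0
        if v not in level:
            _, lhs, rhs = gates[v]
            level[v] = 1 + max(depth(lhs), depth(rhs))
        return level[v]

    by_level = defaultdict(list)
    for name in gates:
        by_level[depth(name)].append(name)

    batches = []
    for lvl in sorted(by_level):
        group = by_level[lvl]
        step = parallelism or len(group)   # 0 means no limit on batch size
        batches += [group[i:i + step] for i in range(0, len(group), step)]
    return batches
```

For example, an AND and an XOR on primary inputs land in the same batch even though they are different gate types.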
-cggi-expand-lut
Expands LUTs into LWE operations and programmable bootstraps
This pass expands the linear combination performed in a LUT operation into the component LWE scalar operations and a programmable bootstrap operation.
For example, a LUT3 operation is composed of three LWE ciphertext inputs $c, b, a$ (in MSB to LSB ordering) which must be combined via the linear combination $4 * c + 2 * b + a$ before being fed into a programmable bootstrap defined by the lookup table.
This pass supports LUT2, LUT3, and LutLincomb operations.
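The following is a plaintext Python model of LUT3 evaluation. It assumes the lookup table is encoded as an 8-bit integer whose bit k is the output for input index k; that encoding and the `lut3` name are illustrative assumptions. In the expanded IR, the index computation corresponds to the LWE scalar operations, and the table lookup corresponds to the programmable bootstrap.

```python
def lut3(lookup_table, c, b, a):
    """Evaluate a 3-input LUT on plaintext bits (c is the MSB, a the LSB).
    Assumption: bit k of `lookup_table` holds the output for index k."""
    index = 4 * c + 2 * b + a           # the linear combination from the pass description
    return (lookup_table >> index) & 1  # the "bootstrap" step: table lookup


# Majority of three bits: outputs 1 at indices 3, 5, 6, and 7.
MAJORITY = 0b11101000
```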
-cggi-set-default-parameters
Set default parameters for CGGI ops
This pass adds default parameters to all CGGI ops as cggi_params named attributes, overriding any existing attribute set with that name.
This pass is primarily for testing purposes, and as a parameter provider before a proper parameter selection mechanism is added. This pass should not be used in production.
The specific parameters are hard-coded in lib/Dialect/CGGI/Transforms/SetDefaultParameters.cpp.
-cggi-to-jaxite
Lower cggi to jaxite dialect.
-cggi-to-tfhe-rust-bool
Lower cggi to tfhe_rust_bool dialect.
-cggi-to-tfhe-rust
Lower cggi to tfhe_rust dialect.
-ckks-to-lwe
Lower ckks to lwe dialect.
This pass lowers the ckks dialect to the lwe dialect.
Note that some scheme-specific ops (e.g., rescale) that have no direct analogue in the lwe dialect are left unchanged.
TODO (#1193): support both “common” and “full” lwe lowering
-collapse-insertion-chains
Collapse chains of extract/insert ops into rotate ops when possible
This pass is a cleanup pass for insert-rotate. That pass sometimes leaves behind a chain of insertion operations like this:
%extracted = tensor.extract %14[%c5] : tensor<16xi16>
%inserted = tensor.insert %extracted into %dest[%c0] : tensor<16xi16>
%extracted_0 = tensor.extract %14[%c6] : tensor<16xi16>
%inserted_1 = tensor.insert %extracted_0 into %inserted[%c1] : tensor<16xi16>
%extracted_2 = tensor.extract %14[%c7] : tensor<16xi16>
%inserted_3 = tensor.insert %extracted_2 into %inserted_1[%c2] : tensor<16xi16>
...
%extracted_28 = tensor.extract %14[%c4] : tensor<16xi16>
%inserted_29 = tensor.insert %extracted_28 into %inserted_27[%c15] : tensor<16xi16>
yield %inserted_29 : tensor<16xi16>
In many cases, this chain will insert into every index of the dest tensor, and the extracted values all come from consistently aligned indices of the same source tensor. In this case, the chain can be collapsed into a single rotate.
Each index used for insertion or extraction must be constant; this may require running --canonicalize or --sccp before this pass to apply folding rules (use --sccp if you need to fold constants through control flow).
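The collapsibility check can be sketched in Python over the (extract index, insert index) pairs of a chain like the one above. The `chain_to_rotation` helper is hypothetical. It returns the constant offset r such that dest[i] == src[(i + r) mod n] for every i. It does not decide how that offset maps onto the direction convention of tensor_ext.rotate.

```python
def chain_to_rotation(pairs, n):
    """pairs: (extract_index, insert_index) for each extract/insert in the chain.
    Returns the common offset r, or None if the chain cannot become one rotate."""
    # Every index of the dest tensor must be written exactly once.
    if sorted(dst for _, dst in pairs) != list(range(n)):
        return None
    # All extracted indices must be aligned with the same cyclic offset.
    shifts = {(src - dst) % n for src, dst in pairs}
    return shifts.pop() if len(shifts) == 1 else None
```

For the chain shown above, index 0 receives element 5, index 1 receives element 6, and so on, so the offset is 5.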
-convert-elementwise-to-affine
This pass lowers ElementwiseMappable operations to Affine loops.
This pass lowers ElementwiseMappable operations over tensors to affine loop nests that instead apply the operation to the underlying scalar values.
Usage:
- --convert-elementwise-to-affine=convert-ops=arith.mulf restricts conversion to the mulf op from the arith dialect.
- '--convert-elementwise-to-affine=convert-ops=arith.addf,arith.divf convert-dialects=bgv' restricts conversion to the addf and divf ops from the arith dialect and all of the ops in the bgv dialect.
- --convert-elementwise-to-affine=convert-dialects=arith restricts conversion to the arith dialect, so only ops from the arith dialect are processed.
- --convert-elementwise-to-affine=convert-ops=arith.addf,arith.mulf restricts conversion to only these two ops, addf and mulf, from the arith dialect.
Options
-convert-ops : comma-separated list of ops to run this pass on
-convert-dialects : comma-separated list of dialects to run this pass on
-convert-if-to-select
Convert scf.if operations on secret conditions to arith.select operations.
Converts scf.if operations whose condition is secret into equivalent arith.select operations.
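The idea can be illustrated with a plaintext Python sketch. Both branches are computed unconditionally, and an arithmetic select picks the result, so the sequence of operations does not depend on the condition. The helpers are hypothetical, and the sketch assumes side-effect-free branches. The multiply-based select is one common way to express a select over encrypted data; it is not necessarily what this pass or later lowerings emit.

```python
def select(cond, true_value, false_value):
    """Branch-free select: the scalar meaning of arith.select."""
    c = int(cond)
    return c * true_value + (1 - c) * false_value


def abs_diff_oblivious(x, y):
    # Original control flow: if x > y: r = x - y  else: r = y - x
    # Both branches are evaluated, then the result is selected.
    then_value = x - y
    else_value = y - x
    return select(x > y, then_value, else_value)
```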
-convert-polynomial-mul-to-ntt
Rewrites polynomial operations to their NTT equivalents
Applies a rewrite pattern to convert polynomial multiplication into an equivalent computation using the number-theoretic transform (NTT) when possible.
Polynomial multiplication can be rewritten as polynomial.NTT on each operand, followed by elementwise modular multiplication of the point-value representations, and then an inverse NTT back to the coefficient representation.
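The identity above can be checked with a small Python sketch. It assumes a negacyclic ring Z_q[x]/(x^n + 1) with toy parameters q = 17 and n = 4, chosen so that a primitive 2n-th root of unity (psi = 2) exists mod q. It also uses a naive O(n^2) transform instead of a fast butterfly NTT. All names are hypothetical, and the pass's own lowering may differ.

```python
Q = 17                 # toy prime modulus with 2N | Q - 1 (assumption)
N = 4                  # ring degree: multiply in Z_Q[x] / (x^N + 1)
PSI = 2                # primitive 2N-th root of unity: 2^4 = 16 = -1 (mod 17)
OMEGA = PSI * PSI % Q  # primitive N-th root of unity (= 4)


def ntt(a, root):
    """Naive transform: evaluate polynomial a at root^k for k = 0..N-1."""
    return [sum(a[j] * pow(root, j * k, Q) for j in range(N)) % Q for k in range(N)]


def intt(a_hat):
    """Inverse transform: use the inverse root and scale by N^-1 mod Q."""
    inv_n = pow(N, -1, Q)
    return [x * inv_n % Q for x in ntt(a_hat, pow(OMEGA, -1, Q))]


def negacyclic_mul_ntt(a, b):
    # Twist by powers of PSI so that a cyclic transform yields a negacyclic product.
    a_t = [a[i] * pow(PSI, i, Q) % Q for i in range(N)]
    b_t = [b[i] * pow(PSI, i, Q) % Q for i in range(N)]
    # Elementwise modular multiplication in the point-value representation.
    c_hat = [x * y % Q for x, y in zip(ntt(a_t, OMEGA), ntt(b_t, OMEGA))]
    c_t = intt(c_hat)
    inv_psi = pow(PSI, -1, Q)
    return [c_t[i] * pow(inv_psi, i, Q) % Q for i in range(N)]


def negacyclic_mul_schoolbook(a, b):
    """Reference: schoolbook product with the reduction x^N = -1."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c
```

`pow(x, -1, Q)` computes a modular inverse and requires Python 3.8 or newer.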
-convert-secret-extract-to-static-extract
Convert tensor.extract operations on a secret index to static extract operations.
Converts tensor.extract operations that read a value at a secret index into static tensor.extract operations that extract the value at each index and conditionally select the value extracted at the secret index.
Note: Running this pass alone does not result in a data-oblivious program; the --convert-if-to-select pass must then be run on the resulting program to convert the secret-dependent If-operation to a Select-operation.
Example input:
func.func @main(%secretTensor: !secret.secret<tensor<32xi16>>, %secretIndex: !secret.secret<index>) -> !secret.secret<i16> {
...
%0 = secret.generic ins(%secretTensor, %secretIndex : !secret.secret<tensor<32xi16>>, !secret.secret<index>) {
^bb0(%tensor: tensor<32xi16>, %index: index):
// Violation: tensor.extract loads value at secret index
%extractedValue = tensor.extract %tensor[%index] : tensor<32xi16>
...
}
Output:
```mlir
func.func @main(%secretTensor: !secret.secret<tensor<32xi16>>, %secretIndex: !secret.secret<index>) -> !secret.secret<i16> {
  ...
  %0 = secret.generic ins(%secretTensor, %secretIndex : !secret.secret<tensor<32xi16>>, !secret.secret<index>) {
  ^bb0(%tensor: tensor<32xi16>, %index: index):
    %extractedValue = affine.for %i = 0 to 32 iter_args(%arg = %dummyValue) -> (i16) {
      // 1. Check if %i matches %index
      %cond = arith.cmpi eq, %i, %index : index
      // 2. Extract value at %i
      %value = tensor.extract %tensor[%i] : tensor<32xi16>
      // 3. If %i matches %index, yield %value extracted in (2), else yield the loop-carried %arg
      %result = scf.if %cond -> (i16) {
        scf.yield %value : i16
      } else {
        scf.yield %arg : i16
      }
      // 4. Yield result from (3)
      affine.yield %result : i16
    }
    ...
  }
```
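In plaintext terms, the rewritten loop behaves like the following Python sketch (`oblivious_extract` is a hypothetical name). Every index is read, and the loop-carried value is replaced only at the matching index. The Python conditional expression stands in for the scf.if, which --convert-if-to-select later turns into a select.

```python
def oblivious_extract(tensor, secret_index, dummy=0):
    """Plaintext model of the rewritten loop: read every slot, keep the match."""
    result = dummy                          # plays the role of %dummyValue
    for i in range(len(tensor)):            # constant trip count, like the affine.for
        cond = i == secret_index            # arith.cmpi eq
        value = tensor[i]                   # tensor.extract at a public index
        result = value if cond else result  # scf.if, later converted to a select
    return result
```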
-convert-secret-for-to-static-for
Convert secret scf.for ops to affine.for ops with constant bounds.
Converts scf.for operations with secret bound(s) into affine.for operations with constant bound(s).
It replaces data-dependent bounds with an If-operation that checks the bounds, and conditionally executes and yields values from the For-operation’s body.
Note: Running this pass alone does not result in a data-oblivious program; the --convert-if-to-select pass must then be run on the resulting program to convert the secret-dependent If-operation to a Select-operation.
Example input:
func.func @main(%secretTensor: !secret.secret<tensor<16xi32>>, %secretLower: !secret.secret<index>, %secretUpper: !secret.secret<index>) -> !secret.secret<i32> {
...
%0 = secret.generic ins(%secretTensor, %secretLower, %secretUpper : !secret.secret<tensor<16xi32>>, !secret.secret<index>, !secret.secret<index>){
^bb0(%tensor: tensor<16xi32>, %lower : index, %upper : index ):
...
%1 = scf.for %i = %lower to %upper step %step iter_args(%arg = %val) -> (i32) {
%extracted = tensor.extract %tensor[%i] : tensor<16xi32>
%sum = arith.addi %extracted, %arg : i32
scf.yield %sum : i32
} {lower = 0, upper = 16}
secret.yield %1 : i32
} -> !secret.secret<i32>
return %0 : !secret.secret<i32>
}
Output:
func.func @main(%secretTensor: !secret.secret<tensor<16xi32>>, %secretLower: !secret.secret<index>, %secretUpper: !secret.secret<index>) -> !secret.secret<i32> {
...
%0 = secret.generic ins(%secretTensor, %secretLower, %secretUpper : !secret.secret<tensor<16xi32>>, !secret.secret<index>, !secret.secret<index>){
^bb0(%tensor: tensor<16xi32>, %lower : index, %upper : index ):
...
%1 = affine.for %i = 0 to 16 step %step iter_args(%arg = %val) -> (i32) {
%lowerCond = arith.cmpi sge, %i, %lower : index
%upperCond = arith.cmpi slt, %i, %upper : index
%cond = arith.andi %lowerCond, %upperCond : i1
%result = scf.if %cond -> (i32) {
%extracted = tensor.extract %tensor[%i] : tensor<16xi32>
%sum = arith.addi %extracted, %arg : i32
scf.yield %sum : i32
} else {
scf.yield %arg : i32
}
affine.yield %result : i32
} {lower = 0, upper = 16}
secret.yield %1 : i32
} -> !secret.secret<i32>
return %0 : !secret.secret<i32>
}
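The rewritten loop can be modeled in plaintext Python as follows. The function name and the initial accumulator value of 0 (standing in for %val) are assumptions. The loop always runs over the constant range taken from the {lower = 0, upper = 16} attribute, and the secret bounds only decide which iterations update the accumulator.

```python
def oblivious_range_sum(tensor, lower, upper, max_upper=16):
    """Plaintext model: iterate over the constant range [0, max_upper) and
    accumulate only when lower <= i < upper."""
    acc = 0                                     # stands in for %val (assumed 0)
    for i in range(max_upper):                  # constant bounds from the attribute
        in_range = lower <= i and i < upper     # arith.cmpi sge / slt + arith.andi
        acc = acc + tensor[i] if in_range else acc  # scf.if / else
    return acc
```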
-convert-secret-insert-to-static-insert
Convert tensor.insert operations on a secret index to static insert operations.
Converts tensor.insert operations that write to a secret index into static tensor.insert operations that insert the value at each index and conditionally select the newly produced tensor that contains the value at the secret index.
Note: Running this pass alone does not result in a data-oblivious program; the --convert-if-to-select pass must then be run on the resulting program to convert the secret-dependent If-operation to a Select-operation.
Example input:
func.func @main(%secretTensor: !secret.secret<tensor<32xi16>>, %secretIndex: !secret.secret<index>) -> !secret.secret<i16> {
...
%0 = secret.generic ins(%secretTensor, %secretIndex : !secret.secret<tensor<32xi16>>, !secret.secret<index>) {
^bb0(%tensor: tensor<32xi16>, %index: index):
// Violation: tensor.insert writes value at secret index
%inserted = tensor.insert %newValue into %tensor[%index] : tensor<32xi16>
...
}
Output:
func.func @main(%secretTensor: !secret.secret<tensor<32xi16>>, %secretIndex: !secret.secret<index>) -> !secret.secret<i16> {
...
%0 = secret.generic ins(%secretTensor, %secretIndex : !secret.secret<tensor<32xi16>>, !secret.secret<index>) {
^bb0(%tensor: tensor<32xi16>, %index: index):
%inserted = affine.for %i = 0 to 32 iter_args(%inputArg = %tensor) -> tensor<32xi16> {
// 1. Check if %i matches the %index
%cond = arith.cmpi eq, %i, %index : index
// 2. Insert %newValue and produce %newTensor
%newTensor = tensor.insert %newValue into %inputArg[%i] : tensor<32xi16>
// 3. If %i matches %index, yield %newTensor, else yield unchanged input tensor
%finalTensor = scf.if %cond -> (tensor<32xi16>) {
scf.yield %newTensor : tensor<32xi16>
} else {
scf.yield %inputArg : tensor<32xi16>
}
// 4. Yield final tensor
affine.yield %finalTensor : tensor<32xi16>
}
...
}
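As a plaintext model of the output above, the following Python sketch (`oblivious_insert` is a hypothetical name) builds a candidate tensor at every index. It carries the candidate forward only when the index matches the secret one, so the write pattern is the same for every secret index.

```python
def oblivious_insert(tensor, secret_index, new_value):
    """Plaintext model of the rewritten loop; the input list is not mutated."""
    current = list(tensor)
    for i in range(len(current)):
        candidate = list(current)                # tensor.insert yields a new tensor value
        candidate[i] = new_value
        cond = i == secret_index                 # arith.cmpi eq
        current = candidate if cond else current  # scf.if picks which tensor to carry
    return current
```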
-convert-secret-while-to-static-for
Convert secret scf.while ops to affine.for ops that have constant bounds.
Convert scf.while with a secret condition to affine.for with constant bounds. It replaces the scf.condition operation found in the scf.while loop with an scf.if operation that conditionally executes operations in the while operation’s body and yields values.
A “max_iter” attribute should be specified as part of the secret-dependent scf.while operation to successfully transform to a secret-independent affine.for operation. This attribute determines the maximum number of iterations for the new affine.for operation.
Note: Running this pass alone does not result in a data-oblivious program; the --convert-if-to-select pass must then be run on the resulting program to convert the secret-dependent If-operation to a Select-operation.
Example input:
// C-like code
int main(int secretInput) {
while (secretInput > 100) {
secretInput = secretInput * secretInput;
}
return secretInput;
}
// MLIR
func.func @main(%secretInput: !secret.secret<i16>) -> !secret.secret<i16> {
%c100 = arith.constant 100 : i16
%0 = secret.generic ins(%secretInput : !secret.secret<i16>) {
^bb0(%input: i16):
%1 = scf.while (%arg1 = %input) : (i16) -> i16 {
%2 = arith.cmpi sgt, %arg1, %c100 : i16
scf.condition(%2) %arg1 : i16
} do {
^bb0(%arg1: i16):
%3 = arith.muli %arg1, %arg1 : i16
scf.yield %3 : i16
} attributes {max_iter = 16 : i64}
secret.yield %1 : i16
} -> !secret.secret<i16>
return %0 : !secret.secret<i16>
}
Output:
func.func @main(%secretInput: !secret.secret<i16>) -> !secret.secret<i16> {
%c100 = arith.constant 100 : i16
%0 = secret.generic ins(%secretInput : !secret.secret<i16>) {
^bb0(%input: i16):
%1 = affine.for %i = 0 to 16 iter_args(%arg1 = %input) -> (i16) {
%2 = arith.cmpi sgt, %arg1, %c100 : i16
%3 = scf.if %2 -> (i16) {
%4 = arith.muli %arg1, %arg1 : i16
scf.yield %4 : i16
} else {
scf.yield %arg1 : i16
}
affine.yield %3 : i16
} attributes {max_iter = 16 : i64}
secret.yield %1 : i16
} -> !secret.secret<i16>
return %0 : !secret.secret<i16>
}
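The semantics of the rewrite can be checked in Python against the original loop. This sketch assumes i16 wrap-around for arith.muli and a signed comparison for arith.cmpi sgt. The helpers `to_i16`, `reference_while`, and `oblivious_for` are hypothetical. The two functions agree whenever the original loop finishes within max_iter iterations, which is why the max_iter attribute has to be chosen large enough.

```python
def to_i16(v):
    """Wrap an integer to a signed 16-bit value, like i16 arithmetic."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def reference_while(x):
    # while (x > 100) x = x * x;  with i16 wrap-around
    while x > 100:
        x = to_i16(x * x)
    return x


def oblivious_for(x, max_iter=16):
    # Fixed trip count from max_iter; once the condition is false,
    # each remaining iteration yields the value unchanged.
    for _ in range(max_iter):
        cond = x > 100                    # arith.cmpi sgt
        x = to_i16(x * x) if cond else x  # scf.if / else branch
    return x
```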
-convert-tensor-to-scalars
Effectively ‘unrolls’ tensors of static shape to scalars.
This pass will convert a static-shaped tensor type to a TypeRange containing product(dim) copies of the element type of the tensor. This pass currently includes two patterns:
- It converts tensor.from_elements operations to the corresponding scalar inputs.
- It converts tensor.insert operations by updating the ValueRange corresponding to the converted input and updating it with the scalar to be inserted.
It also applies folders greedily to simplify, e.g., extract(from_elements).
Note: The pass is designed to be run on an IR where the only operations with tensor-typed operands are tensor “management” operations such as insert/extract, with all other operations (e.g., arith operations) already taking (extracted) scalar inputs. An example is an IR where elementwise operations have been converted to scalar operations via --convert-elementwise-to-affine.
The pass might insert new tensor.from_elements operations, or manually create the scalar ValueRange by inserting tensor.extract operations, if any operations that operate on tensors remain. The pass currently applies irrespective of tensor size, i.e., it might be very slow for large tensors.
TODO (#1023): Extend this pass to support more tensor operations, e.g., tensor.slice
Options
-max-size : Limits `unrolling` to tensors with at most max-size elements
-drop-unit-dims
Drops unit dimensions from linalg ops.
This pass converts linalg ops whose operands have unit dimensions in their types to specialized ops that drop these unit dimensions.
For example, a linalg.matmul whose RHS has type tensor<32x1xi32> is converted to a linalg.matvec op on the underlying tensor<32xi32>.
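The equivalence this rewrite relies on can be checked in Python. A matmul whose RHS is Kx1 computes the same values as a matvec on the length-K vector obtained by dropping the unit dimension. The helpers are hypothetical plaintext stand-ins for linalg.matmul and linalg.matvec.

```python
def matmul(a, b):
    """Naive product of an MxK and a KxN matrix (lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, v):
    """Product of an MxK matrix and a length-K vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def drop_unit_rhs(b):
    """Drop the trailing unit dimension: a Kx1 matrix becomes a length-K vector."""
    assert all(len(row) == 1 for row in b)
    return [row[0] for row in b]
```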
-expand-copy
Expands memref.copy ops to explicit affine loads and stores
This pass removes memref copy operations by expanding them to affine loads and stores. This pass introduces affine loops over the dimensions of the MemRef, so it must be run before any affine loop unrolling in a pipeline.
Input
module {
func.func @memref_copy() {
%alloc = memref.alloc() : memref<2x3xi32>
%alloc_0 = memref.alloc() : memref<2x3xi32>
memref.copy %alloc, %alloc_0 : memref<2x3xi32> to memref<2x3xi32>
}
}
Output
module {
func.func @memref_copy() {
%alloc = memref.alloc() : memref<2x3xi32>
%alloc_0 = memref.alloc() : memref<2x3xi32>
affine.for %arg0 = 0 to 2 {
affine.for %arg1 = 0 to 3 {
%1 = affine.load %alloc[%arg0, %arg1] : memref<2x3xi32>
affine.store %1, %alloc_0[%arg0, %arg1] : memref<2x3xi32>
}
}
}
}
When --disable-affine-loop=true is set, the output becomes
module {
func.func @memref_copy() {
%alloc = memref.alloc() : memref<2x3xi32>
%alloc_0 = memref.alloc() : memref<2x3xi32>
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
%0 = affine.load %alloc[%c0, %c0] : memref<2x3xi32>
affine.store %0, %alloc_0[%c0, %c0] : memref<2x3xi32>
%1 = affine.load %alloc[%c0, %c1] : memref<2x3xi32>
affine.store %1, %alloc_0[%c0, %c1] : memref<2x3xi32>
%2 = affine.load %alloc[%c0, %c2] : memref<2x3xi32>
affine.store %2, %alloc_0[%c0, %c2] : memref<2x3xi32>
[...]
}
}
Options
-disable-affine-loop : When set, expands copies into explicit loads and stores without introducing affine loops.
-extract-loop-body
Extracts the logic of loop bodies into functions.
This pass extracts logic in the inner body of for loops into functions.
This pass requires that tensors are lowered to memref. It expects that a loop body contains a number of affine.load statements used as inputs to the extracted function, and a single affine.store used as the extracted function’s output.
Input
module {
func.func @loop_body() {
%c-128_i8 = arith.constant -128 : i8
%c127_i8 = arith.constant 127 : i8
%alloc_7 = memref.alloc() {alignment = 64 : i64} : memref<25x20x8xi8>
affine.for %arg1 = 0 to 25 {
affine.for %arg2 = 0 to 20 {
affine.for %arg3 = 0 to 8 {
%98 = affine.load %alloc_6[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
%99 = arith.cmpi slt, %98, %c-128_i8 : i8
%100 = arith.select %99, %c-128_i8, %98 : i8
%101 = arith.cmpi sgt, %98, %c127_i8 : i8
%102 = arith.select %101, %c127_i8, %100 : i8
affine.store %102, %alloc_7[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
}
}
}
}
}
Output
module {
func.func @loop_body() {
%alloc_7 = memref.alloc() {alignment = 64 : i64} : memref<25x20x8xi8>
affine.for %arg1 = 0 to 25 {
affine.for %arg2 = 0 to 20 {
affine.for %arg3 = 0 to 8 {
%98 = affine.load %alloc_6[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
%102 = func.call @__for_loop(%98) : (i8) -> i8
affine.store %102, %alloc_7[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
}
}
}
}
func.func private @__for_loop(%arg0: i8) -> i8 {
%c-128_i8 = arith.constant -128 : i8
%c127_i8 = arith.constant 127 : i8
%99 = arith.cmpi slt, %arg0, %c-128_i8 : i8
%100 = arith.select %99, %c-128_i8, %arg0 : i8
%101 = arith.cmpi sgt, %arg0, %c127_i8 : i8
%102 = arith.select %101, %c127_i8, %100 : i8
return %102 : i8
}
}
Options
-min-loop-size : Minimum loop size for which this pass is applied
-min-body-size : Minimum loop body size for which this pass is applied
-forward-insert-to-extract
Forward inserts to extracts within a single block
This pass is similar to the forward-store-to-load pass, in which store ops are forwarded to load ops; here, tensor.insert ops are instead forwarded to tensor.extract ops.
Does not support complex control flow within a block, nor ops with arbitrary subregions.
-forward-store-to-load
Forward stores to loads within a single block
This pass is a simplified version of mem2reg and similar passes. It analyzes an operation, finding all basic blocks within that op that have memrefs whose stores can be forwarded to loads.
Does not support complex control flow within a block, nor ops with arbitrary subregions.
-full-loop-unroll
Fully unroll all loops
Scan the IR for affine.for loops and unroll them all.
-insert-rotate
Vectorize arithmetic FHE operations using HECO-style heuristics
This pass implements the SIMD-vectorization passes from the HECO paper.
The pass operates by identifying arithmetic operations that can be suitably combined into a combination of cyclic rotations and vectorized operations on tensors. It further identifies a suitable “slot target” for each operation and heuristically aligns the operations to reduce unnecessary rotations.
This pass by itself does not eliminate any operations, but instead inserts well-chosen rotations so that, for well-structured code (like unrolled affine loops), a subsequent --cse and --canonicalize pass will dramatically reduce the IR.
As such, the pass is designed to be paired with the canonicalization patterns in tensor_ext, as well as the collapse-insertion-chains pass, which cleans up remaining insertion and extraction ops after the main simplifications are applied.
Unlike HECO, this pass operates on plaintext types and tensors, along with the HEIR-specific tensor_ext dialect for its cyclic rotate op. It is intended to be run before lowering to a scheme dialect like bgv.
-linalg-canonicalizations
This pass canonicalizes the linalg.transpose operation of a constant into a transposed constant.
-linalg-to-tensor-ext
Lower linalg.matmul to arith and tensor_ext dialects.
This pass lowers the linalg.matmul op to a mixture of affine, tensor, and tensor_ext ops via the Halevi-Shoup and squat matrix multiplication algorithms.
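The diagonal method at the heart of Halevi-Shoup can be sketched in plaintext Python for a square matrix-vector product. The product is the sum, over i, of the i-th generalized diagonal multiplied elementwise by the vector rotated by i. This sketch covers only the square matrix-vector case, not the squat variant. The helper names are hypothetical, and `rotate` here is a cyclic left rotation, which may not match the direction convention of tensor_ext.rotate.

```python
def rotate(v, k):
    """Cyclic left rotation: result[j] = v[(j + k) % n]."""
    k %= len(v)
    return v[k:] + v[:k]


def diagonal(m, i):
    """i-th generalized diagonal: entry j is m[j][(j + i) % n]."""
    n = len(m)
    return [m[j][(j + i) % n] for j in range(n)]


def halevi_shoup_matvec(m, v):
    n = len(v)
    acc = [0] * n
    for i in range(n):
        # One rotation and one SIMD multiply-accumulate per diagonal.
        acc = [a + d * r for a, d, r in zip(acc, diagonal(m, i), rotate(v, i))]
    return acc
```

Every slot j ends up with the sum over k of m[j][k] * v[k], because each k = (j + i) mod n appears exactly once as i ranges over the diagonals.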
-lwe-add-client-interface
Add client interfaces to (R)LWE encrypted functions
This pass adds encrypt and decrypt functions for each compiled function in the IR. These functions maintain the same interface as the original function, while the compiled function may lose some of this information by the lowerings to ciphertext types (e.g., a scalar ciphertext, when lowered through RLWE schemes, must be encoded as a tensor).
Options
-use-public-key : If true, generate a client interface that uses a public key for encryption.
-one-value-per-helper-fn : If true, split encryption helpers into separate functions for each SSA value.
-lwe-set-default-parameters
Set default parameters for LWE ops
This pass adds default parameters to all lwe types as the lwe_params attribute, and for lwe ops as the params attribute, overriding any existing attributes set with those names.
This pass is primarily for testing purposes, and as a parameter provider before a proper parameter selection mechanism is added. This pass should not be used in production.
The specific parameters are hard-coded in lib/Dialect/LWE/Transforms/SetDefaultParameters.cpp.
-lwe-to-openfhe
Lower lwe to openfhe dialect.
This pass lowers the lwe dialect to the OpenFHE dialect.
Currently, this also includes patterns that apply directly to ckks and bgv dialect operations.
TODO (#1193): investigate if the need for ckks/bgv patterns in --lwe-to-openfhe is permanent.
-lwe-to-polynomial
Lower lwe to polynomial dialect.
This pass lowers the lwe dialect to the polynomial dialect.
-memref-global-replace
MemrefGlobalReplacePass forwards accesses to global memrefs to arithmetic values
This pass forwards constant global MemRef values to referencing affine loads. This pass requires that the MemRef global values are initialized as constants and that the affine load access indices are constants (i.e., not dynamic values). Unroll affine loops prior to running this pass.
MemRef removal is required to remove any memory allocations from the input model (for example, TensorFlow models contain global memory holding model weights) to support FHE transpilation.
Input
module {
memref.global "private" constant @__constant_8xi16 : memref<2x4xi16> = dense<[[-10, 20, 3, 4], [5, 6, 7, 8]]>
func.func @main() -> i16 {
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
%0 = memref.get_global @__constant_8xi16 : memref<2x4xi16>
%1 = affine.load %0[%c1, %c1 + %c2] : memref<2x4xi16>
return %1 : i16
}
}
Output
module {
func.func @main() -> i16 {
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
%c8_i16 = arith.constant 8 : i16
return %c8_i16 : i16
}
}
-mod-arith-to-arith
Lower mod_arith to standard arith.
This pass lowers the mod_arith dialect to its arith equivalents.
-mod-arith-to-mac
Finds consecutive ModArith mul and add operations and converts them to a Mac operation
Walks over the program to find add operations and checks whether any of their operands originates from a mul operation. If so, it converts the add operation into a mac operation and removes the mul operation.
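A rough sketch of this peephole rewrite on a toy, list-based IR follows. Several details are assumptions for illustration: the tuple IR format, the `fuse_mac` name, the operand order of mac ((a, b, c) meaning a * b + c), and the rule that the mul must have no other users. The description above does not state a single-use requirement.

```python
def fuse_mac(ops):
    """ops: list of (result, opname, operands) in program order.
    Rewrites add(mul(a, b), c) into mac(a, b, c) and drops the fused mul."""
    users = {}
    for _, _, operands in ops:
        for v in operands:
            users[v] = users.get(v, 0) + 1
    producers = {res: (name, operands) for res, name, operands in ops}

    fused, dead = [], set()
    for res, name, operands in ops:
        if name == "add":
            for i, v in enumerate(operands):
                prod = producers.get(v)
                # Assumption: only fuse a mul whose result has no other users.
                if prod and prod[0] == "mul" and users[v] == 1:
                    other = operands[1 - i]
                    fused.append((res, "mac", (*prod[1], other)))
                    dead.add(v)
                    break
            else:  # no operand came from a fusable mul
                fused.append((res, name, operands))
        else:
            fused.append((res, name, operands))
    return [op for op in fused if op[0] not in dead]
```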
-openfhe-configure-crypto-context
Configure the crypto context in OpenFHE
This pass generates helper functions to generate and configure the OpenFHE crypto context for the given function. Generating the crypto context sets the appropriate encryption parameters, while the configuration generates the necessary evaluation keys (relinearization and rotation keys).
For example, for an MLIR function @my_func, the generated helpers have the following signatures:
func.func @my_func__generate_crypto_context() -> !openfhe.crypto_context
func.func @my_func__configure_crypto_context(!openfhe.crypto_context, !openfhe.private_key) -> !openfhe.crypto_context
Options
-entry-function : Default name of the entry function.
-level-budget-encode : Level budget for CKKS bootstrap encode (s2c) phase
-level-budget-decode : Level budget for CKKS bootstrap decode (c2s) phase
-insecure : Whether to use insecure parameters for faster evaluation (should only be used in tests; defaults to false)
-operation-balancer
This pass balances addition and multiplication operations.
This pass examines a tree or graph of addition and multiplication operations and balances them to minimize the depth of the tree. This exposes more parallelism, and reducing the multiplication depth can decrease the parameters used in FHE, which improves performance. This pass is not necessarily optimal, as there may be intermediate computations whose depth it does not optimally minimize.
The algorithm analyzes a graph of addition operations and performs a depth-first search for the operands (starting from the last computed values in the graph). If an intermediate computation is used more than once, the pass treats that computation as its own tree to balance instead of trying to minimize the global depth of the tree.
This pass only runs on addition and multiplication operations in the arith dialect that are encapsulated inside a secret.generic.
This pass was inspired by section 2.6 of ‘EVA Improved: Compiler and Extension Library for CKKS’ by Chowdhary et al.
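To illustrate why balancing reduces depth, here is a Python sketch that combines a flat list of addends pairwise, level by level, and reports the depth of the resulting tree. It handles only a single associative, commutative operator. It ignores multiplication and the shared-subexpression handling that the pass also performs. The `balance` helper is hypothetical.

```python
def balance(operands):
    """Pairwise (tree) reduction of a list of addends.
    Returns (expression string, tree depth)."""
    level = [(str(x), 0) for x in operands]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (lhs, dl), (rhs, dr) = level[i], level[i + 1]
            nxt.append((f"({lhs} + {rhs})", max(dl, dr) + 1))
        if len(level) % 2:
            nxt.append(level[-1])  # an odd leftover is carried up unchanged
        level = nxt
    return level[0]
```

For eight addends, the balanced tree has depth 3. A left-leaning chain like ((((a + b) + c) + d) + ...) over the same addends has depth 7.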
-optimize-relinearization
Optimize placement of relinearization ops
This pass defers relinearization ops as late as possible in the IR. This is more efficient in cases where multiplication operations are followed by additions, such as in a dot product. Because relinearization also adds error, deferring it can reduce the need for bootstrapping.
In this pass, we use an integer linear program to determine the optimal relinearization strategy. It solves an ILP for each func op in the IR.
The assumptions of this pass include:
- All return values of functions must be linearized.
- All ciphertext arguments to an op must have the same key basis.
- Rotation op inputs must be linearized.
For an ILP model specification, see the docs at the HEIR website. The model is an adaptation of the ILP described in a blog post by Jeremy Kun.
Options
-use-loc-based-variable-names : When true, the ILP uses op source locations in variable names, which can help debug ILP model bugs.
-polynomial-to-mod-arith
Lower polynomial to standard MLIR dialects.
This pass lowers the polynomial dialect to standard MLIR plus mod_arith, including possibly ops from affine, tensor, linalg, and arith.
-remove-unused-memref
Cleanup any unused memrefs
Scan the IR for unused memrefs and remove them.
This pass looks for locally allocated memrefs that are never used and deletes them. This pass can be used as a cleanup pass from other IR simplifications that forward stores to loads.
-rotate-and-reduce
Use a logarithmic number of rotations to reduce a tensor.
This pass identifies when a commutative, associative binary operation is used to reduce all of the entries of a tensor to a single value, and optimizes the operations by using a logarithmic number of reduction operations.
In particular, this pass identifies an unrolled set of operations of the form (the binary ops may come in any order):
%0 = tensor.extract %t[0] : tensor<8xi32>
%1 = tensor.extract %t[1] : tensor<8xi32>
%2 = tensor.extract %t[2] : tensor<8xi32>
%3 = tensor.extract %t[3] : tensor<8xi32>
%4 = tensor.extract %t[4] : tensor<8xi32>
%5 = tensor.extract %t[5] : tensor<8xi32>
%6 = tensor.extract %t[6] : tensor<8xi32>
%7 = tensor.extract %t[7] : tensor<8xi32>
%8 = arith.addi %0, %1 : i32
%9 = arith.addi %8, %2 : i32
%10 = arith.addi %9, %3 : i32
%11 = arith.addi %10, %4 : i32
%12 = arith.addi %11, %5 : i32
%13 = arith.addi %12, %6 : i32
%14 = arith.addi %13, %7 : i32
and replaces it with a logarithmic number of rotate and addi operations:
%0 = tensor_ext.rotate %t, 4 : tensor<8xi32>
%1 = arith.addi %t, %0 : tensor<8xi32>
%2 = tensor_ext.rotate %1, 2 : tensor<8xi32>
%3 = arith.addi %1, %2 : tensor<8xi32>
%4 = tensor_ext.rotate %3, 1 : tensor<8xi32>
%5 = arith.addi %3, %4 : tensor<8xi32>
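The effect of this rewrite can be checked with a plaintext model in Python: log2(n) rotate-and-add steps leave the full reduction in every slot. The `rotate` helper is a hypothetical stand-in for tensor_ext.rotate. Because every slot ends up with the same total, the result does not depend on the rotation direction. The sketch assumes the tensor length is a power of two, as in the 8-element example above.

```python
def rotate(v, k):
    """Cyclic rotation of a list by k slots (stand-in for tensor_ext.rotate)."""
    k %= len(v)
    return v[k:] + v[:k]


def rotate_and_reduce(t):
    n = len(t)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    acc = list(t)
    shift = n // 2
    while shift >= 1:
        # One rotation plus one elementwise add per step: log2(n) steps in total.
        acc = [a + b for a, b in zip(acc, rotate(acc, shift))]
        shift //= 2
    return acc
```

For an 8-element tensor, this performs rotations by 4, 2, and 1, mirroring the three rotate/addi pairs shown above.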
-secret-capture-generic-ambient-scope
Capture the ambient scope used in a secret.generic
For each value used in the body of a secret.generic op, which is defined in the ambient scope outside the generic, add it to the argument list of the generic.
-secret-distribute-generic
Distribute generic ops through their bodies.
Converts generic ops whose region contains many ops into smaller sequences of generic ops whose regions contain a single op, dropping the generic part from any resulting generic ops that have no secret.secret inputs. If the op has associated regions, and the operands are not secret, then the generic is distributed recursively through the op’s regions as well.
This pass is intended to be used as part of a front-end pipeline, where a program that operates on a secret type annotates the input to a region as secret, wraps the contents of the region in a single large secret.generic, and then uses this pass to simplify it.
The distribute-through option allows one to specify a comma-separated list of op names (e.g., distribute-through="affine.for,scf.if"), which limits the distribution to only pass through those ops. If unset, all ops are distributed through when possible.
Options
-distribute-through : comma-separated list of ops that should be distributed through
-secret-extract-generic-body
Extract the bodies of all generic ops into functions
This pass extracts the body of all generic ops into functions, and replaces the generic bodies with call ops. Used as a sub-operation in some passes, and extracted into its own pass for testing purposes.
This pass works best when --secret-generic-absorb-constants is run before it so that the extracted function contains any constants used in the generic op’s body.
-secret-forget-secrets
Convert secret types to standard types
Drop the secret<...> type from the IR, replacing it with the contained type and the corresponding cleartext computation.
-secret-generic-absorb-constants
Copy constants into a secret.generic body
For each constant value used in the body of a secret.generic op that is defined in the ambient scope outside the generic, add its definition into the generic body.
-secret-generic-absorb-dealloc
Copy deallocs of internal memrefs into a secret.generic body
For each memref allocated and used only within the body of a secret.generic op, move its dealloc into the generic body.
-secret-insert-mgmt-bgv
Place BGV ciphertext management operations
This pass implements the following placement strategy:
For relinearization, a mgmt.relinearize is placed after every homomorphic ciphertext-ciphertext multiplication. This ensures that the ciphertext stays linear.
For modulus switching, a mgmt.modreduce is inserted right before each homomorphic multiplication, including ciphertext-plaintext ones. The include-first-mul option controls whether to switch the modulus before the first multiplication.
Users can refer to the FLEXIBLEAUTOEXT and FLEXIBLEAUTO modes in OpenFHE for comparison. For the technical differences between them, see the paper “Revisiting homomorphic encryption schemes for finite fields”.
Then, for level-mismatched binary operations like addition and subtraction, additional modulus switches are placed on the higher-level operand until both operands reach the same level.
This differs from cross-level operation handling in other implementations, which use modulus switching and level dropping together. Only modulus switching is used here, for simplicity; a future optimization of this pass could implement such a strategy.
Before yielding the final result, a modulus switch is placed if the result is produced by a multiplication or derived from one.
The pass also annotates each operation with a mgmt.mgmt attribute that records the level and dimension of the ciphertext. The secret-to-bgv pass subsequently uses this information to lower to the corresponding RNS type.
Example of multiplication+addition:
func.func @func(%arg0: !secret.secret<i16>, %arg1: !secret.secret<i16>) -> !secret.secret<i16> {
%0 = secret.generic ins(%arg0, %arg1 : !secret.secret<i16>, !secret.secret<i16>) {
^bb0(%arg2: i16, %arg3: i16):
%1 = arith.muli %arg2, %arg3 : i16
%2 = arith.addi %1, %arg3 : i16
secret.yield %2 : i16
} -> !secret.secret<i16>
return %0 : !secret.secret<i16>
}
which gets transformed to:
func.func @func(%arg0: !secret.secret<i16>, %arg1: !secret.secret<i16>) -> !secret.secret<i16> {
%0 = secret.generic ins(%arg0, %arg1 : !secret.secret<i16>, !secret.secret<i16>) attrs = {arg0 = {mgmt.mgmt = #mgmt.mgmt<level = 1>}, arg1 = {mgmt.mgmt = #mgmt.mgmt<level = 1>}} {
^bb0(%arg2: i16, %arg3: i16):
%1 = arith.muli %arg2, %arg3 {mgmt.mgmt = #mgmt.mgmt<level = 1, dimension = 3>} : i16
%2 = mgmt.relinearize %1 {mgmt.mgmt = #mgmt.mgmt<level = 1>} : i16
%3 = arith.addi %2, %arg3 {mgmt.mgmt = #mgmt.mgmt<level = 1>} : i16
%4 = mgmt.modreduce %3 {mgmt.mgmt = #mgmt.mgmt<level = 0>} : i16
secret.yield %4 : i16
} -> !secret.secret<i16>
return %0 : !secret.secret<i16>
}
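The three placement rules (relinearize after ciphertext-ciphertext multiplication, modulus switch before multiplication, modulus switch before yielding a multiplication-derived value) can be sketched in Python over a toy straight-line program. This is a simplified model, not the pass implementation: ops are hypothetical (result, name, operands) tuples, only muli, addi, and yield are modeled, levels are not computed, and the level-mismatch handling for additions is omitted.

```python
def insert_mgmt(ops, secret_inputs, include_first_mul=False):
    """Return a new op list with relinearize/modreduce ops placed (toy model)."""
    secret = set(secret_inputs)
    mul_derived = set()
    rename = {}  # original SSA name -> name produced by an inserted mgmt op
    out = []
    seen_mul = False
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return f"%m{counter}"

    def emit(name, value):
        """Emit a unary mgmt op on `value` and return its result name."""
        new = fresh()
        out.append((new, name, [value]))
        if value in secret:
            secret.add(new)
        if value in mul_derived:
            mul_derived.add(new)
        return new

    for result, name, operands in ops:
        operands = [rename.get(v, v) for v in operands]
        if name == "yield":
            value = operands[0]
            if value in mul_derived:
                # Modulus switch before yielding a multiplication-derived result.
                value = emit("modreduce", value)
            out.append((None, "yield", [value]))
            continue
        if name == "muli":
            if seen_mul or include_first_mul:
                # Modulus switch secret operands right before the multiplication.
                operands = [emit("modreduce", v) if v in secret else v for v in operands]
            seen_mul = True
        out.append((result, name, operands))
        if any(v in secret for v in operands):
            secret.add(result)
        if name == "muli" or any(v in mul_derived for v in operands):
            mul_derived.add(result)
        if name == "muli" and all(v in secret for v in operands):
            # Relinearize after every ciphertext-ciphertext multiplication.
            rename[result] = emit("relinearize", result)
    return out


example_ops = [
    ("%1", "muli", ["%arg2", "%arg3"]),
    ("%2", "addi", ["%1", "%arg3"]),
    (None, "yield", ["%2"]),
]
print([name for _, name, _ in insert_mgmt(example_ops, ["%arg2", "%arg3"])])
```

On the multiplication+addition example this yields the op order muli, relinearize, addi, modreduce, yield, matching the transformed IR above; with include_first_mul=True, two modreduce ops are additionally placed on the inputs of the first multiplication.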
Options
-include-first-mul : Modulus switching right before the first multiplication (default to false)
-secret-insert-mgmt-ckks
Place CKKS ciphertext management operations
See the description of secret-insert-mgmt-bgv. This pass implements a similar strategy, where mgmt.modreduce stands for ckks.rescale.
For bootstrap insertion, a greedy policy is currently used: a bootstrap is inserted once all levels have been consumed.
The max level available after bootstrap is controlled by the option bootstrap-waterline.
The number of levels consumed by bootstrapping itself is not shown here; it is handled by further lowering. TODO(#1207): handle it here so parameter selection can depend on it. TODO(#1207): with this info we can encrypt at max level (with bootstrap consumed level).
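A toy model of the greedy bootstrap policy is sketched below. It assumes a chain of multiplications where each multiplication is followed by one rescale that consumes one level, and that the computation starts at the waterline level. Both assumptions, and the helper name plan_bootstrap, are hypothetical and only illustrate one reading of when a bootstrap would be placed: just before a multiplication that needs a level when none remain.

```python
def plan_bootstrap(num_muls, waterline):
    """Greedy placement: bootstrap only when no level is left for the next rescale.

    Hypothetical model: each multiplication is followed by one rescale that
    consumes one level, and the input starts at `waterline` levels.
    """
    assert waterline >= 1, "need at least one level after bootstrap"
    ops = []
    level = waterline
    for _ in range(num_muls):
        if level == 0:
            # All levels consumed: refresh the ciphertext back to the waterline.
            ops.append("bootstrap")
            level = waterline
        ops.append("mul")
        ops.append("rescale")
        level -= 1
    return ops


print(plan_bootstrap(3, waterline=2))
# ['mul', 'rescale', 'mul', 'rescale', 'bootstrap', 'mul', 'rescale']
```

A higher waterline means fewer bootstraps for the same circuit depth, at the cost of a larger ciphertext modulus after each bootstrap.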
Options
-include-first-mul : Modulus switching right before the first multiplication (default to false)
-slot-number : Default number of slots used for the ciphertext space.
-bootstrap-waterline : Waterline for inserting bootstrap ops
-secret-merge-adjacent-generics
Merge two adjacent generics into a single generic
This pass merges two immediately sequential generics into a single generic. Useful as a sub-operation in some passes, and extracted into its own pass for testing purposes.
-secret-to-bgv
Lower secret to bgv dialect.
This pass lowers an IR with secret.generic blocks containing arithmetic operations to operations on ciphertexts with the BGV dialect.
The pass assumes that the secret.generic regions have been distributed through arithmetic operations so that only one ciphertext operation appears per generic block. It also requires that canonicalize has been run so that non-secret values are removed from the secret.generic’s block arguments.
The pass requires that all types are tensors of a uniform shape matching the dimension of the ciphertext space specified by poly-mod-degree.
Options
-poly-mod-degree : Default degree of the cyclotomic polynomial modulus to use for ciphertext space.
-coefficient-mod-bits : Default number of bits of the prime coefficient modulus to use for the ciphertext space.
-secret-to-cggi
Lower secret to cggi dialect.
This pass lowers the secret dialect to the cggi dialect.
-secret-to-ckks
Lower secret to ckks dialect.
This pass lowers an IR with secret.generic blocks containing arithmetic operations to operations on ciphertexts with the CKKS dialect.
The pass assumes that the secret.generic regions have been distributed through arithmetic operations so that only one ciphertext operation appears per generic block. It also requires that canonicalize has been run so that non-secret values are removed from the secret.generic’s block arguments.
The pass requires that all types are tensors of a uniform shape matching the dimension of the ciphertext space specified by poly-mod-degree.
Options
-poly-mod-degree : Default degree of the cyclotomic polynomial modulus to use for ciphertext space.
-coefficient-mod-bits : Default number of bits of the prime coefficient modulus to use for the ciphertext space.
-secretize
Adds secret argument attributes to entry function
Helper pass that adds a secret.secret argument attribute to each function argument. By default, the pass applies to all functions in the module; use the option -function=func_name to apply it to a single function only.
Options
-function : function to add secret annotations to
-straight-line-vectorize
A vectorizer for straight line programs.
This pass ignores control flow and only vectorizes straight-line programs within a given region.
Options
-dialect : Use this to restrict the dialect whose ops should be vectorized.
-tosa-to-secret-arith
Lower tosa.sigmoid to secret arith dialects.
This pass lowers the tosa.sigmoid op to the polynomial approximation -0.004 * x^3 + 0.197 * x + 0.5, composed of arith, affine, and tensor operations.
This polynomial approximation of sigmoid only works over the range [-5, 5] and is taken from the paper ‘Logistic regression over encrypted data from fully homomorphic encryption’ by Chen et al.
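The range restriction is easy to check numerically. This Python sketch compares the cubic with the exact sigmoid by dense sampling: on [-5, 5] the maximum absolute error comes out at roughly 0.05, while outside that range the cubic diverges quickly (at x = 10 it is already negative). The helper names are illustrative only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_approx(x):
    # The cubic used by tosa-to-secret-arith: -0.004 * x^3 + 0.197 * x + 0.5
    return -0.004 * x**3 + 0.197 * x + 0.5


def max_abs_error(lo, hi, steps=1000):
    """Largest |sigmoid - approx| over `steps + 1` evenly spaced sample points."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(sigmoid(x) - sigmoid_approx(x)) for x in xs)


print(f"max error on [-5, 5]: {max_abs_error(-5, 5):.3f}")
print(f"approx(10) = {sigmoid_approx(10):.2f}, sigmoid(10) = {sigmoid(10):.2f}")
```

Because only additions and multiplications are needed, the cubic can be evaluated homomorphically, which is why a low-degree polynomial replaces the exact sigmoid here.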
-unroll-and-forward
Loop unrolls and forwards stores to loads.
This pass processes the first function in a given module, and, starting from the first loop, iteratively does the following:
- Fully unroll the loop.
- Scan for load ops. For each load op with a statically-inferrable access index:
  - Backtrack to the original memref alloc.
  - Find all store ops at the corresponding index (possibly transitively through renames/subviews of the underlying alloc).
  - Find the last store that occurs and forward it to the load.
  - If the original memref is an input memref, then forward through any renames to make the target load load directly from the argument memref (instead of any subviews, say).
- Apply the same logic to any remaining loads not inside any for loop.
This pass requires that tensors are lowered to memref, and only supports affine loops with affine.load/store ops.
Memrefs that result from memref.get_global ops are excluded from forwarding, even if they are loaded with a static index, and are instead handled by memref-global-replace, which should be run after this pass.
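The forwarding step can be sketched for an already-unrolled, straight-line sequence of loads and stores. This Python model is a simplification under stated assumptions: ops are hypothetical tuples with static integer indices, renames and subviews are not modeled, and memrefs in the excluded set stand in for those produced by memref.get_global.

```python
def forward_stores(ops, excluded=frozenset()):
    """Forward each load's value from the most recent store to the same (memref, index).

    ops: list of ("store", memref, index, value) or ("load", memref, index, result).
    excluded: memrefs (modeling memref.get_global results) that are never forwarded.
    Returns (remaining_ops, replacements), where replacements maps each
    forwarded load result to the value that was stored.
    """
    last_store = {}
    replacements = {}
    remaining = []
    for op in ops:
        kind, memref, index, name = op
        if kind == "store":
            # The stored value may itself be a load result that was forwarded.
            value = replacements.get(name, name)
            last_store[(memref, index)] = value
            remaining.append((kind, memref, index, value))
        elif kind == "load" and memref not in excluded and (memref, index) in last_store:
            # The last store to this slot wins; the load itself disappears.
            replacements[name] = last_store[(memref, index)]
        else:
            remaining.append(op)
    return remaining, replacements


ops = [
    ("store", "%buf", 0, "%a"),
    ("store", "%buf", 1, "%b"),
    ("store", "%buf", 0, "%c"),
    ("load", "%buf", 0, "%x"),
    ("load", "%buf", 1, "%y"),
    ("load", "%g", 0, "%z"),
]
remaining, replacements = forward_stores(ops, excluded={"%g"})
print(replacements)  # {'%x': '%c', '%y': '%b'}
```

After forwarding, the stores to %buf may have no remaining loads, which is the situation the remove-unused-memref cleanup pass is meant to handle.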
-wrap-generic
Wraps regions using secret args in secret.generic bodies
This pass converts functions (func.func) with {secret.secret} annotated arguments to use !secret.secret<...> types and wraps the function body in a secret.generic region.
The output type is also converted to !secret.secret<...>.
Example input:
func.func @main(%arg0: i32 {secret.secret}) -> i32 {
%0 = arith.constant 100 : i32
%1 = arith.addi %0, %arg0 : i32
return %1 : i32
}
Output:
func.func @main(%arg0: !secret.secret<i32>) -> !secret.secret<i32> {
%0 = secret.generic ins(%arg0 : !secret.secret<i32>) {
^bb0(%arg1: i32):
%1 = arith.constant 100 : i32
%2 = arith.addi %1, %arg1 : i32
secret.yield %2 : i32
} -> !secret.secret<i32>
return %0 : !secret.secret<i32>
}
-yosys-optimizer
Invoke Yosys to perform circuit optimization.
This pass invokes Yosys to convert an arithmetic circuit to an optimized boolean circuit that uses the arith and comb dialects.
Note that booleanization changes the function signature: multi-bit integers are transformed to a tensor of booleans, for example, an i8 is converted to tensor<8xi1>.
The optimizer will be applied to each secret.generic op containing arithmetic ops that can be optimized.
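As a sketch of what booleanization means for a single value, the Python helpers below split an i8 into eight bits and reassemble it. The two's-complement encoding and the least-significant-bit-first order are assumptions made for illustration; the bit order the Yosys flow actually uses may differ.

```python
def to_bits(value, width=8):
    """Two's-complement bit decomposition, least-significant bit first (assumed order)."""
    # Python's >> on negative ints is an arithmetic shift, so & 1 yields
    # the two's-complement bits for any value in [-2**(width-1), 2**(width-1)).
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reinterpret an LSB-first bit list as a signed two's-complement integer."""
    width = len(bits)
    unsigned = sum(b << i for i, b in enumerate(bits))
    return unsigned - (1 << width) if bits[-1] else unsigned


print(to_bits(5))               # [1, 0, 1, 0, 0, 0, 0, 0]
print(from_bits(to_bits(-3)))   # -3
```

Each of the eight positions corresponds to one i1 element of the tensor<8xi1> that replaces the i8 in the function signature.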
Optional parameters:
-abc-fast : Run the abc optimizer in “fast” mode, getting faster compile time at the expense of a possibly larger output circuit.
-unroll-factor : Before optimizing the circuit, unroll loops by a given factor. If unset, this pass will not unroll any loops.
-print-stats : Prints statistics about the optimized circuits.
-mode={Boolean,LUT} : Map gates to boolean gates or lookup table gates.
-use-submodules : Extract the body of a generic op into submodules. Useful for large programs with generics that can be isolated. This should not be used when distributing generics through loops, to avoid index arguments in the function body.
Statistics
total circuit size : The total circuit size for all optimized circuits, after optimization is done.