Passes

`-add-client-interface`

Add client interfaces to secret functions

This pass adds encrypt and decrypt functions for each compiled function in the IR. These functions maintain the same interface as the original function, while the compiled function may lose some of this information by the lowerings to ciphertext types (e.g., a scalar ciphertext, when lowered through RLWE schemes, must be encoded as a tensor).

This pass occurs at the secret level, which is necessary because some backends like the plaintext backend don’t actually encrypt, but still require the ciphertext layout/packing logic to convert cleartexts to plaintexts.

Options

-ciphertext-size          : Power of two length of the ciphertexts the data is packed in.
-enable-layout-assignment : If false, skips the emission of layout assignment operations, essentially assuming that the input was already using correctly (ciphertext-)sized tensors.

`-annotate-mgmt`

Annotate MgmtAttr for secret SSA values in the IR

This pass runs the secretness/level/dimension analysis and annotates the IR with the results, saving it into each op’s attribute dictionary as mgmt.mgmt :

Options

-base-level : Level to start counting from (used by B/FV)

`-annotate-module`

Annotate ModuleOp with Scheme And/Or Backend

This pass annotates the module with a scheme and/or backend.

This pass should be called before all lowering to enable lowering to the desired scheme and backend.

Available scheme:

bgv
ckks
bfv
cggi

Available backend:

openfhe
lattigo

Example

Command: heir-opt --annotate-module="backend=openfhe scheme=ckks" tests/Transforms/annotate_module/pass.mlir

Input:

module {

}

Output:

module attributes {backend.openfhe, scheme.ckks} {
}

Options

-scheme  : The scheme to annotate the module with.
-backend : The backend to annotate the module with.

`-annotate-secretness`

Annotate secret SSA values in the IR

Debugging helper that runs the secretness analysis and annotates the IR with the results, extending the {secret.secret} annotation to all operation results that are secret.

In addition to annotating operation results, the pass also annotates arguments and return types in func.func operations, as well as any terminators (e.g. return)

In verbose mode, all results are annotated, including public ones with {secret.public}, and values for which the secretness analysis is missing are annotated with {secret.missing}, while values where the secretness analysis is inconclusive are annotated with {secret.unknown}.

Options

-verbose : If true, annotate secretness state all values, including public ones, and values with missing or inconclusive analysis.

`-apply-folders`

Apply all folding patterns from canonicalize

This pass applies all registered folding patterns greedily to the input IR. This is useful when running a full canonicalize is too slow, but applying folders before canonicalize is sufficient to simplify the IR for later passes, or even sufficient to then subsequently run a full canonicalize pass.

This is used to prepare an IR for insert-rotate after fully unrolling loops.

`-arith-to-cggi-quart`

Lower arith to cggi dialect and divide each operation into smaller parts.

This pass converts high precision arithmetic operations, i.e. operations on 32 bit integer, into a sequence of lower precision operations, i.e 8b operations. Currently, the pass splits the 32b integer into four 8b integers, using the tensor dialect. These smaller integers are stored in an 16b integer, so that we don’t lose the carry information. This pass converts the arith dialect to the cggi dialect.

Based on the arith-emulate-wide-int pass from the MLIR arith dialect.

General assumption: the first element in the tensor is also the LSB element.

`-arith-to-cggi`

Lower arith to cggi dialect.

`-arith-to-mod-arith`

Lower standard arith to mod-arith.

This pass lowers the arith dialect to their mod-arith equivalents.

The arith-to-mod-arith pass is required to lower a neural network TOSA model to a CGGI backend. This pass will transform the operations to the mod-arith dialect, where the find-mac pass can be used to convert consecutive multiply addition operations into a single operation. In a later pass, these large precision MAC operations (typically 64 or 32-bit) will be lowered into small precision (8 or 4b) operations that can be mapped to CGGI operations.

Options

-modulus : Modulus to use for the mod-arith dialect. If not specified, the pass will use the natural modulus for that integer type

`-bgv-to-lwe`

Lower bgv to lwe dialect.

This pass lowers the bgv dialect to lwe dialect. Note that some scheme specific ops (e.g., modswitch) that have no direct analogue in the lwe dialect are left unchanged. TODO (#1193): support both “common” and “full” lwe lowering

`-cggi-boolean-vectorize`

Group different logic gates with the packed API

This pass groups independent logic gates into a single call of the packed operations. Pass is based on the straight-line-vectorizer, but is fundamentally different. This pass combines any type of boolean gates and is not restricted to combining the same type of gate operand.

Pass is intended for the FPT tfhe-rs API, where packed_gates function get a the boolean gates are passed as a string vector and a left and right vector of ciphertexts. Each boolean gates specified in gates is then applied element wise.

let outputs_ct = fpga_key.packed_gates(&gates, &ref_to_ct_lefts, &ref_to_ct_rights);

Options

-parallelism : Parallelism factor for batching. 0 is infinite parallelism

`-cggi-decompose-operations`

Expands CGGI operations into LWE operations and programmable bootstraps

This pass expands high level CGGI operations (e.g. LUT2, XOR, etc.).

If the option expand-lincomb is set, the expansion will continue into the component LWE scalar operations and a programmable bootstrap operation. Otherwise, the expansion will be stop at the cggi.lut_lincomb level. By default, expand-lincomb is true.

For example, a LUT3 operation is composed of three LWE ciphertext inputs $c, b, a$ (in MSB to LSB ordering) which must be combined via the linear combination $4 * c + 2 * b + a$ before being fed into a programmable bootstrap defined by the lookup table.

This pass supports XOR, LUT2, LUT3, and LutLincomb operations.

Options

-expand-lincomb : Expand lincomb operations to the PBS and scalar level

`-cggi-to-jaxite`

Lower cggi to jaxite dialect.

`-cggi-to-tfhe-rust-bool`

Lower cggi to tfhe_rust_bool dialect.

`-cggi-to-tfhe-rust`

Lower cggi to tfhe_rust dialect.

`-ckks-to-lwe`

Lower ckks to lwe dialect.

This pass lowers the ckks dialect to lwe dialect. Note that some scheme specific ops (e.g., rescale) that have no direct analogue in the lwe dialect are left unchanged. TODO (#1193): support both “common” and “full” lwe lowering

`-collapse-insertion-chains`

Collapse chains of extract/insert ops into rotate ops when possible

This pass is a cleanup pass for insert-rotate. That pass sometimes leaves behind a chain of insertion operations like this:

%extracted = tensor.extract %14[%c5] : tensor<16xi16>
%inserted = tensor.insert %extracted into %dest[%c0] : tensor<16xi16>
%extracted_0 = tensor.extract %14[%c6] : tensor<16xi16>
%inserted_1 = tensor.insert %extracted_0 into %inserted[%c1] : tensor<16xi16>
%extracted_2 = tensor.extract %14[%c7] : tensor<16xi16>
%inserted_3 = tensor.insert %extracted_2 into %inserted_1[%c2] : tensor<16xi16>
...
%extracted_28 = tensor.extract %14[%c4] : tensor<16xi16>
%inserted_29 = tensor.insert %extracted_28 into %inserted_27[%c15] : tensor<16xi16>
yield %inserted_29 : tensor<16xi16>

In many cases, this chain will insert into every index of the dest tensor, and the extracted values all come from consistently aligned indices of the same source tensor. In this case, the chain can be collapsed into a single rotate.

Each index used for insertion or extraction must be constant; this may require running --canonicalize or --sccp before this pass to apply folding rules (use --sccp if you need to fold constant through control flow).

`-compare-to-sign-rewrite`

Rewrites arith.cmpi/arith.cmpf to a math_ext.sign based expression

This pass rewrites arith.cmpi/cmpf %a, %b to some combination of add/mul and sign operations. TODO(#1929): provide detailed description of the expression for each predicate.

`-convert-elementwise-to-affine`

This pass lowers ElementwiseMappable operations to Affine loops.

This pass lowers ElementwiseMappable operations over tensors to affine loop nests that instead apply the operation to the underlying scalar values.

Usage: ‘–convert-elementwise-to-affine=convert-ops=arith.mulf ' restrict conversion to mulf op from arith dialect.

‘–convert-elementwise-to-affine=convert-ops=arith.addf,arith.divf convert-dialects=bgv’ restrict conversion to addf and divf ops from arith dialect and all of the ops in bgv dialect.

–convert-elementwise-to-affine=convert-dialects=arith restrict conversion to arith dialect so ops only from arith dialect is processed.

–convert-elementwise-to-affine=convert-ops=arith.addf,arith.mulf restrict conversion only to these two ops - addf and mulf - from arith dialect.

Options

-convert-ops      : comma-separated list of ops to run this pass on 
-convert-dialects : comma-separated list of dialects to run this pass on

`-convert-if-to-select`

Convert scf.if operations on secret conditions to arith.select operations.

Conversion for If-operations that evaluate secret condition to alternative select operations.

`-convert-polynomial-mul-to-ntt`

Rewrites polynomial operations to their NTT equivalents

Applies a rewrite pattern to convert polynomial multiplication to the equivalent using the number-theoretic transforms (NTT) when possible.

Polynomial multiplication can be rewritten as polynomial.NTT on each operand, followed by modulo elementwise multiplication of the point-value representation and then the inverse-NTT back to coefficient representation.

`-convert-secret-extract-to-static-extract`

Convert tensor.extract operations on secret index to static extract operations.

Converts tensor.extract operations that read value at secret index to alternative static tensor.extract operations that extracts value at each index and conditionally selects the value extracted at the secret index.

Note: Running this pass alone does not result in a data-oblivious program; we have to run the --convert-if-to-select pass to the resulting program to convert the secret-dependent If-operation to a Select-operation.

Example input: mlir func.func @main(%secretTensor: !secret.secret<tensor<32xi16>>, %secretIndex: !secret.secret<index>)) -> !secret.secret<i16> { ... %0 = secret.generic(%secretTensor, %secretIndex : !secret.secret<tensor<32xi16>>, !secret.secret<index>) { ^bb0(%tensor: tensor<32xi16>, %index: index): // Violation: tensor.extract loads value at secret index %extractedValue = tensor.extract %tensor[%index] : tensor<16xi32> ... }

Output:
```mlir
func.func @main(%secretTensor: !secret.secret<tensor<32xi16>>, %secretIndex: !secret.secret<index>)) -> !secret.secret<i16> {
  ...
  %0 = secret.generic(%secretTensor, %secretIndex : !secret.secret<tensor<32xi16>>, !secret.secret<index>) {
  ^bb0(%tensor: tensor<32xi16>, %index: index):
    %extractedValue = affine.for %i=0 to 16 iter_args(%arg= %dummyValue) -> (i32) {
      // 1. Check if %i matches %index
      %cond = arith.cmpi eq, %i, %index : index
      // 2. Extract value at %i
      %value = tensor.extract %tensor[%i] : tensor<16xi32>
      // 3. If %i matches %index, yield %value extracted in (2), else yield %dummyValue
      %result = scf.if %cond -> (i32) {
        scf.yield %value : i32
      } else{
        scf.yield %arg : i32
      }
      // 4. Yield result from (3)
      affine.yield %result : i32

} … }

```

`-convert-secret-for-to-static-for`

Convert secret scf.for ops to affine.for ops with constant bounds.

Conversion for For-operation that evaluate secret bound(s) to alternative affine For-operation with constant bound(s).

It replaces data-dependent bounds with an If-operation to check the bounds, and conditionally execute and yield values from the For-operation’s body. Note: Running this pass alone does not result in a data-oblivious program; we have to run the --convert-if-to-select pass to the resulting program to convert the secret-dependent If-operation to a Select-operation.

Example input:

  func.func @main(%secretTensor: !secret.secret<tensor<16xi32>>, %secretLower: !secret.secret<index>, %secretUpper: !secret.secret<index>) -> !secret.secret<i32> {
   ...
   %0 = secret.generic(%secretTensor, %secretLower, %secretUpper : !secret.secret<tensor<16xi32>>, !secret.secret<index>, !secret.secret<index>){
    ^bb0(%tensor: tensor<16xi32>, %lower : index, %upper : index ):
      ...
      %1 = scf.for %i = %lower to %upper step %step iter_args(%arg = %val) -> (i32) {
        %extracted = tensor.extract %input[%i] : tensor<16xi32>
        %sum = arith.addi %extracted, %arg : i32
        scf.yield %sum : i32
      } {lower = 0, upper = 16}
      secret.yield %1 : i32
  } -> !secret.secret<i32>
  return %0 : !secret.secret<i32>

Output:

  func.func @main(%secretTensor: !secret.secret<tensor<16xi32>>, %secretIndex: !secret.secret<index> {secret.secret}) -> !secret.secret<i32> {
   ...
   %0 = secret.generic(%secretTensor, %secretLower, %secretUpper : !secret.secret<tensor<16xi32>>, !secret.secret<index>, !secret.secret<index>){
    ^bb0(%tensor: tensor<16xi32>, %lower : index, %upper : index ):
      ...
      %1 = affine.for %i = 0 to 16 step %step iter_args(%arg = %val) -> (i32) {
        %lowerCond = arith.cmpi sge, %i, %index : index
        %upperCond = arith.cmpi slt, %i, %index : index
        %cond = arith.andi %lowerCond, %upperCond : i1
        %result = scf.if(%cond) -> (i32) {
          %extracted = tensor.extract %input[%i] : tensor<16xi32>
          %sum = arith.addi %extracted, %arg : i32
          scf.yield %sum : i32
        } else {
          scf.yield %arg : i32
        }
        affine.yield %result : i32
      } {lower = 0, upper = 16}
      secret.yield %1 : i32
  } -> !secret.secret<i32>
  return %0 : !secret.secret<i32>

Options

-convert-all-scf-for : If true, convert all scf.for ops to affine.for, not just those with secret bounds.

`-convert-secret-insert-to-static-insert`

Convert tensor.insert operations on secret index to static insert operations.

Converts tensor.insert operations that write to secret index to alternative static tensor.insert operations that inserts the inserted value at each index and conditionally selects the newly produced tensor that contains the value at the secret index.

Example input:

func.func @main(%secretTensor: !secret.secret<tensor<32xi16>>, %secretIndex: !secret.secret<index>)) -> !secret.secret<i16> {
  ...
  %0 = secret.generic(%secretTensor, %secretIndex : !secret.secret<tensor<32xi16>>, !secret.secret<index>) {
  ^bb0(%tensor: tensor<32xi16>, %index: index):
    // Violation: tensor.insert writes value at secret index
    %inserted = tensor.insert %newValue into %tensor[%index] : tensor<16xi32>
  ...
}

Output:

func.func @main(%secretTensor: !secret.secret<tensor<32xi16>>, %secretIndex: !secret.secret<index>)) -> !secret.secret<i16> {
  ...
  %0 = secret.generic(%secretTensor, %secretIndex : !secret.secret<tensor<32xi16>>, !secret.secret<index>) {
  ^bb0(%tensor: tensor<32xi16>, %index: index):
    %inserted = affine.for %i=0 to 16 iter_args(%inputArg = %tensor) -> tensor<16xi32> {
      // 1. Check if %i matches the %index
      %cond = arith.cmpi eq, %i, %index : index
      // 2. Insert %newValue and produce %newTensor
      %newTensor = tensor.insert %value into %inputArg[%i] : tensor<16xi32>
      // 3. If %i matches %inputIndex, yield %newTensor, else yield unchanged input tensor
      %finalTensor = scf.if %cond -> (i32) {
        scf.yield %newTensor : tensor<16xi32>
      } else{
        scf.yield %inputArg : tensor<16xi32>
      }
      // 4. Yield final tensor
      affine.yield %finalTensor : tensor<16xi32>
}
  ...
}

`-convert-secret-while-to-static-for`

Convert secret scf.while ops to affine.for ops that have constant bounds.

Convert scf.while with a secret condition to affine.for with constant bounds. It replaces the scf.condition operation found in the scf.while loop with an scf.if operation that conditionally executes operations in the while operation’s body and yields values.

A “max_iter” attribute should be specified as part of the secret-dependent scf.while operation to successfully transform to a secret-independent affine.for operation. This attribute determines the maximum number of iterations for the new affine.for operation.

Example input:

// C-like code
int main(int secretInput) {
  while (secretInput > 100) {
    secretInput = secretInput * secretInput;
  }
  return secretInput;
}

// MLIR
func.func @main(%secretInput: !secret.secret<i16>) -> !secret.secret<i16> {
  %c100 = arith.constant 100 : i16
  %0 = secret.generic(%secretInput : !secret.secret<i16>) {
  ^bb0(%input: i16):
    %1 = scf.while (%arg1 = %input) : (i16) -> i16 {
      %2 = arith.cmpi sgt, %arg1, %c100 : i16
      scf.condition(%2) %arg1 : i16
    } do {
    ^bb0(%arg1: i16):
      %3 = arith.muli %arg1, %arg1 : i16
      scf.yield %3 : i16
    } attributes {max_iter = 16 : i64}
    secret.yield %1 : i16
  } -> !secret.secret<i16>
  return %0 : !secret.secret<i16>
}

Output:

func.func @main(%secretInput: !secret.secret<i16>) -> !secret.secret<i16> {
  %c100 = arith.constant 100 : i16
  %0 = secret.generic(%secretInput : !secret.secret<i16>) {
  ^bb0(%input: i16):
    %1 = affine.for 0 to 16 iter_args(%arg1 = %input) -> (i16) {
      %2 = arith.cmpi sgt, %arg1, %c100 : i16
      %3 = scf.if (%2) -> i16{
        %4 = arith.muli %arg1, %arg1 : i16
        scf.yield %4 : i16
      } else {
        scf.yield %arg1 : i16
      }
      affine.yield %3 : i16
    } attributes {max_iter = 16 : i64}
    secret.yield %1 : i16
  } -> !secret.secret<i16>
  return %0 : !secret.secret<i16>
}

`-convert-tensor-to-scalars`

Effectively ‘unrolls’ tensors of static shape to scalars.

This pass will convert a static-shaped tensor type to a TypeRange containing product(dim) copies of the element type of the tensor. This pass currently includes two patterns:

It converts tensor.from_elements operations to the corresponding scalar inputs.
It converts tensor.insert operations by updating the ValueRange corresponding to the converted input and updating it with the scalar to be inserted.

It also applies folders greedily to simplify, e.g., extract(from_elements).

Note: The pass is designed to be run on an IR, where the only operations with tensor typed operands are tensor “management” operations such as insert/extract, with all other operations (e.g., arith operations) already taking (extracted) scalar inputs. For example, an IR where elementwise operations have been converted to scalar operations via --convert-elementwise-to-affine.

The pass might insert new tensor.from_elements operations or manually create the scalar ValueRange via inserting tensor.extract operations if any operations remain that operate on tensors. The pass currently applies irrespective of tensor size, i.e., might be very slow for large tensors.

TODO (#1023): Extend this pass to support more tensor operations, e.g., tensor.slice

Options

-max-size : Limits `unrolling` to tensors with at most max-size elements

`-convert-to-ciphertext-semantics`

Converts programs with tensor semantics to ciphertext semantics

This pass performs two inherently intertwined transformations:

Convert a program from tensor semantics to ciphertext semantics, explained below.
Implement ops defined on tensor-semantic types in terms of ops defined on ciphertext-semantic types.

A program is defined to have tensor semantics if the tensor-typed values are manipulated according to standard MLIR tensor operations and semantics.

A program is defined to have ciphertext semantics if the tensor-typed values correspond to tensors of FHE ciphertexts, where the last dimension of the tensor type is the number of ciphertext slots.

For example, a tensor of type tensor<32x32xi16> with tensor semantics might be converted by this pass, depending on the pass options, to a single ciphertext-semantics tensor<65536xi16>. A larger tensor might, depending on the layout chosen by earlier passes, be converted to a tensor<4x32768xi16>, where the trailing dimension corresponds to the number of slots in the ciphertext.

Tensors with ciphertext semantics can be thought of as an intermediate step between lowering from tensor types with tensor semantics to concrete lwe dialect ciphertext types in a particular FHE scheme. Having this intermediate step is useful because some optimizations are easier to implement, and can be implemented more generically, in the abstract FHE computational model where the data types are large tensors, and the operations are SIMD additions, multiplications, and cyclic rotations.

Function arguments and return values are annotated with the original tensor type in the secret.original_type attribute. This enables later lowerings to implement appropriate encoding and decoding routines for FHE schemes.

The second role of this pass is to implement FHE kernels for various high-level tensor operations, such as linalg.matvec. This must happen at the same time as the type conversion because the high-level ops like linalg.matvec are not well-defined on ciphertext-semantic tensors, while their implementation as SIMD/rotation ops are not well-defined on tensor-semantic tensors.

TODO(#1541): provide example docs

Options

-ciphertext-size : Power of two length of the ciphertexts the data is packed in.

`-drop-unit-dims`

Drops unit dimensions from linalg ops.

This pass converts linalg whose operands have unit dimensions in their types to specialized ops that drop these unit dimensions.

For example, a linalg.matmul whose RHS has type tensor<32x1xi32> is converted to a linalg.matvec op on the underlying tensor<32xi32>.

`-expand-copy`

Expands memref.copy ops to explicit affine loads and stores

This pass removes memref copy operations by expanding them to affine loads and stores. This pass introduces affine loops over the dimensions of the MemRef, so must be run prior to any affine loop unrolling in a pipeline.

Input

module {
  func.func @memref_copy() {
    %alloc = memref.alloc() : memref<2x3xi32>
    %alloc_0 = memref.alloc() : memref<2x3xi32>
    memref.copy %alloc, %alloc_0 : memref<1x1xi32> to memref<1x1xi32>
  }
}

Output

module {
  func.func @memref_copy() {
    %alloc = memref.alloc() : memref<2x3xi32>
    %alloc_0 = memref.alloc() : memref<2x3xi32>
    affine.for %arg0 = 0 to 2 {
      affine.for %arg1 = 0 to 3 {
        %1 = affine.load %alloc[%arg0, %arg1] : memref<2x3xi32>
        affine.store %1, %alloc_0[%arg0, %arg1] : memref<2x3xi32>
      }
    }
  }
}

When --disable-affine-loop=true is set, then the output becomes

module {
  func.func @memref_copy() {
    %alloc = memref.alloc() : memref<2x3xi32>
    %alloc_0 = memref.alloc() : memref<2x3xi32>
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %0 = affine.load %alloc[%c0, %c0] : memref<2x3xi32>
    affine.store %0, %alloc_0[%c0, %c0] : memref<2x3xi32>
    %1 = affine.load %alloc[%c0, %c1] : memref<2x3xi32>
    affine.store %1, %alloc_0[%c0, %c1] : memref<2x3xi32>
    %2 = affine.load %alloc[%c0, %c2] : memref<2x3xi32>
    affine.store %2, %alloc_0[%c0, %c2] : memref<2x3xi32>
    [...]
  }
}

Options

-disable-affine-loop : Use this to control to disable using affine loops

`-extract-loop-body`

Extracts logic of a loop bodies into functions.

This pass extracts logic in the inner body of for loops into functions.

This pass requires that tensors are lowered to memref. It expects that a loop body contains a number of affine.load statements used as inputs to the extracted function, and a single affine.store used as the extracted function’s output.

Input

module {
  func.func @loop_body() {
    %c-128_i8 = arith.constant -128 : i8
    %c127_i8 = arith.constant 127 : i8
    %alloc_7 = memref.alloc() {alignment = 64 : i64} : memref<25x20x8xi8>
    affine.for %arg1 = 0 to 25 {
      affine.for %arg2 = 0 to 20 {
        affine.for %arg3 = 0 to 8 {
          %98 = affine.load %alloc_6[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
          %99 = arith.cmpi slt, %arg0, %c-128_i8 : i8
          %100 = arith.select %99, %c-128_i8, %arg0 : i8
          %101 = arith.cmpi sgt, %arg0, %c127_i8 : i8
          %102 = arith.select %101, %c127_i8, %100 : i8
          affine.store %102, %alloc_7[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
        }
      }
    }
  }
}

Output

module {
  func.func @loop_body() {
    %alloc_7 = memref.alloc() {alignment = 64 : i64} : memref<25x20x8xi8>
    affine.for %arg1 = 0 to 25 {
      affine.for %arg2 = 0 to 20 {
        affine.for %arg3 = 0 to 8 {
          %98 = affine.load %alloc_6[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
          %102 = func.call @__for_loop(%98) : (i8) -> i8
          affine.store %102, %alloc_7[%arg1, %arg2, %arg3] : memref<25x20x8xi8>
        }
      }
    }
  }
  func.func private @__for_loop(%arg0: i8) -> i8 {
    %c-128_i8 = arith.constant -128 : i8
    %c127_i8 = arith.constant 127 : i8
    %99 = arith.cmpi slt, %arg0, %c-128_i8 : i8
    %100 = arith.select %99, %c-128_i8, %arg0 : i8
    %101 = arith.cmpi sgt, %arg0, %c127_i8 : i8
    %102 = arith.select %101, %c127_i8, %100 : i8
    return %102 : i8
  }
}

Options

-min-loop-size : Use this to control the minimum loop size to apply this pass
-min-body-size : Use this to control the minimum loop body size to apply this pass

`-fold-constant-tensors`

This pass folds any constant tensors.

This pass folds tensor operations on constants to new constants.

The following folders are supported:

tensor.insert of a constant tensor
tensor.collapse_shape of a constant tensor

`-fold-convert-layout-into-assign-layout`

Merges tensor_ext.convert_layout ops into preceding tensor_ext.assign_layout ops

A tensor_ext.assign_layout op corresponds to an encoding of a cleartext into a plaintext or ciphertext. If this is immediately followed by a tensor_ext.convert_layout op, then one can just change the initial encoding to correspond to the result of the conversion.

If the result of an assign_layout has multiple subsequent convert_layout ops, then they are folded into multiple assign_layout ops applied to the same cleartext.

`-forward-insert-to-extract`

Forward inserts to extracts within a single block

This pass is similar to forward-store-to-load pass where store ops are forwarded load ops; here instead tensor.insert ops are forwarded to tensor.extract ops.

Does not support complex control flow within a block, nor ops with arbitrary subregions.

`-forward-store-to-load`

Forward stores to loads within a single block

This pass is a simplified version of mem2reg and similar passes. It analyzes an operation, finding all basic blocks within that op that have memrefs whose stores can be forwarded to loads.

Does not support complex control flow within a block, nor ops with arbitrary subregions.

`-full-loop-unroll`

Fully unroll all loops

Scan the IR for affine.for loops and unroll them all.

`-generate-param-bfv`

Generate BFV Scheme Parameter

The pass generates the BFV scheme parameter using a given noise model.

There are four noise models available:

bfv-noise-by-bound-coeff-average-case
bfv-noise-by-bound-coeff-worst-case or bfv-noise-kpz21
bfv-noise-by-variance-coeff or bfv-noise-bmcm23
bfv-noise-canon-emb

To use public-key encryption/secret-key encryption in the model, the option usePublicKey could be set accordingly.

The first two models are taken from KPZ21, and they work by bounding the coefficient embedding of the ciphertexts. The difference of the two models is expansion factor used for multiplication of the coefficients, the first being $2 \sqrt{N}$ and the second being $N$.

The third model is taken from BMCM23. It works by tracking the variance of the coefficient embedding of the ciphertexts. This gives a much tighter noise estimate for independent ciphertext input, but may give underestimation for dependent ciphertext input. See the paper for more details.

The last model is adapted from MMLGA22 with mixes from BMCM23 and KPZ21. It uses the canonical embedding to bound the critical quantity of a ciphertext that defines whether c can be decrypted correctly.

This pass then generates the moduli chain consisting of primes of bits specified by the mod-bits field.

Usually for B/FV mod-bits is set to 60. But when machine word size is small, users may also want to set it to 57.

This pass relies on the presence of the mgmt dialect ops to model relinearize, and it relies on mgmt.mgmt attribute to determine the ciphertext level/dimension. These ops and attributes can be added by a pass like --secret-insert-mgmt-bgv and --annotate-mgmt.

User can provide custom scheme parameters by annotating bgv::SchemeParamAttr at the module level. Note that we reuse bgv::SchemeParamAttr for BFV.

Example

Command: heir-opt --generate-param-bfv tests/Transforms/generate_param_bfv/doctest.mlir

Input:

module {
  func.func @add(%arg0: !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) {
    %0 = secret.generic(%arg0: !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) attrs = {arg0 = {mgmt.mgmt = #mgmt.mgmt<level = 0>}} {
    ^body(%input0: i16):
      %1 = arith.addi %input0, %input0 {mgmt.mgmt = #mgmt.mgmt<level = 0>} : i16
      secret.yield %1 : i16
    } -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>})
    return %0 : !secret.secret<i16>
  }
}

Output:

module attributes {bgv.schemeParam = #bgv.scheme_param<logN = 12, Q = [2147565569], P = [2147573761], plaintextModulus = 65537>} {
  func.func @add(%arg0: !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) {
    %0 = secret.generic(%arg0: !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) attrs = {arg0 = {mgmt.mgmt = #mgmt.mgmt<level = 0>}} {
    ^body(%input0: i16):
      %1 = arith.addi %input0, %input0 {mgmt.mgmt = #mgmt.mgmt<level = 0>} : i16
      secret.yield %1 : i16
    } -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>})
    return %0 : !secret.secret<i16>
  }
}

Options

-model                         : Noise model to validate against.
-mod-bits                      : Default number of bits for all prime coefficient modulusto use for the ciphertext space.
-slot-number                   : Minimum number of slots for parameter generation.
-plaintext-modulus             : Plaintext modulus.
-use-public-key                : If true, uses a public key for encryption.
-encryption-technique-extended : If true, uses EXTENDED encryption technique for encryption. (See https://ia.cr/2022/915)

`-generate-param-bgv`

Generate BGV Scheme Parameter using a given noise model

The pass generates the BGV scheme parameter using a given noise model.

There are four noise models available:

bgv-noise-by-bound-coeff-average-case or bgv-noise-kpz21
bgv-noise-by-bound-coeff-worst-case
bgv-noise-by-variance-coeff or bgv-noise-mp24
bgv-noise-mono

To use public-key encryption/secret-key encryption in the model, the option usePublicKey could be set accordingly.

The third model is taken from MP24. It works by tracking the variance of the coefficient embedding of the ciphertexts. This gives a more accurate noise estimate, but it may give underestimates in some cases. See the paper for more details.

The last model is taken from MMLGA22. It uses the canonical embedding to bound the critical quantity of a ciphertext that defines whether c can be decrypted correctly. According to the authors they achieve more accurate and better bounds than KPZ21. See the paper for more details.

This pass relies on the presence of the mgmt dialect ops to model relinearize/modreduce, and it relies on mgmt.mgmt attribute to determine the ciphertext level/dimension. These ops and attributes can be added by a pass like --secret-insert-mgmt-bgv.

User can provide custom scheme parameters by annotating bgv::SchemeParamAttr at the module level.

Example

Command: heir-opt --generate-param-bgv tests/Transforms/generate_param_bgv/doctest.mlir

Input:

module {
  func.func @add(%arg0: !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) {
    %0 = secret.generic(%arg0: !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) attrs = {arg0 = {mgmt.mgmt = #mgmt.mgmt<level = 0>}} {
    ^body(%input0: i16):
      %1 = arith.addi %input0, %input0 {mgmt.mgmt = #mgmt.mgmt<level = 0>} : i16
      secret.yield %1 : i16
    } -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>})
    return %0 : !secret.secret<i16>
  }
}

Output:

module attributes {bgv.schemeParam = #bgv.scheme_param<logN = 12, Q = [4294991873], P = [4295049217], plaintextModulus = 65537>} {
  func.func @add(%arg0: !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) {
    %0 = secret.generic(%arg0: !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) attrs = {arg0 = {mgmt.mgmt = #mgmt.mgmt<level = 0>}} {
    ^body(%input0: i16):
      %1 = arith.addi %input0, %input0 {mgmt.mgmt = #mgmt.mgmt<level = 0>} : i16
      secret.yield %1 : i16
    } -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>})
    return %0 : !secret.secret<i16>
  }
}

Options

-model                         : Noise model to validate against.
-plaintext-modulus             : Plaintext modulus.
-slot-number                   : Minimum number of slots for parameter generation.
-use-public-key                : If true, uses a public key for encryption.
-encryption-technique-extended : If true, uses EXTENDED encryption technique for encryption. (See https://ia.cr/2022/915)

`-generate-param-ckks`

Generate CKKS Scheme Parameter

The pass generates the CKKS scheme parameter.

The pass asks the user to provide the number of bits for the first modulus and scaling modulus. The default values are 55 and 45, respectively. Then the pass generates the moduli chain using the provided values.

User can provide custom scheme parameters by annotating bgv::SchemeParamAttr at the module level.

Example

Command: heir-opt --generate-param-ckks tests/Transforms/generate_param_ckks/doctest.mlir

Input:

module {
  func.func @add(%arg0: !secret.secret<f16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) -> (!secret.secret<f16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) {
    %0 = secret.generic(%arg0: !secret.secret<f16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) attrs = {arg0 = {mgmt.mgmt = #mgmt.mgmt<level = 0>}} {
    ^body(%input0: f16):
      %1 = arith.addf %input0, %input0 {mgmt.mgmt = #mgmt.mgmt<level = 0>} : f16
      secret.yield %1 : f16
    } -> (!secret.secret<f16> {mgmt.mgmt = #mgmt.mgmt<level = 0>})
    return %0 : !secret.secret<f16>
  }
}

Output:

module attributes {ckks.schemeParam = #ckks.scheme_param<logN = 13, Q = [36028797019389953], P = [36028797019488257], logDefaultScale = 45>} {
  func.func @add(%arg0: !secret.secret<f16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) -> (!secret.secret<f16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) {
    %0 = secret.generic(%arg0: !secret.secret<f16> {mgmt.mgmt = #mgmt.mgmt<level = 0>}) attrs = {arg0 = {mgmt.mgmt = #mgmt.mgmt<level = 0>}} {
    ^body(%input0: f16):
      %1 = arith.addf %input0, %input0 {mgmt.mgmt = #mgmt.mgmt<level = 0>} : f16
      secret.yield %1 : f16
    } -> (!secret.secret<f16> {mgmt.mgmt = #mgmt.mgmt<level = 0>})
    return %0 : !secret.secret<f16>
  }
}

Options

-slot-number                   : Minimum number of slots for parameter generation.
-first-mod-bits                : Default number of bits of the first prime coefficient modulus to use for the ciphertext space.
-scaling-mod-bits              : Default number of bits of the scaling prime coefficient modulus to use for the ciphertext space.
-use-public-key                : If true, uses a public key for encryption.
-encryption-technique-extended : If true, uses EXTENDED encryption technique for encryption. (See https://ia.cr/2022/915)
-input-range                   : The range of the plaintexts for input ciphertexts for the CKKS scheme; default to [-1, 1]. For other ranges like [-D, D], use D.

`-implement-shift-network`

Implement tensor_ext.convert_layout ops as shift newtorks

This pass converts tensor_ext.permute ops into a network of tensor_ext.rotate ops, aiming to minimize the overall latency of the permutation.

The input IR must have tensors that correspond to plaintexts or ciphertexts.

The method uses graph coloring, an approach based on Vos-Vos-Erkin 2022, “Efficient Circuits for Permuting and Mapping Packed Values Across Leveled Homomorphic Ciphertexts”.

Example, Figure 3 from the paper above:

// Provide an explicit permutation, though an affine_map can also be used.
#map = dense<[13, 8, 4, 0, 11, 7, 14, 5, 15, 3, 12, 6, 10, 2, 9, 1]> : tensor<16xi64>
func.func @figure3(%0: tensor<16xi32>) -> tensor<16xi32> {
  %1 = tensor_ext.permute %0 {permutation = #map} : tensor<16xi32>
  return %1 : tensor<16xi32>
}

Then running --implement-shift-network=ciphertext-size=16 produces a shift network composed of plaintext-ciphertext masks (arith.constant + arith.muli) followed by rotations and additions. The Vos-Vos-Erkin method splits the work into multiple independent groups that are added together at the end.

func.func @figure3(%arg0: tensor<16xi32>) -> tensor<16xi32> {
  %cst = arith.constant dense<[1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]> : tensor<16xi32>
  %0 = arith.muli %arg0, %cst : tensor<16xi32>
  %c1_i32 = arith.constant 1 : i32
  %1 = tensor_ext.rotate %0, %c1_i32 : tensor<16xi32>, i32
  %cst_0 = arith.constant dense<[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]> : tensor<16xi32>
  %2 = arith.muli %arg0, %cst_0 : tensor<16xi32>
  %c2_i32 = arith.constant 2 : i32
  %3 = tensor_ext.rotate %2, %c2_i32 : tensor<16xi32>, i32
  %4 = arith.addi %1, %3 : tensor<16xi32>
  %cst_1 = arith.constant dense<[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]> : tensor<16xi32>
  %5 = arith.muli %arg0, %cst_1 : tensor<16xi32>
  %c4_i32 = arith.constant 4 : i32
  %6 = tensor_ext.rotate %5, %c4_i32 : tensor<16xi32>, i32
  %7 = arith.addi %4, %6 : tensor<16xi32>
  %cst_2 = arith.constant dense<[0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]> : tensor<16xi32>
  %8 = arith.muli %arg0, %cst_2 : tensor<16xi32>
  %c8_i32 = arith.constant 8 : i32
  %9 = tensor_ext.rotate %8, %c8_i32 : tensor<16xi32>, i32
  %10 = arith.addi %7, %9 : tensor<16xi32>
  %cst_3 = arith.constant dense<[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]> : tensor<16xi32>
  %11 = arith.muli %arg0, %cst_3 : tensor<16xi32>
  %c1_i32_4 = arith.constant 1 : i32
  %12 = tensor_ext.rotate %11, %c1_i32_4 : tensor<16xi32>, i32
  %cst_5 = arith.constant dense<[0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]> : tensor<16xi32>
  %13 = arith.muli %arg0, %cst_5 : tensor<16xi32>
  %c2_i32_6 = arith.constant 2 : i32
  %14 = tensor_ext.rotate %13, %c2_i32_6 : tensor<16xi32>, i32
  %15 = arith.addi %12, %14 : tensor<16xi32>
  %cst_7 = arith.constant dense<[0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0]> : tensor<16xi32>
  %16 = arith.muli %arg0, %cst_7 : tensor<16xi32>
  %c4_i32_8 = arith.constant 4 : i32
  %17 = tensor_ext.rotate %16, %c4_i32_8 : tensor<16xi32>, i32
  %18 = arith.addi %15, %17 : tensor<16xi32>
  %cst_9 = arith.constant dense<[0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]> : tensor<16xi32>
  %19 = arith.muli %arg0, %cst_9 : tensor<16xi32>
  %c8_i32_10 = arith.constant 8 : i32
  %20 = tensor_ext.rotate %19, %c8_i32_10 : tensor<16xi32>, i32
  %21 = arith.addi %18, %20 : tensor<16xi32>
  %22 = arith.addi %10, %21 : tensor<16xi32>
  %cst_11 = arith.constant dense<[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]> : tensor<16xi32>
  %23 = arith.muli %arg0, %cst_11 : tensor<16xi32>
  %c1_i32_12 = arith.constant 1 : i32
  %24 = tensor_ext.rotate %23, %c1_i32_12 : tensor<16xi32>, i32
  %cst_13 = arith.constant dense<[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1]> : tensor<16xi32>
  %25 = arith.muli %arg0, %cst_13 : tensor<16xi32>
  %c2_i32_14 = arith.constant 2 : i32
  %26 = tensor_ext.rotate %25, %c2_i32_14 : tensor<16xi32>, i32
  %27 = arith.addi %24, %26 : tensor<16xi32>
  %cst_15 = arith.constant dense<[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]> : tensor<16xi32>
  %28 = arith.muli %arg0, %cst_15 : tensor<16xi32>
  %c4_i32_16 = arith.constant 4 : i32
  %29 = tensor_ext.rotate %28, %c4_i32_16 : tensor<16xi32>, i32
  %30 = arith.addi %27, %29 : tensor<16xi32>
  %cst_17 = arith.constant dense<[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1]> : tensor<16xi32>
  %31 = arith.muli %arg0, %cst_17 : tensor<16xi32>
  %c8_i32_18 = arith.constant 8 : i32
  %32 = tensor_ext.rotate %31, %c8_i32_18 : tensor<16xi32>, i32
  %33 = arith.addi %30, %32 : tensor<16xi32>
  %34 = arith.addi %22, %33 : tensor<16xi32>
  return %34 : tensor<16xi32>
}

Options

-ciphertext-size : Power of two length of the ciphertexts the data is packed in.

`-insert-rotate`

Vectorize arithmetic FHE operations using HECO-style heuristics

This pass implements the SIMD-vectorization passes from the HECO paper.

The pass operates by identifying arithmetic operations that can be suitably combined into a combination of cyclic rotations and vectorized operations on tensors. It further identifies a suitable “slot target” for each operation and heuristically aligns the operations to reduce unnecessary rotations.

This pass by itself does not eliminate any operations, but instead inserts well-chosen rotations so that, for well-structured code (like unrolled affine loops), a subsequent --cse and --canonicalize pass will dramatically reduce the IR. As such, the pass is designed to be paired with the canonicalization patterns in tensor_ext, as well as the collapse-insertion-chains pass, which cleans up remaining insertion and extraction ops after the main simplifications are applied.

Unlike HECO, this pass operates on plaintext types and tensors, along with the HEIR-specific tensor_ext dialect for its cyclic rotate op. It is intended to be run before lowering to a scheme dialect like bgv.

`-lattigo-alloc-to-inplace`

Convert AllocOps to InplaceOps in Lattigo

This pass converts AllocOps to InplaceOps in Lattigo.

`-lattigo-configure-crypto-context`

Configure the crypto context in Lattigo

This pass generates helper functions to configure the Lattigo objects for the given function.

For example, for an MLIR function @my_func, the generated helpers have the following signatures

func.func @my_func__configure() -> (!lattigo.bgv.evaluator, !lattigo.bgv.parameter, !lattigo.bgv.encoder, !lattigo.rlwe.encryptor, !lattigo.rlwe.decryptor)

Options

-entry-function : Default entry function name of entry function.

`-layout-optimization`

Optimize layout conversions in the IR

This pass performance a greedy layout optimization pass similar to the automatic layout assignment from A Tensor Compiler with Automatic Data Packing for Simple and Efficient Fully Homomorphic Encryption. The pass assumes that an initial layout assignment was provided on each operation through the layout-propagation pass.

The pass iterates on each operation of the IR in reverse order, attempting to hoist a layout conversion of the operation’s result before the operation. For each of the result’s layout conversions, the pass will compute the net cost of hoisting the conversion through the operation by considering the following:

The cost of performing the operation with new input layouts that result in the desired layout.
The cost of the converting the layout of each input.
The new cost of converting from the desired layout to each other layout conversions of the result.

The layout conversion that results in the lowest net cost is chosen to be hoisted.

Examples:

The second layout conversion could be eliminated by performing the first addition operation under #map1.

!tensor = tensor<32xi16>
!stensor = !secret.secret<!tensor>

#map = affine_map<(d0) -> (d0 + 1 mod 32)>
#map1 = affine_map<(d0) -> (d0)>
module {
  func.func @push_conversion(%arg0: !stensor {tensor_ext.layout = #map}, %arg1: !stensor {tensor_ext.layout = #map1}, %arg2: !stensor {tensor_ext.layout = #map1}) -> (!stensor {tensor_ext.layout = #map}) {
    %0 = secret.generic(%arg0 : !stensor {tensor_ext.layout = #map}, %arg1 : !stensor {tensor_ext.layout = #map1}, %arg2 : !stensor {tensor_ext.layout = #map1}) {
    ^body(%input0: tensor<32xi16>, %input1: tensor<32xi16>, %input2: tensor<32xi16>):
      %1 = tensor_ext.convert_layout %input1 {from_layout = #map1, tensor_ext.layout = [#map], to_layout = #map} : tensor<32xi16>
      %2 = arith.addi %input0, %1 {tensor_ext.layout = #map} : tensor<32xi16>
      %3 = tensor_ext.convert_layout %2 {from_layout = #map, tensor_ext.layout = [#map1], to_layout = #map1} : tensor<32xi16>
      %4 = arith.addi %3, %input2 {tensor_ext.layout = #map1} : tensor<32xi16>
      secret.yield %4 : tensor<32xi16>
    } -> (!stensor {tensor_ext.layout = #map1})
    return %0 : !stensor
  }
}

This pass produces:

!tensor = tensor<32xi16>
!stensor = !secret.secret<!tensor>

#map = affine_map<(d0) -> (d0 + 1)>
#map1 = affine_map<(d0) -> (d0)>
module {
  func.func @push_conversion(%arg0: !stensor {tensor_ext.layout = #map1}, %arg1: !stensor {tensor_ext.layout = #map}, %arg2: !stensor {tensor_ext.layout = #map1}) -> (!stensor {tensor_ext.layout = #map}) {
    %0 = secret.generic(%arg0 : !stensor {tensor_ext.layout = #map}, %arg1 : !stensor {tensor_ext.layout = #map1}, %arg2 : !stensor {tensor_ext.layout = #map1}) {
    ^body(%input0: tensor<32xi16>, %input1: tensor<32xi16>, %input2: tensor<32xi16>):
      %1 = tensor_ext.convert_layout %input0 {from_layout = #map, tensor_ext.layout = #map1, to_layout = #map1} : tensor<32xi16>
      %2 = arith.addi %1, %input1 {tensor_ext.layout = #map1} : tensor<32xi16>
      %3 = arith.addi %2, %input2 {tensor_ext.layout = #map1} : tensor<32xi16>
      secret.yield %3 : tensor<32xi16>
    } -> (!stensor {tensor_ext.layout = #map1})
    return %0 : !stensor
  }
}

Options

-ciphertext-size : Power of two length of the ciphertexts the data is packed in.

`-layout-propagation`

Propagate ciphertext layouts through the IR

This pass performs a forward propagation of layout (packing) information through the input IR, starting from the assumption that each secret tensor argument to a function has a row-major layout.

The chosen layouts (affine_maps) are annotated on ops throughout the IR. In particular,

Ops with a nested region and block arguments use a dictionary attribute to mark the layout of each block argument. func.func in particular uses the tensor_ext.layout dialect attribute, while others use an affine map attribute.
Other ops annotate their results with layouts as an ArrayAttr of affine maps. The order of the affine maps corresponds to the order of results.

When a plaintext SSA value is encountered as an input to a secret operation, a tensor_ext.assign_layout op is inserted that assigns it a default layout. This semantically corresponds to a plaintext packing operation. This is performed as late as possible before the SSA value is used, to avoid unnecessary layout conversions of plaintexts. This implies that not all SSA values in the IR are annotated with layouts, only those that have secret results or secret operands.

When two incompatible layouts are encountered as operands to the same op, tensor_ext.convert_layout ops are inserted. For example, consider the linalg.reduce operation for a summation. Summing along each of the two axes of a row-major-packed tensor<32x32xi16> results in two tensor<32xi16>, but with incompatible layouts: the first has a compact layout residing in the first 32-entries of a ciphertext, while the second is a strided layout with a stride of 32.

The converted op is arbitrarily chosen to have the layout of the first input, and later passes are responsible for optimizing the choice of which operand is converted and where the conversion operations are placed. This separation of duties allows this pass to be reused as a pure dataflow analysis, in which case it annotates an un-annotated IR with layout attributes.

Examples:

Two incompatible summations require a layout conversion

!tensor = tensor<32x32xi16>
!tensor2 = tensor<32xi16>
!stensor = !secret.secret<!tensor>
!stensor2 = !secret.secret<!tensor2>

func.func @insert_conversion(%arg0: !stensor, %arg1: !stensor) -> !stensor2 {
  %out_1 = arith.constant dense<0> : !tensor2
  %out_2 = arith.constant dense<0> : !tensor2

  %0 = secret.generic(%arg0, %arg1: !stensor, !stensor) {
  ^body(%pt_arg0: !tensor, %pt_arg1: !tensor):
    %1 = linalg.reduce { arith.addi } ins(%pt_arg0:!tensor) outs(%out_1:!tensor2) dimensions = [0]
    %2 = linalg.reduce { arith.addi } ins(%pt_arg1:!tensor) outs(%out_2:!tensor2) dimensions = [1]
    %3 = arith.addi %1, %2 : !tensor2
    secret.yield %3 : !tensor2
  } -> !stensor2
  return %0 : !stensor2
}

This pass produces:

#map = affine_map<(d0, d1) -> (d0 * 32 + d1)>
#map1 = affine_map<(d0) -> (d0)>
#map2 = affine_map<(d0) -> (d0 * 32)>
module {
  func.func @insert_conversion(
        %arg0: !secret.secret<tensor<32x32xi16>> {
            tensor_ext.layout = #tensor_ext.layout<layout = (d0, d1) -> (d0 * 32 + d1)>},
        %arg1: !secret.secret<tensor<32x32xi16>> {
            tensor_ext.layout = #tensor_ext.layout<layout = (d0, d1) -> (d0 * 32 + d1)>}
      ) -> (!secret.secret<tensor<32xi16>> {tensor_ext.layout = #tensor_ext.layout<layout = (d0) -> (d0)>}) {
    %cst = arith.constant dense<0> : tensor<32xi16>
    %cst_0 = arith.constant dense<0> : tensor<32xi16>
    %0 = secret.generic(%arg0, %arg1 : !secret.secret<tensor<32x32xi16>>, !secret.secret<tensor<32x32xi16>>)
                        attrs = {arg0 = {tensor_ext.layout = #map}, arg1 = {tensor_ext.layout = #map}, layout = [#map1]} {
    ^body(%input0: tensor<32x32xi16>, %input1: tensor<32x32xi16>):
      %1 = tensor_ext.assign_layout %cst {tensor_ext.layout = #map1} : tensor<32xi16>
      %reduced = linalg.reduce { arith.addi {overflowFlags = #arith.overflow<none>} }
                  ins(%input0 : tensor<32x32xi16>)
                  outs(%1 : tensor<32xi16>)
                  dimensions = [0]  {tensor_ext.layout = [#map1]}

      %2 = tensor_ext.assign_layout %cst_0 {tensor_ext.layout = #map1} : tensor<32xi16>
      %3 = tensor_ext.convert_layout %2 {from_layout = #map1, layout = [#map2], to_layout = #map2} : tensor<32xi16>
      %reduced_1 = linalg.reduce { arith.addi {overflowFlags = #arith.overflow<none>} }
                  ins(%input1 : tensor<32x32xi16>)
                  outs(%3 : tensor<32xi16>)
                  dimensions = [1]  {tensor_ext.layout = [#map2]}

      %4 = tensor_ext.convert_layout %reduced_1 {from_layout = #map2, layout = [#map1], to_layout = #map1} : tensor<32xi16>
      %5 = arith.addi %reduced, %4 {tensor_ext.layout = [#map1]} : tensor<32xi16>
      secret.yield %5 : tensor<32xi16>
    } -> !secret.secret<tensor<32xi16>>
    return %0 : !secret.secret<tensor<32xi16>>
  }
}

Options

-ciphertext-size : Power of two length of the ciphertexts the data is packed in.

`-linalg-canonicalizations`

This pass canonicalizes the linalg.transpose operation of a constant into a transposed constant.

This pass canonicalizes the linalg.transpose operation of a constant into a transposed constant.

`-lower-polynomial-eval`

Lowers the polynomial.eval operation

This pass lowers the polynomial.eval operation to a sequence of arithmetic operations in the relevant dialect.

Dialects that wish to support this pass must implement the DialectPolynomialEvalInterface dialect interface, which informs this pass what operations in the target dialect correspond to scalar multiplication and addition, as well as how to properly materialize constants as values.

This pass supports multiple options for lowering a polynomial.eval op, including the following. The required basis representation of the polynomial is listed alongside each method. The chosen method is controlled by the method pass option, which defaults to automatically select the method.

"horner": Horner’s method (monomial basis)
"ps" Paterson-Stockmeyer (monomial basis)
"pscheb": Paterson-Stockmeyer (Chebyshev basis)

// TODO(#1565): Add support for Chebyshev-basis methods // - "clenshaw": Clenshaw’s method (Chebyshev basis) // - "bsgs": Baby Step Giant Step (Chebyshev basis)

Options

-method : The method used to lower polynomial.eval

`-lower-unpack`

Lower tensor_ext.unpack to standard MLIR

This pass lowers tensor_ext.unpack.

`-lwe-add-debug-port`

Add debug port to (R)LWE encrypted functions

This pass adds debug ports to the specified function in the IR. The debug ports are prefixed with “__heir_debug” and are invoked after each homomorphic operation in the function. The debug ports are declarations and user should provide functions with the same name in their code.

For example, if the function is called “foo”, the secret key is added to its arguments, and the debug port is called after each homomorphic operation:

// declaration of external debug function
func.func private @__heir_debug(%sk : !sk, %ct : !ct)

// secret key added as function arg
func.func @foo(%sk : !sk, ...) {
  %ct = lwe.radd ...
  // invoke external debug function
  __heir_debug(%sk, %ct)
  %ct1 = lwe.rmul ...
  __heir_debug(%sk, %ct1)
  ...
}

Options

-entry-function : Default entry function name of entry function.

`-lwe-to-lattigo`

Lower lwe to lattigo dialect.

This pass lowers the lwe dialect to Lattigo dialect.

`-lwe-to-openfhe`

Lower lwe to openfhe dialect.

This pass lowers the lwe dialect to Openfhe dialect. Currently, this also includes patterns that apply directly to ckks and bgv dialect operations. TODO (#1193): investigate if the need for ckks/bgv patterns in --lwe-to-openfhe is permanent.

`-lwe-to-polynomial`

Lower lwe to polynomial dialect.

This pass lowers the lwe dialect to polynomial dialect.

`-memref-global-replace`

MemrefGlobalReplacePass forwards global memrefs accessors to arithmetic values

This pass forwards constant global MemRef values to referencing affine loads. This pass requires that the MemRef global values are initialized as constants and that the affine load access indices are constants (i.e. not variadic). Unroll affine loops prior to running this pass.

MemRef removal is required to remove any memory allocations from the input model (for example, TensorFlow models contain global memory holding model weights) to support FHE transpilation.

Input

module {
  memref.global "private" constant @__constant_8xi16 : memref<2x4xi16> = dense<[[-10, 20, 3, 4], [5, 6, 7, 8]]>
  func.func @main() -> i16 {
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %0 = memref.get_global @__constant_8xi16 : memref<2x4xi16>
    %1 = affine.load %0[%c1, %c1 + %c2] : memref<2x4xi16>
    return %1 : i16
  }
}

Output

module {
  func.func @main() -> i16 {
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %c8_i16 = arith.constant 8 : i16
    return %c8_i16 : i16
  }
}

`-mod-arith-to-arith`

Lower mod_arith to standard arith.

This pass lowers the mod_arith dialect to their arith equivalents.

`-mod-arith-to-mac`

Finds consecutive ModArith mul and add operations and converts them to a Mac operation

Walks over the programs to find Add operations, it checks if the any operands originates from a mul operation. If so, it converts the Add operation to a Mac operation and removes the mul operation.

`-openfhe-configure-crypto-context`

Configure the crypto context in OpenFHE

This pass generates helper functions to generate and configure the OpenFHE crypto context for the given function. Generating the crypto context sets the appropriate encryption parameters, while the configuration generates the necessary evaluation keys (relinearization and rotation keys).

For the options, reader can refer to the OpenFHE documentation at https://github.com/openfheorg/openfhe-development/blob/main/src/pke/examples/README.md#description-of-the-cryptocontext-parameters-and-their-restrictions

For example, for an MLIR function @my_func, the generated helpers have the following signatures

func.func  @my_func__generate_crypto_context() -> !openfhe.crypto_context

func.func  @my_func__configure_crypto_context(!openfhe.crypto_context, !openfhe.private_key) -> !openfhe.crypto_context

Options

-entry-function                 : Default entry function name of entry function.
-mul-depth                      : Manually specify the mul depth
-ring-dim                       : Manually specify the ring dimension (insecure is implied)
-batch-size                     : Manually specify the batch size
-first-mod-size                 : Manually specify the first mod size
-scaling-mod-size               : Manually specify the scaling mod size
-digit-size                     : Manually specify the digit size for relinearization
-num-large-digits               : Manually specify the number of large digits for HYBRID relinearization
-max-relin-sk-deg               : Manually specify the max number of relin sk deg
-insecure                       : Whether to use insecure parameter (defaults to false)
-key-switching-technique-bv     : Whether to use BV key switching technique (defaults to false)
-scaling-technique-fixed-manual : Whether to use fixed manual scaling technique (defaults to false)
-level-budget-encode            : Level budget for CKKS bootstrap encode (s2c) phase
-level-budget-decode            : Level budget for CKKS bootstrap decode (c2s) phase

`-openfhe-count-add-and-key-switch`

Count the number of add and key-switch operations in OpenFHE

This pass counts the number of add and key-switch operations in the given function.

This is used for setting the EvalAddCount and EvalKeySwitchCount in OpenFHE library. Cf. Alexandru et al. 2024 for why this is important for security.

The detailed definition of these counts could be found in the KPZ21 paper Revisiting Homomorphic Encryption Schemes for Finite Fields

The pass should be run at the secret arithmetic level when management operations have been inserted and the IR is stable.

`-operation-balancer`

This pass balances addition and multiplication operations.

This pass examines a tree or graph of add and multiplication operations and balances them to minimize the depth of the tree. This exposes better parallelization and reducing the multiplication depth can decrease the parameters used in FHE, which improves performance. This pass is not necessarily optimal, as there may be intermediate computations that this pass does not optimally minimize the depth for.

The algorithm is to analyze a graph of addition operations and do a depth-first search for the operands (from the last computed values in the graph). If there are intermediate computations that are used more than once, then the pass treats that computation as its own tree to balance instead of trying to minimize the global depth of the tree.

This pass only runs on addition and multiplication operations on the arithmetic dialect that are encapsulated inside a secret.generic.

This pass was inspired by section 2.6 of ‘EVA Improved: Compiler and Extension Library for CKKS’ by Chowdhary et al.

`-optimize-relinearization`

Optimize placement of relinearization ops

This pass defers relinearization ops as late as possible in the IR. This is more efficient in cases where multiplication operations are followed by additions, such as in a dot product. Because relinearization also adds error, deferring it can reduce the need for bootstrapping.

In this pass, we use an integer linear program to determine the optimal relinearization strategy. It solves an ILP for each func op in the IR.

The assumptions of this pass include:

All return values of functions must be linearized.
All ciphertext arguments to an op must have the same key basis
Rotation op inputs must have be linearized.

For an ILP model specification, see the docs at the HEIR website. The model is an adaptation of the ILP described in a blog post by Jeremy Kun.

Options

-use-loc-based-variable-names : When true, the ILP uses op source locations in variable names, which can help debug ILP model bugs.
-allow-mixed-degree-operands  : When true, allow ops to have mixed-degree ciphertexts as inputs, e.g., adding two ciphertexts with different key bases; this is supported by many FHE backends, like OpenFHE and Lattigo

`-polynomial-approximation`

Approximate ops by polynomials

This pass replaces certain operations that are incompatible with the FHE computational model with polynomial approximations.

The pass applies to the following ops in the math dialect. When the op is binary, the pass applies when one op is the result of an arith.constant which is scalar-valued or a splatted tensor.

absf
acos
acosh
asin
asinh
atan2
atan
atanh
cbrt
ceil
copysign
cos
cosh
erf
erfc
exp2
exp
expm1
floor
fpowi
log10
log1p
log2
log
powf
round
roundeven
rsqrt
sin
sinh
sqrt
tan
tanh
trunc

As well as the following ops in the math_ext dialect:

sign

The following ops in the arith dialect are also supported:

maxf
maxnumf
minf
minnumf

These ops are replaced with polynomial.eval ops with a static polynomial attribute.

Example

Command: heir-opt --polynomial-approximation tests/Transforms/polynomial_approximation/doctest.mlir

Input:

func.func @test_exp(%x: f32) -> f32 {
  %0 = math.exp %x {
      degree = 3 : i32,
      domain_lower = -1.0 : f64,
      domain_upper = 1.0 : f64} : f32
  return %0 : f32
}

Output:

#ring_f64 = #polynomial.ring<coefficientType = f64>
!poly = !polynomial.polynomial<ring = #ring_f64>
module {
  func.func @test_exp(%arg0: f32) -> f32 {
    %0 = polynomial.eval #polynomial<typed_float_polynomial <0.99457947632469512 + 0.9956677100276301x + 0.5429727883818608x**2 + 0.17953348361617388x**3> : !poly>, %arg0 {domain_lower = -1.000000e+00 : f64, domain_upper = 1.000000e+00 : f64} : f32
    return %0 : f32
  }
}

`-polynomial-to-mod-arith`

Lower polynomial to standard MLIR dialects.

This pass lowers the polynomial dialect to standard MLIR plus mod_arith, including possibly ops from affine, tensor, linalg, and arith.

`-populate-scale-bgv`

Populate the scale for BGV (GHS variant) ciphertext

In the original BGV scheme, it is required that each modulus in the modulus chain is a prime number q such that $q \equiv 1 \pmod{t}$, the plaintext modulus. This is to ensure that the after each modulus switching, the plaintext message is preserved. However, this limits the possible choices of the moduli chain.

In the GHS variant of BGV, such requirement is removed by introducing scaling factor to the ciphertext, with the cost of scale management. This pass is responsible for such management.

This pass relies on concrete SchemeParamAttr annotated on the module to determine the scale for each ciphertext. Such annotation can be generated by the generate-param-bgv pass.

Example

Command: heir-opt tests/Transforms/populate_scale/bgv/doctest.mlir --populate-scale-bgv

Input:

module attributes {bgv.schemeParam = #bgv.scheme_param<logN = 13, Q = [67239937, 8796093202433], P = [8796093349889], plaintextModulus = 65537>, scheme.bgv} {
  func.func @mul(%arg0: !secret.secret<i16>) -> !secret.secret<i16> {
    %0 = secret.generic(%arg0 : !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 1>}) {
    ^body(%input0: i16):
      %1 = arith.muli %input0, %input0 {mgmt.mgmt = #mgmt.mgmt<level = 1, dimension = 3>} : i16
      %2 = mgmt.relinearize %1 {mgmt.mgmt = #mgmt.mgmt<level = 1>} : i16
      %3 = mgmt.modreduce %2 {mgmt.mgmt = #mgmt.mgmt<level = 0>} : i16
      secret.yield %3 : i16
    } -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0>})
    return %0 : !secret.secret<i16>
  }
}

Output:

module attributes {bgv.schemeParam = #bgv.scheme_param<logN = 13, Q = [67239937, 8796093202433], P = [8796093349889], plaintextModulus = 65537>, scheme.bgv} {
  func.func @mul(%arg0: !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 1, scale = 1>}) -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0, scale = 42541>}) {
    %0 = secret.generic(%arg0: !secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 1, scale = 1>}) {
    ^body(%input0: i16):
      %1 = arith.muli %input0, %input0 {mgmt.mgmt = #mgmt.mgmt<level = 1, dimension = 3, scale = 1>} : i16
      %2 = mgmt.relinearize %1 {mgmt.mgmt = #mgmt.mgmt<level = 1, scale = 1>} : i16
      %3 = mgmt.modreduce %2 {mgmt.mgmt = #mgmt.mgmt<level = 0, scale = 42541>} : i16
      secret.yield %3 : i16
    } -> (!secret.secret<i16> {mgmt.mgmt = #mgmt.mgmt<level = 0, scale = 42541>})
    return %0 : !secret.secret<i16>
  }
}

`-populate-scale-ckks`

Populate the scale for CKKS ciphertext

In CKKS, each ciphertext is associated with a scaling factor $\Delta$, and such scaling factor will change after homomorphic operations such as multiplication and modulus reducing.

However, certain operations such as addition require the input ciphertexts to have the same scale. This pass is then responsible for managing the scale of the ciphertexts.

This pass relies on concrete SchemeParamAttr annotated on the module to determine the scale for each ciphertext. Such annotation can be generated by the generate-param-ckks pass.

The scaling factor is expressed in logarithm form.

Example

Command: heir-opt tests/Transforms/populate_scale/ckks/doctest.mlir --populate-scale-ckks

Input:

module attributes {ckks.schemeParam = #ckks.scheme_param<logN = 13, Q = [36028797019389953, 35184372121601], P = [36028797019488257], logDefaultScale = 45>, scheme.ckks} {
  func.func @mul(%arg0: !secret.secret<f32>) -> !secret.secret<f32> {
    %0 = secret.generic(%arg0 : !secret.secret<f32> {mgmt.mgmt = #mgmt.mgmt<level = 1, scale = 45>}) {
    ^body(%input0: f32):
      %1 = arith.mulf %input0, %input0 {mgmt.mgmt = #mgmt.mgmt<level = 1, dimension = 3, scale = 90>} : f32
      %2 = mgmt.relinearize %1 {mgmt.mgmt = #mgmt.mgmt<level = 1>} : f32
      %3 = mgmt.modreduce %2 {mgmt.mgmt = #mgmt.mgmt<level = 0>} : f32
      secret.yield %3 : f32
    } -> (!secret.secret<f32> {mgmt.mgmt = #mgmt.mgmt<level = 0>})
    return %0 : !secret.secret<f32>
  }
}

Output:

module attributes {ckks.schemeParam = #ckks.scheme_param<logN = 13, Q = [36028797019389953, 35184372121601], P = [36028797019488257], logDefaultScale = 45>, scheme.ckks} {
  func.func @mul(%arg0: !secret.secret<f32> {mgmt.mgmt = #mgmt.mgmt<level = 1, scale = 45>}) -> (!secret.secret<f32> {mgmt.mgmt = #mgmt.mgmt<level = 0, scale = 45>}) {
    %0 = secret.generic(%arg0: !secret.secret<f32> {mgmt.mgmt = #mgmt.mgmt<level = 1, scale = 45>}) {
    ^body(%input0: f32):
      %1 = arith.mulf %input0, %input0 {mgmt.mgmt = #mgmt.mgmt<level = 1, dimension = 3, scale = 90>} : f32
      %2 = mgmt.relinearize %1 {mgmt.mgmt = #mgmt.mgmt<level = 1, scale = 90>} : f32
      %3 = mgmt.modreduce %2 {mgmt.mgmt = #mgmt.mgmt<level = 0, scale = 45>} : f32
      secret.yield %3 : f32
    } -> (!secret.secret<f32> {mgmt.mgmt = #mgmt.mgmt<level = 0, scale = 45>})
    return %0 : !secret.secret<f32>
  }
}

Options

-before-mul-include-first-mul : Modulus switching before each multiplication, including the first multiplication (default to false)

`-propagate-annotation`

Propagate annotation from operation to subsequent operations

This pass propagates the attribute from one operation to subsequent operations if these operations does not have the attribute already.

Example: with --propagate-annotation=attr-name=test.attr

func.func @foo(%arg0: i16 {test.attr = 1}) -> i16 {
  %0 = arith.muli %arg0, %arg0 : i16
  %1 = mgmt.relinearize %0 : i16
  return %1 : i16
}

the above IR becomes

func.func @foo(%arg0: i16 {test.attr = 1 : i64}) -> i16 {
  %0 = arith.muli %arg0, %arg0 {test.attr = 1 : i64} : i16
  %1 = mgmt.relinearize %0 {test.attr = 1 : i64} : i16
  return {test.attr = 1 : i64} %1 : i16
}

Options

-attr-name : The attribute name to propagate with.
-reverse   : Whether to propagate in reverse

`-remove-unused-memref`

Cleanup any unused memrefs

Scan the IR for unused memrefs and remove them.

This pass looks for locally allocated memrefs that are never used and deletes them. This pass can be used as a cleanup pass from other IR simplifications that forward stores to loads.

`-rotate-and-reduce`

Use a logarithmic number of rotations to reduce a tensor.

This pass identifies when a commutative, associative binary operation is used to reduce all of the entries of a tensor to a single value, and optimizes the operations by using a logarithmic number of reduction operations.

In particular, this pass identifies an unrolled set of operations of the form (the binary ops may come in any order):

%0 = tensor.extract %t[0] : tensor<8xi32>
%1 = tensor.extract %t[1] : tensor<8xi32>
%2 = tensor.extract %t[2] : tensor<8xi32>
%3 = tensor.extract %t[3] : tensor<8xi32>
%4 = tensor.extract %t[4] : tensor<8xi32>
%5 = tensor.extract %t[5] : tensor<8xi32>
%6 = tensor.extract %t[6] : tensor<8xi32>
%7 = tensor.extract %t[7] : tensor<8xi32>
%8 = arith.addi %0, %1 : i32
%9 = arith.addi %8, %2 : i32
%10 = arith.addi %9, %3 : i32
%11 = arith.addi %10, %4 : i32
%12 = arith.addi %11, %5 : i32
%13 = arith.addi %12, %6 : i32
%14 = arith.addi %13, %7 : i32

and replaces it with a logarithmic number of rotate and addi operations:

%0 = tensor_ext.rotate %t, 4 : tensor<8xi32>
%1 = arith.addi %t, %0 : tensor<8xi32>
%2 = tensor_ext.rotate %1, 2 : tensor<8xi32>
%3 = arith.addi %1, %2 : tensor<8xi32>
%4 = tensor_ext.rotate %3, 1 : tensor<8xi32>
%5 = arith.addi %3, %4 : tensor<8xi32>

`-secret-add-debug-port`

Add debug port to secret-arithmetic ops

This pass adds debug ports to secret-arithmetic ops in the IR, namely operations wrapped by secret.generic. The debug ports are prefixed with “__heir_debug” and are invoked after each operation in the generic body. The debug ports are declarations and user should provide functions with the same name in their code.

For example, if the function is called “foo”, the debug port is called after each homomorphic operation:

// declaration of external debug function
func.func private @__heir_debug_tensor_8xi16_(tensor<8xi16>)

func.func @foo(...) {
  secret.generic {
    %0 = arith.addi ...
    // invoke external debug function
    __heir_debug_tensor_8xi16_(%0)
    %1 = arith.muli ...
    __heir_debug_tensor_8xi16_(%1)
  }
}

`-secret-capture-generic-ambient-scope`

Capture the ambient scope used in a secret.generic

For each value used in the body of a secret.generic op, which is defined in the ambient scope outside the generic, add it to the argument list of the generic.

`-secret-distribute-generic`

Distribute generic ops through their bodies.

Converts generic ops whose region contains many ops into smaller sequences of generic ops whose regions contain a single op, dropping the generic part from any resulting generic ops that have no secret.secret inputs. If the op has associated regions, and the operands are not secret, then the generic is distributed recursively through the op’s regions as well.

This pass is intended to be used as part of a front-end pipeline, where a program that operates on a secret type annotates the input to a region as secret, and then wraps the contents of the region in a single large secret.generic, then uses this pass to simplify it.

The distribute-through option allows one to specify a comma-separated list of op names (e.g., distribute-thorugh="affine.for,scf.if"), which limits the distribution to only pass through those ops. If unset, all ops are distributed through when possible.

Options

-distribute-through : comma-separated list of ops that should be distributed through

`-secret-extract-generic-body`

Extract the bodies of all generic ops into functions

This pass extracts the body of all generic ops into functions, and replaces the generic bodies with call ops. Used as a sub-operation in some passes, and extracted into its own pass for testing purposes.

This pass works best when --secret-generic-absorb-constants is run before it so that the extracted function contains any constants used in the generic op’s body.

`-secret-forget-secrets`

Convert secret types to standard types

Drop the secret<...> type from the IR, replacing it with the contained type and the corresponding cleartext computation.

secret.cast ops are replaced with freshly alloc’ed memrefs that extract individual bits of the input type or reshape them if possible.

`-secret-generic-absorb-constants`

Copy constants into a secret.generic body

For each constant value used in the body of a secret.generic op, which is defined in the ambient scope outside the generic, add it’s definition into the generic body.

`-secret-generic-absorb-dealloc`

Copy deallocs of internal memrefs into a secret.generic body

For each memref allocated and used only within a body of a secret.generic op, add it’s dealloc of the memref into its generic body.

`-secret-import-execution-result`

Annotate execution result to secret-arithmetic ops

When the execution result of each op is known by secret-add-debug-port pass, the result could be imported back to the IR.

This pass adds a new attribute secret.execution_result to the secret-arithmetic ops.

This is useful when users want to compare the precision of the result between the plaintext and the ciphertext (especially the CKKS case).

For example, if you have a trace.log that is generated by plaintext backend with --secret-add-debug-port where the result is printed out like

1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0
2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

Each line corresponds to one SSA value in the IR. You can then import the result back to the IR by using --secret-import-execution-result=file-name=trace.log.

func.func @foo(...) {
  secret.generic {
    %0 = arith.addi ... {secret.execution_result = [1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0]}
    %1 = arith.muli ... {secret.execution_result = [2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0]}
  }
}

Options

-file-name : file name of the execution result

`-secret-insert-mgmt-bfv`

Place BFV ciphertext management operations

This pass inserts relinearization operation for multiplication, and compute the multiplicative depth, or the level information.

For most cases B/FV is instantiated with no mod reduce so it is not a leveled scheme. However, for instantiating B/FV parameters it is often meaningful to know the multiplicative depth of the circuit.

Example of multiplication+addition:

func.func @func(%arg0: !secret.secret<i16>, %arg1: !secret.secret<i16>) -> !secret.secret<i16> {
  %0 = secret.generic(%arg0, %arg1 : !secret.secret<i16>, !secret.secret<i16>) {
  ^bb0(%arg2: i16, %arg3: i16):
    %1 = arith.muli %arg2, %arg3 : i16
    %2 = arith.addi %1, %arg3 : i16
    secret.yield %2 : i16
  } -> !secret.secret<i16>
  return %0 : !secret.secret<i16>
}

which get transformed to:

func.func @func(%arg0: !secret.secret<i16>, %arg1: !secret.secret<i16>) -> !secret.secret<i16> {
  %0 = secret.generic(%arg0, %arg1 : !secret.secret<i16>, !secret.secret<i16>) attrs = {arg0 = {mgmt.mgmt = #mgmt.mgmt<level = 1>}, arg1 = {mgmt.mgmt = #mgmt.mgmt<level = 1>}} {
  ^body(%input0: i16, %input1: i16):
    %1 = arith.muli %input0, %input1 {mgmt.mgmt = #mgmt.mgmt<level = 1, dimension = 3>} : i16
    %2 = mgmt.relinearize %1 {mgmt.mgmt = #mgmt.mgmt<level = 1>} : i16
    %3 = arith.addi %2, %input1 {mgmt.mgmt = #mgmt.mgmt<level = 1>} : i16
    secret.yield %3 : i16
  } -> !secret.secret<i16>
  return %0 : !secret.secret<i16>
}

`-secret-insert-mgmt-bgv`

Place BGV ciphertext management operations

This pass implements the following placement strategy:

For relinearize, after every homomorphic ciphertext-ciphertext multiplication, a mgmt.relinearize is placed after the operation. This is done to ensure that the ciphertext keeps linear.

For modulus switching, it is inserted right before a homomorphic multiplication, including ciphertext-plaintext ones. There is an option include-first controlling whether to switch modulus before the first multiplication.

User can check the FLEXIBLEAUTOEXT and FLEXIBLEAUTO mode in OpenFHE as a reference. To know more technical difference about them, user can refer to the paper “Revisiting homomorphic encryption schemes for finite firelds”.

Then, for level-mismatching binary operations like addition and subtraction, additional modulus switch is placed for the operand until it reaches the same level.

This is different from crosslevel operation handling in other implementations like using modulus switching and level drop together. The reason we only use modulus switching is for simplicity for now. Further optimization on this pass could implement such a strategy.

Before yield the final result, a modulus switching is placed if it is a result of multiplication or derived value of a multiplication.

Also, it annotates the mgmt.mgmt attribute for each operation, which includes the level and dimension information of a ciphertext. This information is subsequently used by the secret-to-bgv pass to properly lower to corresponding RNS Type.

Example of multiplication+addition:

func.func @func(%arg0: !secret.secret<i16>, %arg1: !secret.secret<i16>) -> !secret.secret<i16> {
  %0 = secret.generic(%arg0, %arg1 : !secret.secret<i16>, !secret.secret<i16>) {
  ^bb0(%arg2: i16, %arg3: i16):
    %1 = arith.muli %arg2, %arg3 : i16
    %2 = arith.addi %1, %arg3 : i16
    secret.yield %2 : i16
  } -> !secret.secret<i16>
  return %0 : !secret.secret<i16>
}

which get transformed to:

func.func @func(%arg0: !secret.secret<i16>, %arg1: !secret.secret<i16>) -> !secret.secret<i16> {
  %0 = secret.generic(%arg0, %arg1 : !secret.secret<i16>, !secret.secret<i16>) attrs = {arg0 = {mgmt.mgmt = #mgmt.mgmt<level = 1>}, arg1 = {mgmt.mgmt = #mgmt.mgmt<level = 1>}} {
  ^bb0(%arg2: i16, %arg3: i16):
    %1 = arith.muli %arg2, %arg3 {mgmt.mgmt = #mgmt.mgmt<level = 1, dimension = 3>} : i16
    %2 = mgmt.relinearize %1 {mgmt.mgmt = #mgmt.mgmt<level = 1>} : i16
    %3 = arith.addi %2, %arg3 {mgmt.mgmt = #mgmt.mgmt<level = 1>} : i16
    %4 = mgmt.modreduce %3 {mgmt.mgmt = #mgmt.mgmt<level = 0>} : i16
    secret.yield %4 : i16
  } -> !secret.secret<i16>
  return %0 : !secret.secret<i16>
}

Options

-after-mul                    : Modulus switching after each multiplication (default to false)
-before-mul-include-first-mul : Modulus switching before each multiplication, including the first multiplication (default to false)

`-secret-insert-mgmt-ckks`

Place CKKS ciphertext management operations

Check the description of secret-insert-mgmt-bgv. This pass implements similar strategy, where mgmt.modreduce stands for ckks.rescale here.

For bootstrap insertion policy, currently a greedy policy is used where when all levels are consumed then a bootstrap is inserted.

The max level available after bootstrap is controlled by the option bootstrap-waterline.

Number of bootstrap consumed level is not shown here, which is handled by further lowering. TODO(#1207): handle it here so parameter selection can depend on it. TODO(#1207): with this info we can encrypt at max level (with bootstrap consumed level).

Options

-after-mul                    : Modulus switching after each multiplication (default to false)
-before-mul-include-first-mul : Modulus switching before each multiplication, including the first multiplication (default to false)
-slot-number                  : Default number of slots use for ciphertext space.
-bootstrap-waterline          : Waterline for insert bootstrap op

`-secret-merge-adjacent-generics`

Merge two adjacent generics into a single generic

This pass merges two immediately sequential generics into a single generic. Useful as a sub-operation in some passes, and extracted into its own pass for testing purposes.

`-secret-to-bgv`

Lower secret to bgv dialect.

This pass lowers an IR with secret.generic blocks containing arithmetic operations to operations on ciphertexts with the BGV dialect.

The pass assumes that the secret.generic regions have been distributed through arithmetic operations so that only one ciphertext operation appears per generic block. It also requires that canonicalize was run so that non-secret values used are removed from the secret.generic’s block arguments.

The pass requires that all types are tensors of a uniform shape matching the dimension of the ciphertext space specified my poly-mod-degree.

Options

-poly-mod-degree : Default degree of the cyclotomic polynomial modulus to use for ciphertext space.

`-secret-to-cggi`

Lower secret to cggi dialect.

This pass lowers the secret dialect to cggi dialect.

`-secret-to-ckks`

Lower secret to ckks dialect.

This pass lowers an IR with secret.generic blocks containing arithmetic operations to operations on ciphertexts with the CKKS dialect.

The pass requires that all types are tensors of a uniform shape matching the dimension of the ciphertext space specified my poly-mod-degree.

Options

-poly-mod-degree : Default degree of the cyclotomic polynomial modulus to use for ciphertext space.

`-secret-to-mod-arith`

Lower secret to mod-arith dialect.

This pass lowers an IR with secret.generic blocks containing arithmetic operations to operations on plaintexts using the mod_arith dialect. This is primarily used in the plaintext lowering pipeline, where operations are performed directly against plaintexts.

Options

-modulus   : Modulus to use for the mod-arith dialect. If not specified, the pass will use the natural modulus for that integer type
-log-scale : Log base 2 of the scale for encoding floating points as ints.

`-secretize`

Adds secret argument attributes to entry function

Helper pass that adds a secret.secret attribute argument to each function argument. By default, the pass applies to all functions in the module. This may be overridden with the option -function=func_name to apply to a single function only.

Options

-function : function to add secret annotations to

`-select-rewrite`

Rewrites arith.select to a CMUX style expression

“This pass rewrites arith.select %c, %t, %f to %c * %t + (1 - %c) * %f. It supports all three variants of arith.select: scalar, shaped, and mixed types. In the latter case, it will broadcast/splat the scalar condition value to the required shape.”

`-shape-inference`

Infer shapes for shaped types

This pass infers the shapes of shaped types in a function, starting from function arguments annotated with a {shape.shape} attribute. Shape inference is only supported for operations that implement InferTypeOpInterface.

This is primarily intended to be used in conjunction with the Python frontend, which infers the rank, but not the length of each dimension, for tensor types.

`-straight-line-vectorize`

A vectorizer for straight line programs.

This pass ignores control flow and only vectorizes straight-line programs within a given region.

Options

-dialect : Use this to restrict the dialect whose ops should be vectorized.

`-tensor-ext-to-tensor`

Lower tensor_ext to tensor dialect.

This pass lowers the tensor_ext dialect to the tensor dialect.

This pass is intended to be used for testing purpose where the secret arithmetic IR containing tensor_ext dialect is lowered to the IR containing tensor dialect, which could be further lowered to the LLVM dialect.

`-tensor-linalg-to-affine-loops`

A port of convert-linalg-to-affine-loops for loops with tensor semantics

This pass primarily exists to support the conversion of linalg.generic operations that implement tensor_ext.assign_layout ops.

`-tosa-to-secret-arith`

Lower tosa.sigmoid to secret arith dialects.

This pass lowers the tosa.sigmoid dialect to the polynomial approximation -0.004 * x^3 + 0.197 * x + 0.5 (composed of arith, affine, and tensor operations).

This polynomial approximation of sigmoid only works over the range [-5, 5] and is taken from the paper ‘Logisitic regression over encrypted data from fully homomorphic encryption’ by Chen et al..

`-unroll-and-forward`

Loop unrolls and forwards stores to loads.

This pass processes the first function in a given module, and, starting from the first loop, iteratively does the following:

Fully unroll the loop.
Scan for load ops. For each load op with a statically-inferrable access index:
Backtrack to the original memref alloc
Find all store ops at the corresponding index (possibly transitively through renames/subviews of the underlying alloc).
Find the last store that occurs and forward it to the load.
If the original memref is an input memref, then forward through any renames to make the target load load directly from the argument memref (instead of any subviews, say)
Apply the same logic to any remaining loads not inside any for loop.

This pass requires that tensors are lowered to memref, and only supports affine loops with affine.load/store ops.

Memrefs that result from memref.get_global ops are excluded from forwarding, even if they are loaded with a static index, and are instead handled by memref-global-replace, which should be run after this pass.

`-validate-noise`

Validate the HE circuit against a given noise model

This pass validates the noise of the HE circuit against a given noise model.

The pass expects the scheme parameters to be annotated in the IR. Usually this is done by the generate-param-<scheme> passes.

For available noise models, see generate-param-<scheme> passes.

The result should be observed using –debug-only=ValidateNoise.

Example

# with commandline --debug-only=ValidateNoise
Noise Bound: 29.27 Budget: 149.73 Total: 179.00 for value: <block argument> of type 'tensor<8xi16>' at index: 0
Noise Bound: 29.27 Budget: 149.73 Total: 179.00 for value: <block argument> of type 'tensor<8xi16>' at index: 1

Options

-model                : Noise model to validate against.
-annotate-noise-bound : Annotate the noise bound to the IR.

`-wrap-generic`

Wraps regions using secret args in secret.generic bodies

This pass converts functions (func.func) with {secret.secret} annotated arguments to use !secret.secret<...> types and wraps the function body in a secret.generic region. The output type is also converted to !secret.secret<...>.

Example input:

  func.func @main(%arg0: i32 {secret.secret}) -> i32 {
    %0 = arith.constant 100 : i32
    %1 = arith.addi %0, %arg0 : i32
    return %1 : i32
  }

Output:

  func.func @main(%arg0: !secret.secret<i32>) -> !secret.secret<i32> {
    %0 = secret.generic(%arg0 : !secret.secret<i32>) {
    ^bb0(%arg1: i32):
      %1 = arith.constant 100 : i32
      %2 = arith.addi %0, %arg1 : i32
      secret.yield %2 : i32
    } -> !secret.secret<i32>
    return %0 : !secret.secret<i32>
  }

`-yosys-optimizer`

Invoke Yosys to perform circuit optimization.

This pass invokes Yosys to convert an arithmetic circuit to an optimized boolean circuit that uses the arith and comb dialects.

Note that booleanization changes the function signature: multi-bit integers are transformed to a tensor of booleans, for example, an i8 is converted to tensor<8xi1>.

The optimizer will be applied to each secret.generic op containing arithmetic ops that can be optimized.

Optional parameters:

abc-fast: Run the abc optimizer in “fast” mode, getting faster compile time at the expense of a possibly larger output circuit.
unroll-factor: Before optimizing the circuit, unroll loops by a given factor. If unset, this pass will not unroll any loops.
print-stats: Prints statistics about the optimized circuits.
mode={Boolean,LUT}: Map gates to boolean gates or lookup table gates.
use-submodules: Extract the body of a generic op into submodules. Useful for large programs with generics that can be isolated. This should not be used when distributing generics through loops to avoid index arguments in the function body.

Statistics

total circuit size : The total circuit size for all optimized circuits, after optimization is done.