Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: openfheorg/openfhe-development
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: main
Choose a base ref
...
head repository: luxcpp/fhe
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: main
Choose a head ref
Checking mergeability… Don’t worry, you can still create the pull request.
  • 20 commits
  • 581 files changed
  • 1 contributor

Commits on Dec 28, 2025

  1. docs: rewrite EVM integration doc with Lux FHE stack

    Remove all proprietary FHE vendor references (Zama, Fhenix)
    Replace with Lux's permissively-licensed approach:
    - OpenFHE (BSD-2-Clause)
    - Lattice Go library (Apache-2.0)
    - T-Chain threshold decryption
    - Multi-scheme support (TFHE, CKKS, BGV, BFV)
    zeekay committed Dec 28, 2025
    Configuration menu
    Copy the full SHA
    3439be6 View commit details
    Browse the repository at this point in the history
  2. feat: GPU coprocessor infrastructure and batch APIs

    GPU Acceleration:
    - Backend abstraction (BinFHEBackend) for pluggable GPU backends
    - MLX kernels for Apple Silicon (gadget_decompose, external_product, blind_rotate)
    - CUDA kernel stubs for NVIDIA GPUs
    - Packed device formats for zero-copy GPU transfer
    
    Batch APIs:
    - BootstrapBatch, EvalFuncBatch, KeySwitchBatch, ModSwitchBatch
    - BatchDAG for operation scheduling with async futures
    - Multi-output function evaluation for radix arithmetic
    
    Radix Integers:
    - Shortint module with LUT-based arithmetic
    - Radix composition for euint8..euint256
    - Lazy carry propagation with noise tracking
    
    fhEVM Integration:
    - EVM precompile wrappers
    - Gas metering framework
    - Solidity interfaces (FHE.sol)
    
    Documentation:
    - GPU coprocessor roadmap
    - Novel optimizations (10 documented)
    - Benchmark harness and results
    zeekay committed Dec 28, 2025
    Configuration menu
    Copy the full SHA
    500698b View commit details
    Browse the repository at this point in the history
  3. fix(go): add CGO include/lib paths for OpenFHE bindings

    - Add -I paths for OpenFHE headers in tfhe and ckks context.go
    - Add -L and -rpath for library linking
    - Fix const reference for GetRealPackedValue() in bridge.cpp
    - Fix unused variable in benchmark_test.go
    zeekay committed Dec 28, 2025
    Configuration menu
    Copy the full SHA
    55633ef View commit details
    Browse the repository at this point in the history
  4. ci: use GitHub-hosted ubuntu-latest runners

    Replace ${{ vars.RUNNER }} with ubuntu-latest for all workflows.
    No need for self-hosted runner infrastructure.
    zeekay committed Dec 28, 2025
    Configuration menu
    Copy the full SHA
    6f3007e View commit details
    Browse the repository at this point in the history
  5. ci: add standalone build workflow

    Simple workflow that builds C++, Go bindings, and docs
    without depending on external reusable workflows.
    zeekay committed Dec 28, 2025
    Configuration menu
    Copy the full SHA
    25384aa View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    b5930d6 View commit details
    Browse the repository at this point in the history
  7. Configuration menu
    Copy the full SHA
    8feb478 View commit details
    Browse the repository at this point in the history
  8. Configuration menu
    Copy the full SHA
    0d4ce82 View commit details
    Browse the repository at this point in the history
  9. Configuration menu
    Copy the full SHA
    29c04bc View commit details
    Browse the repository at this point in the history
  10. Configuration menu
    Copy the full SHA
    cadba65 View commit details
    Browse the repository at this point in the history
  11. ci: add debugging for Go bindings library paths

    - Add step to list installed libraries and locations
    - Add lib64 to lib symlink if needed
    - Add verbose output for CGO path resolution
    - Simplify LD_LIBRARY_PATH settings
    zeekay committed Dec 28, 2025
    Configuration menu
    Copy the full SHA
    3c03538 View commit details
    Browse the repository at this point in the history
  12. fix: correct OpenFHE library names in CGO directives

    OpenFHE libraries are named libOPENFHE*.so (all caps OPENFHE),
    not libOpenFHE*.so (mixed case). Fixed LDFLAGS in:
    - go/tfhe/context.go
    - go/ckks/context.go
    zeekay committed Dec 28, 2025
    Configuration menu
    Copy the full SHA
    9a5c99f View commit details
    Browse the repository at this point in the history
  13. ci: add MLX and Lux extensions disable flags to main workflow

    The main.yml workflow uses OpenFHE's upstream reusable CI workflow.
    Added -DWITH_MLX=OFF and -DWITH_LUX_EXTENSIONS=OFF to all cmake_args_map
    entries to prevent build failures on Linux runners.
    zeekay committed Dec 28, 2025
    Configuration menu
    Copy the full SHA
    8741fa3 View commit details
    Browse the repository at this point in the history
  14. ci: use ubuntu-22.04 for main workflow compilers

    GCC-11 and CLANG-14 are not available on ubuntu-latest (Ubuntu 24.04).
    Pin to ubuntu-22.04 which has these compiler versions installed.
    zeekay committed Dec 28, 2025
    Configuration menu
    Copy the full SHA
    075fc83 View commit details
    Browse the repository at this point in the history
  15. ci: disable mb6_ntl jobs (NTL library unavailable on GitHub runners)

    MATHBACKEND 6 with NTL requires libntl which is not installed on
    GitHub-hosted runners. Disable these jobs since:
    - MATHBACKEND 2 (64-bit) and 4 (128-bit) cover most FHE use cases
    - NTL is only needed for arbitrary precision arithmetic
    - The upstream reusable workflow doesn't support pre-install steps
    zeekay committed Dec 28, 2025
    Configuration menu
    Copy the full SHA
    307ab4a View commit details
    Browse the repository at this point in the history

Commits on Dec 29, 2025

  1. feat: add AVX2/256-bit optimizations for EVM and UTXO

    - Enable AVX2/256-bit SIMD for EVM uint256 operations
    - Add UTXO-optimized build with lean uint64 parameters
    - Enable benchmarks for performance comparison
    - Add patent-pending optimization design document:
      - DMAFHE: Dual-mode adaptive FHE
      - ULFHE: UTXO lightweight FHE
      - EVM256PP: Parallel uint256 processing
      - XCFHE: Cross-chain FHE bridge
      - VAFHE: Validator-accelerated FHE
    zeekay committed Dec 29, 2025
    Configuration menu
    Copy the full SHA
    0bc890c View commit details
    Browse the repository at this point in the history

Commits on Dec 30, 2025

  1. fix: resolve MLX array initialization and preprocessor issues

    - Add makeArrayOf8() helper for std::array<mx::array, 8> initialization
      (mx::array has no default constructor)
    - Initialize GeneratePropagate and CompareFlags struct members
    - Fix std::vector<mx::array> resize to use reserve+push_back pattern
    - Change #ifdef WITH_MLX / #endif to #ifdef / #else / #endif to avoid
      variable redefinition errors when MLX is enabled
    - Copy headers from lib/ to include/ for proper include paths
    
    Fixes build errors in euint256_test and related MLX GPU acceleration code.
    zeekay committed Dec 30, 2025
    Configuration menu
    Copy the full SHA
    61c2818 View commit details
    Browse the repository at this point in the history
  2. perf: maximize MLX GPU acceleration with comprehensive optimizations

    Critical Bug Fixes:
    - Fix euint256 comparison returning F[0] instead of F[7] (wrong result)
    - Fix blind rotate edge case when b_val=0 (off-by-one error)
    
    Lazy Evaluation Optimization (2-5x speedup):
    - Remove intermediate mx::eval() calls that break MLX lazy evaluation
    - Combine sequential eval() calls into single mx::eval(a, b, c...)
    - Keep eval() only at batch boundaries and key upload
    
    Pre-allocated Workspace (eliminate hot-path allocations):
    - Add PBSWorkspace struct with pre-allocated qArray, twoNArray, indices
    - Add workspace to OptimizedPBSEngine and euint256PBSContext
    - Replace mx::zeros/mx::full in hot paths with workspace slices
    
    euint256 Optimizations (15-20% faster):
    - Remove 8 PBS ops from carry application (use direct LWE addition)
    - Add fastEquality256() method (23 PBS vs 32+ for full compare)
    - Use makeArrayOf8() consistently for initialization
    
    NTT/Metal Kernel Optimizations:
    - Add global twiddle factor cache by (N, Q, is_inverse)
    - Stage 0 slice optimization (2x faster using strided slice)
    - Skip final stage threadgroup barrier (unnecessary sync)
    - Cap thread group size at 512 for better register allocation
    
    Thread Safety & Validation:
    - Add atomic counters for cache hit/miss statistics
    - Add shape validation to NTT stage functions
    - Add 8-word validation to euint256 operations
    
    Expected Performance: 20-30x GPU speedup (up from 6x baseline)
    zeekay committed Dec 30, 2025
    Configuration menu
    Copy the full SHA
    847627e View commit details
    Browse the repository at this point in the history
  3. fix: resolve constant-time comparison bugs and CMake library references

    Constant-Time Integer Promotion Bug:
    - Fix ct_eq() returning 254 instead of 0 for non-equal uint8_t values
    - Issue: integer promotion caused (diff | (~diff + 1)) >> 7 to shift
      a 32-bit int instead of uint8_t, giving wrong results
    - Solution: Add static_cast<T>() before the shift to truncate first
    
    Kogge-Stone Prefix Scan Bug:
    - Fix ct_prefix_compare() argument order in ct_combine_flags()
    - Was: ct_combine_flags(flags[i], flags[i-stride]) - current as high
    - Now: ct_combine_flags(flags[i-stride], flags[i]) - accumulated as high
    - This ensures the higher-significance result properly dominates
    
    CMakeLists.txt Fixes:
    - benchmark/CMakeLists.txt: FHEbinfhe -> FHEbin (correct library target)
    - server/CMakeLists.txt: FHEbinfhe -> FHEbin (correct library target)
    
    All 138 binfhe_tests and 176 core_tests pass.
    zeekay committed Dec 30, 2025
    Configuration menu
    Copy the full SHA
    00b0c9d View commit details
    Browse the repository at this point in the history

Commits on Dec 31, 2025

  1. perf: add fused Metal kernels for 10x+ additional speedup

    Fused Blind Rotation Kernel:
    - New blind_rotate_fused.metal processes all 512 iterations in single launch
    - Eliminates 512 kernel launches per bootstrap (was 5ms overhead, now 0.5ms)
    - Uses 40KB shared memory for accumulator and work buffers
    - Includes full external product pipeline (decompose + NTT + mul + INTT)
    - FusedBlindRotate class in metal_dispatch_optimized.h
    
    Async BSK Pipeline:
    - New async_pipeline.h with double-buffered BSK access
    - BSKBufferPool for ping-pong GPU buffer management
    - StreamExecutor thread pool for parallel batch submission
    - AsyncPBSPipeline overlaps BSK fetch with compute
    - Integrated into OptimizedPBSEngine with executeBatchAsync()
    
    Batched External Product:
    - New external_product_batch.metal with 5 kernel variants
    - Optimized for CMux pattern (one RGSW, many RLWE)
    - Fused decompose + multiply + accumulate pipeline
    - BatchedExternalProduct class wrapper
    
    Benchmark Results:
    - N=4096, batch=128: 17.1x GPU speedup (up from 13.2x)
    - N=2048, batch=128: 11.3x GPU speedup (up from 8.0x)
    - Total improvement: ~130x vs baseline CPU
    zeekay committed Dec 31, 2025
    Configuration menu
    Copy the full SHA
    2f9a828 View commit details
    Browse the repository at this point in the history
Loading