Comparing changes
base repository: openfheorg/openfhe-development
base: main
head repository: luxcpp/fhe
compare: main
- 20 commits
- 581 files changed
- 1 contributor
Commits on Dec 28, 2025
docs: rewrite EVM integration doc with Lux FHE stack
Remove all references to proprietary FHE vendors (Zama, Fhenix) and replace them with Lux's permissively licensed approach:
- OpenFHE (BSD-2-Clause)
- Lattice Go library (Apache-2.0)
- T-Chain threshold decryption
- Multi-scheme support (TFHE, CKKS, BGV, BFV)
Commit: 3439be6

feat: GPU coprocessor infrastructure and batch APIs
GPU Acceleration:
- Backend abstraction (BinFHEBackend) for pluggable GPU backends
- MLX kernels for Apple Silicon (gadget_decompose, external_product, blind_rotate)
- CUDA kernel stubs for NVIDIA GPUs
- Packed device formats for zero-copy GPU transfer
Batch APIs:
- BootstrapBatch, EvalFuncBatch, KeySwitchBatch, ModSwitchBatch
- BatchDAG for operation scheduling with async futures
- Multi-output function evaluation for radix arithmetic
Radix Integers:
- Shortint module with LUT-based arithmetic
- Radix composition for euint8..euint256
- Lazy carry propagation with noise tracking
fhEVM Integration:
- EVM precompile wrappers
- Gas metering framework
- Solidity interfaces (FHE.sol)
Documentation:
- GPU coprocessor roadmap
- Novel optimizations (10 documented)
- Benchmark harness and results
Commit: 500698b

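To make "lazy carry propagation" for radix integers concrete, here is a plaintext model in C++. It is a sketch under stated assumptions, not the repository's shortint API: the block layout (2 message bits plus 2 carry bits, 4 blocks for an 8-bit value), the names RadixInt, encode, add, propagate, and decode, and the use of a worst-case block bound in place of real noise tracking are all hypothetical. Under encryption, each carry-extraction step would cost a programmable bootstrap, which is why deferring it pays off.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: each block holds 2 message bits plus 2 carry bits,
// and 4 blocks form an 8-bit radix integer (analogous to euint8).
constexpr unsigned kMsgBits = 2;
constexpr unsigned kCarryBits = 2;
constexpr uint32_t kRadix = 1u << kMsgBits;                          // 4
constexpr uint32_t kCapacity = (1u << (kMsgBits + kCarryBits)) - 1;  // 15
constexpr std::size_t kBlocks = 4;

static int g_propagations = 0;  // counts carry passes, for illustration only

struct RadixInt {
    std::array<uint32_t, kBlocks> d{};  // least significant block first
    uint32_t max_digit = kRadix - 1;    // worst-case bound on any block value
};

RadixInt encode(uint32_t v) {
    RadixInt x;
    for (std::size_t i = 0; i < kBlocks; ++i) {
        x.d[i] = v % kRadix;
        v /= kRadix;
    }
    return x;
}

// One carry pass. Under encryption each block update would be a PBS that
// splits the block into its message part and its carry part.
RadixInt propagate(RadixInt x) {
    ++g_propagations;
    uint32_t carry = 0;
    for (std::size_t i = 0; i < kBlocks; ++i) {
        uint32_t t = x.d[i] + carry;
        x.d[i] = t % kRadix;
        carry = t / kRadix;  // carry out of the top block is dropped (wraps)
    }
    x.max_digit = kRadix - 1;
    return x;
}

// Block-wise addition. Pending carries are only cleaned up when the
// worst-case block value would overflow the carry space.
RadixInt add(RadixInt a, RadixInt b) {
    if (a.max_digit + b.max_digit > kCapacity) {
        if (a.max_digit > kRadix - 1) a = propagate(a);
        if (b.max_digit > kRadix - 1) b = propagate(b);
    }
    RadixInt r;
    for (std::size_t i = 0; i < kBlocks; ++i) r.d[i] = a.d[i] + b.d[i];
    r.max_digit = a.max_digit + b.max_digit;
    return r;
}

// Decoding works even with pending carries, since the encoding is linear.
uint32_t decode(const RadixInt& x) {
    uint32_t v = 0, weight = 1;
    for (std::size_t i = 0; i < kBlocks; ++i) {
        v += x.d[i] * weight;
        weight *= kRadix;
    }
    return v % (1u << (kMsgBits * kBlocks));
}
```

With this layout, four consecutive additions of fresh values fit in the carry space, so adding 1 five times triggers only a single carry pass instead of five.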
fix(go): add CGO include/lib paths for OpenFHE bindings
- Add -I paths for OpenFHE headers in tfhe and ckks context.go
- Add -L and -rpath for library linking
- Fix const reference for GetRealPackedValue() in bridge.cpp
- Fix unused variable in benchmark_test.go
Commit: 55633ef

ci: use GitHub-hosted ubuntu-latest runners
Replace ${{ vars.RUNNER }} with ubuntu-latest in all workflows. Self-hosted runner infrastructure is no longer needed.
Commit: 6f3007e

ci: add standalone build workflow
Simple workflow that builds C++, Go bindings, and docs without depending on external reusable workflows.
Commit: 25384aa

Commit: b5930d6
Commit: 8feb478
Commit: 0d4ce82
Commit: 29c04bc
Commit: cadba65

ci: add debugging for Go bindings library paths
- Add a step that lists installed libraries and their locations
- Add a lib64 -> lib symlink if needed
- Add verbose output for CGO path resolution
- Simplify LD_LIBRARY_PATH settings
Commit: 3c03538

fix: correct OpenFHE library names in CGO directives
OpenFHE libraries are named libOPENFHE*.so (all-caps OPENFHE), not libOpenFHE*.so (mixed case). Fixed LDFLAGS in:
- go/tfhe/context.go
- go/ckks/context.go
Commit: 9a5c99f

ci: add MLX and Lux extensions disable flags to main workflow
The main.yml workflow uses OpenFHE's upstream reusable CI workflow. Added -DWITH_MLX=OFF and -DWITH_LUX_EXTENSIONS=OFF to all cmake_args_map entries to prevent build failures on Linux runners.
Commit: 8741fa3

ci: use ubuntu-22.04 for main workflow compilers
GCC-11 and CLANG-14 are not available on ubuntu-latest (Ubuntu 24.04). Pin to ubuntu-22.04 which has these compiler versions installed.
Commit: 075fc83

ci: disable mb6_ntl jobs (NTL library unavailable on GitHub runners)
MATHBACKEND 6 with NTL requires libntl, which is not installed on GitHub-hosted runners. Disable these jobs since:
- MATHBACKEND 2 (64-bit) and 4 (128-bit) cover most FHE use cases
- NTL is only needed for arbitrary-precision arithmetic
- The upstream reusable workflow doesn't support pre-install steps
Commit: 307ab4a

Commits on Dec 29, 2025
feat: add AVX2/256-bit optimizations for EVM and UTXO
- Enable AVX2/256-bit SIMD for EVM uint256 operations
- Add UTXO-optimized build with lean uint64 parameters
- Enable benchmarks for performance comparison
- Add patent-pending optimization design document:
  - DMAFHE: Dual-mode adaptive FHE
  - ULFHE: UTXO lightweight FHE
  - EVM256PP: Parallel uint256 processing
  - XCFHE: Cross-chain FHE bridge
  - VAFHE: Validator-accelerated FHE
Commit: 0bc890c

Commits on Dec 30, 2025
fix: resolve MLX array initialization and preprocessor issues
- Add makeArrayOf8() helper for std::array<mx::array, 8> initialization (mx::array has no default constructor)
- Initialize GeneratePropagate and CompareFlags struct members
- Replace std::vector<mx::array> resize with the reserve+push_back pattern
- Change #ifdef WITH_MLX / #endif to #ifdef / #else / #endif to avoid variable redefinition errors when MLX is enabled
- Copy headers from lib/ to include/ for proper include paths

Fixes build errors in euint256_test and related MLX GPU acceleration code.
Commit: 61c2818

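The two initialization problems this commit works around (std::array and std::vector::resize both need a default constructor) can be shown with a stand-in type. This is a hedged sketch of one common way to write such a helper, not the repository's makeArrayOf8(): the generic make_array_of, make_handles, and the Handle type (standing in for mx::array) are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for a type with no default constructor (like mx::array).
struct Handle {
    explicit Handle(int v) : value(v) {}
    int value;
};

namespace detail {
template <typename T, std::size_t... I>
std::array<T, sizeof...(I)> make_array_impl(const T& init, std::index_sequence<I...>) {
    // Expands to {init, init, ...}: one copy per index, no T() required.
    return {{(static_cast<void>(I), init)...}};
}
}  // namespace detail

// Builds std::array<T, N> by copy-constructing every element from `init`.
template <std::size_t N, typename T>
std::array<T, N> make_array_of(const T& init) {
    return detail::make_array_impl(init, std::make_index_sequence<N>{});
}

// std::vector<Handle>::resize(n) would not compile (it needs Handle()),
// but reserve + emplace_back only ever calls the explicit constructor.
std::vector<Handle> make_handles(std::size_t n) {
    std::vector<Handle> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.emplace_back(static_cast<int>(i));
    return v;
}
```

Usage: `auto flags = make_array_of<8>(Handle{0});` yields an eight-element array, which matches the eight 32-bit words of a euint256.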
perf: maximize MLX GPU acceleration with comprehensive optimizations
Critical Bug Fixes:
- Fix euint256 comparison returning F[0] instead of F[7] (wrong result)
- Fix blind rotate edge case when b_val=0 (off-by-one error)
Lazy Evaluation Optimization (2-5x speedup):
- Remove intermediate mx::eval() calls that break MLX lazy evaluation
- Combine sequential eval() calls into a single mx::eval(a, b, c...)
- Keep eval() only at batch boundaries and key upload
Pre-allocated Workspace (eliminate hot-path allocations):
- Add PBSWorkspace struct with pre-allocated qArray, twoNArray, indices
- Add workspace to OptimizedPBSEngine and euint256PBSContext
- Replace mx::zeros/mx::full in hot paths with workspace slices
euint256 Optimizations (15-20% faster):
- Remove 8 PBS ops from carry application (use direct LWE addition)
- Add fastEquality256() method (23 PBS vs 32+ for full compare)
- Use makeArrayOf8() consistently for initialization
NTT/Metal Kernel Optimizations:
- Add global twiddle factor cache keyed by (N, Q, is_inverse)
- Stage 0 slice optimization (2x faster using strided slice)
- Skip final-stage threadgroup barrier (unnecessary sync)
- Cap thread group size at 512 for better register allocation
Thread Safety & Validation:
- Add atomic counters for cache hit/miss statistics
- Add shape validation to NTT stage functions
- Add 8-word validation to euint256 operations
Expected Performance: 20-30x GPU speedup (up from 6x baseline)
Commit: 847627e

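The global twiddle factor cache keyed by (N, Q, is_inverse), together with atomic hit/miss counters, could look roughly like the C++ sketch below. It is an assumption-laden illustration, not the repository's cache: TwiddleCache and powers_mod are hypothetical names, and the table contents come from a caller-supplied generator because primitive-root selection (and using the inverse root for is_inverse) is out of scope here.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <tuple>
#include <vector>

// Caches NTT twiddle tables keyed by (N, Q, is_inverse). Counters are atomic
// so statistics can be read without taking the lock.
class TwiddleCache {
public:
    using Key = std::tuple<uint32_t, uint64_t, bool>;
    using Table = std::vector<uint64_t>;

    const Table& get(uint32_t n, uint64_t q, bool inverse,
                     const std::function<Table()>& generate) {
        std::lock_guard<std::mutex> lock(mu_);
        Key key{n, q, inverse};
        auto it = tables_.find(key);
        if (it != tables_.end()) {
            hits_.fetch_add(1, std::memory_order_relaxed);
            return it->second;
        }
        misses_.fetch_add(1, std::memory_order_relaxed);
        // std::map never invalidates references to other nodes on insert,
        // and nothing is erased, so returned references stay valid.
        return tables_.emplace(key, generate()).first->second;
    }

    uint64_t hits() const { return hits_.load(std::memory_order_relaxed); }
    uint64_t misses() const { return misses_.load(std::memory_order_relaxed); }

private:
    std::mutex mu_;
    std::map<Key, Table> tables_;
    std::atomic<uint64_t> hits_{0}, misses_{0};
};

// Example generator: successive powers of `omega` modulo q.
// Assumes q < 2^32 so that w * omega fits in 64 bits.
std::vector<uint64_t> powers_mod(uint64_t omega, uint64_t q, uint32_t n) {
    std::vector<uint64_t> t(n);
    uint64_t w = 1;
    for (uint32_t i = 0; i < n; ++i) {
        t[i] = w;
        w = (w * omega) % q;
    }
    return t;
}
```

For example, `cache.get(4, 17, false, [] { return powers_mod(4, 17, 4); })` computes the table {1, 4, 16, 13} once; later calls with the same key return the same table and only bump the hit counter.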
fix: resolve constant-time comparison bugs and CMake library references
Constant-Time Integer Promotion Bug:
- Fix ct_eq() returning 254 instead of 0 for non-equal uint8_t values
- Issue: integer promotion caused (diff | (~diff + 1)) >> 7 to shift a 32-bit int instead of a uint8_t, giving wrong results
- Solution: add static_cast<T>() before the shift to truncate first
Kogge-Stone Prefix Scan Bug:
- Fix the argument order of the ct_combine_flags() call in ct_prefix_compare()
- Was: ct_combine_flags(flags[i], flags[i-stride]) (current as high)
- Now: ct_combine_flags(flags[i-stride], flags[i]) (accumulated as high)
- This ensures the higher-significance result properly dominates
CMakeLists.txt Fixes:
- benchmark/CMakeLists.txt: FHEbinfhe -> FHEbin (correct library target)
- server/CMakeLists.txt: FHEbinfhe -> FHEbin (correct library target)

All 138 binfhe_tests and 176 core_tests pass.
Commit: 00b0c9d

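Both constant-time fixes in commit 00b0c9d can be illustrated in C++. This is a hedged sketch, not the repository's code. ct_eq follows the truncate-before-shift pattern the message describes. ct_eq_u8_buggy, combine_flags, prefix_compare, and the kLT/kEQ/kGT encoding are hypothetical. The word order (index 0 most significant, final result in the last slot) is an assumption chosen to fit the flags[i-stride]-as-high fix. The per-word comparisons are plain C++ and are not constant-time.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Returns 1 if a == b, else 0, without data-dependent branches.
template <typename T>
T ct_eq(T a, T b) {
    static_assert(std::is_unsigned<T>::value, "ct_eq expects an unsigned type");
    constexpr int kTopBit = std::numeric_limits<T>::digits - 1;
    T diff = static_cast<T>(a ^ b);
    // diff | -diff has its top bit set iff diff != 0. For T narrower than int,
    // ~diff + 1 is computed as int, so truncate back to T *before* shifting.
    T spread = static_cast<T>(diff | static_cast<T>(~diff + 1));
    return static_cast<T>(1u ^ (spread >> kTopBit));
}

// Pre-fix pattern: the OR is evaluated as a 32-bit int, so the shift keeps
// sign bits; on GCC/Clang this returns 254 for unequal inputs such as 1, 2.
uint8_t ct_eq_u8_buggy(uint8_t a, uint8_t b) {
    uint8_t diff = a ^ b;
    return static_cast<uint8_t>(1 ^ ((diff | (~diff + 1)) >> 7));
}

// Hypothetical per-word comparison flags.
constexpr uint8_t kLT = 0, kEQ = 1, kGT = 2;

// `high` (more significant) decides unless it is EQ, in which case `low` does.
uint8_t combine_flags(uint8_t high, uint8_t low) {
    auto use_low = static_cast<uint8_t>(0u - ct_eq<uint8_t>(high, kEQ));  // 0xFF or 0x00
    return static_cast<uint8_t>((low & use_low) | (high & static_cast<uint8_t>(~use_low)));
}

// Kogge-Stone inclusive scan; index 0 is the most significant word, so the
// full comparison accumulates into flags[N - 1].
template <std::size_t N>
uint8_t prefix_compare(const std::array<uint64_t, N>& a, const std::array<uint64_t, N>& b) {
    std::array<uint8_t, N> flags{};
    for (std::size_t i = 0; i < N; ++i)
        flags[i] = a[i] > b[i] ? kGT : (a[i] < b[i] ? kLT : kEQ);
    for (std::size_t stride = 1; stride < N; stride *= 2) {
        std::array<uint8_t, N> next = flags;
        for (std::size_t i = stride; i < N; ++i)
            // The accumulated, more significant prefix goes in the `high` slot.
            next[i] = combine_flags(flags[i - stride], flags[i]);
        flags = next;
    }
    return flags[N - 1];
}
```

Because combine_flags is associative with kEQ as its identity, the scan is valid only when the more significant operand is passed first. Swapping the arguments lets a less significant word override a decisive higher word, which is the bug the commit describes.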
Commits on Dec 31, 2025
perf: add fused Metal kernels for 10x+ additional speedup
Fused Blind Rotation Kernel:
- New blind_rotate_fused.metal processes all 512 iterations in a single launch
- Eliminates 512 kernel launches per bootstrap (launch overhead was 5ms, now 0.5ms)
- Uses 40KB shared memory for accumulator and work buffers
- Includes full external product pipeline (decompose + NTT + mul + INTT)
- FusedBlindRotate class in metal_dispatch_optimized.h
Async BSK Pipeline:
- New async_pipeline.h with double-buffered BSK access
- BSKBufferPool for ping-pong GPU buffer management
- StreamExecutor thread pool for parallel batch submission
- AsyncPBSPipeline overlaps BSK fetch with compute
- Integrated into OptimizedPBSEngine with executeBatchAsync()
Batched External Product:
- New external_product_batch.metal with 5 kernel variants
- Optimized for CMux pattern (one RGSW, many RLWE)
- Fused decompose + multiply + accumulate pipeline
- BatchedExternalProduct class wrapper
Benchmark Results:
- N=4096, batch=128: 17.1x GPU speedup (up from 13.2x)
- N=2048, batch=128: 11.3x GPU speedup (up from 8.0x)
- Total improvement: ~130x vs baseline CPU
Commit: 2f9a828

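The core idea of the async BSK pipeline is to overlap fetching the next key slice with computation on the current one. The C++ sketch below shows that idea on the CPU, with std::async standing in for GPU streams. Everything in it is hypothetical: fetch_chunk, consume_chunk, and run_pipelined stand in for GPU uploads and PBS work, and they do not model BSKBufferPool, StreamExecutor, or AsyncPBSPipeline. Real ping-pong buffering would also reuse two fixed device buffers rather than allocating a new vector per slice.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

using Chunk = std::vector<uint64_t>;

// Stand-in for fetching one slice of a bootstrapping key into a staging buffer.
Chunk fetch_chunk(const std::vector<uint64_t>& key, std::size_t idx, std::size_t chunk_len) {
    std::size_t begin = std::min(idx * chunk_len, key.size());
    std::size_t end = std::min(begin + chunk_len, key.size());
    return Chunk(key.begin() + static_cast<std::ptrdiff_t>(begin),
                 key.begin() + static_cast<std::ptrdiff_t>(end));
}

// Stand-in for the compute stage that consumes a fetched slice.
uint64_t consume_chunk(const Chunk& c, uint64_t acc) {
    return std::accumulate(c.begin(), c.end(), acc);
}

// Double buffering: while slice i is consumed on this thread, slice i + 1 is
// fetched on another, so at most two slices are in flight at any time.
uint64_t run_pipelined(const std::vector<uint64_t>& key, std::size_t chunk_len) {
    if (key.empty() || chunk_len == 0) return 0;
    const std::size_t n_chunks = (key.size() + chunk_len - 1) / chunk_len;
    auto next = std::async(std::launch::async, fetch_chunk, std::cref(key),
                           std::size_t{0}, chunk_len);
    uint64_t acc = 0;
    for (std::size_t i = 0; i < n_chunks; ++i) {
        Chunk current = next.get();  // wait for slice i
        if (i + 1 < n_chunks)        // start fetching slice i + 1 before computing
            next = std::async(std::launch::async, fetch_chunk, std::cref(key),
                              i + 1, chunk_len);
        acc = consume_chunk(current, acc);
    }
    return acc;
}
```

The result is identical to a sequential loop over the slices. Only the timing changes, because the fetch of slice i + 1 runs while slice i is being consumed.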