dtFFT is a high-performance library for parallel data transpositions and optional Fast Fourier Transforms (FFTs) in multidimensional computing environments. It minimizes memory overhead in distributed systems by using custom MPI datatypes for CPU clusters and runtime-compiled CUDA kernels for GPU workflows. It integrates with FFT libraries such as FFTW3, MKL, cuFFT, and VkFFT, and can also run in transpose-only mode.
Key benefits: Zero-copy transpositions, GPU acceleration, and seamless MPI/CUDA integration for scientific computing.
dtFFT aims to optimize the following cycles of transformations (forward and backward):
for 2D case, and
for 3D case. Where
- Transform Types: R2C, C2C, and R2R transforms
- Precision: Single and double precision support
- Interfaces: Fortran, C, and C++ APIs
- Decompositions: 2D and 3D transposition plans with support for brick, pencil, and slab decompositions
- Transpositions: Custom MPI datatypes enhanced with standard host-based transpositions
- GPU Support: CUDA acceleration with runtime kernel compilation
- FFT Libraries: built-in support for FFTW3, MKL, cuFFT, and VkFFT
- CUDA Backends: MPI, NCCL, cuFFTMp for data exchange
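To make the decomposition terms in the list above concrete, here is a minimal sketch in C of how local block shapes could be derived for slab and pencil layouts. The helpers `local_extent`, `slab_counts`, and `pencil_counts` are hypothetical and not part of the dtFFT API. The even split with the remainder handed to the lowest ranks, and the choice of which dimensions are split, are assumptions made for illustration; in real code the sizes should come from the plan itself (for example `get_local_sizes` in the Fortran interface).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a dtFFT function): number of elements of a
 * dimension of length n owned by `rank` when n is split over `nprocs`
 * ranks as evenly as possible. The first n % nprocs ranks get one extra. */
int32_t local_extent(int32_t n, int32_t nprocs, int32_t rank)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int32_t base = n / nprocs;
    int32_t extra = n % nprocs;
    return base + (rank < extra ? 1 : 0);
}

/* Slab layout (assumed here): X and Y stay whole, Z is split over all ranks.
 * dims and counts are ordered X, Y, Z. */
void slab_counts(const int32_t dims[3], int32_t nprocs, int32_t rank,
                 int32_t counts[3])
{
    counts[0] = dims[0];
    counts[1] = dims[1];
    counts[2] = local_extent(dims[2], nprocs, rank);
}

/* Pencil layout (assumed here): X stays whole, Y and Z are split over a
 * p1 x p2 process grid; (r1, r2) are this rank's grid coordinates. */
void pencil_counts(const int32_t dims[3], int32_t p1, int32_t p2,
                   int32_t r1, int32_t r2, int32_t counts[3])
{
    counts[0] = dims[0];
    counts[1] = local_extent(dims[1], p1, r1);
    counts[2] = local_extent(dims[2], p2, r2);
}
```

For a 64 x 100 x 30 grid, this sketch gives rank 1 of 4 a 64 x 100 x 8 slab, while grid position (2, 3) of a 3 x 4 pencil layout gets a 64 x 33 x 7 block.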
- Memory is assumed to be contiguous. Ghost boundaries are not allowed.
- OpenMP for multicore parallelism is not supported.
- The number of elements per process/GPU cannot exceed $2^{31} - 1$, the maximum value of int32.
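The int32 limit above can be checked before a plan is created. The following is a minimal sketch in C, assuming the local extents are already known; `fits_int32` is a hypothetical helper and not part of the dtFFT API. It guards each multiplication so the running product never exceeds the int32 range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a dtFFT function): returns true when the product
 * of the local extents is at most INT32_MAX (2^31 - 1). Each multiply is
 * checked beforehand, so `total` never grows past INT32_MAX. */
bool fits_int32(const int32_t *counts, size_t ndims)
{
    int64_t total = 1;
    for (size_t i = 0; i < ndims; ++i) {
        assert(counts[i] >= 0);
        if (counts[i] == 0) {
            return true; /* empty local block: zero elements */
        }
        if (total > INT32_MAX / counts[i]) {
            return false; /* total * counts[i] would exceed INT32_MAX */
        }
        total *= counts[i];
    }
    return true;
}
```

With this sketch, a local block of 1024 x 1024 x 1024 elements ($2^{30}$) passes, while 2048 x 1024 x 1024 ($2^{31}$) does not, which suggests spreading the grid over more processes or GPUs.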
- Fortran/C/C++ compilers (GCC, Intel and NVHPC-SDK are tested)
- MPI (OpenMPI, MPICH and Intel MPI are tested)
- Clone the repository:
  git clone https://github.com/ShatrovOA/dtFFT.git
- Configure with CMake:
  mkdir build && cd build && cmake ..
- Build:
  make
- Install:
  make install
For detailed instructions, see Building the Library.
program dtfft_quick_start
use dtfft
use mpi
use iso_fortran_env
implicit none
type(dtfft_plan_c2c_t) :: plan
complex(real64), pointer :: real_buffer(:,:)
complex(real64), pointer :: fourier_buffer(:,:)
integer(int32) :: in_counts(2), out_counts(2)
integer(int64) :: alloc_size ! Can be larger than `product(in_counts)` and `product(out_counts)`
integer :: ierr
! MPI must be initialized before the plan is created
call MPI_Init(ierr)
! Create plan for a 100 x 100 complex-to-complex transform
call plan%create([100, 100], comm=MPI_COMM_WORLD)
! Get memory requirements
call plan%get_local_sizes(in_counts=in_counts, out_counts=out_counts, alloc_size=alloc_size)
! Allocate memory
call plan%mem_alloc(alloc_size, real_buffer, in_counts)
call plan%mem_alloc(alloc_size, fourier_buffer, out_counts)
! Execute plan
call plan%execute(real_buffer, fourier_buffer, DTFFT_EXECUTE_FORWARD)
! Now data is aligned in Y direction
! Execute backwards
call plan%execute(fourier_buffer, real_buffer, DTFFT_EXECUTE_BACKWARD)
! Free all memory
call plan%mem_free(real_buffer)
call plan%mem_free(fourier_buffer)
! Destroy plan
call plan%destroy()
call MPI_Finalize(ierr)
end program dtfft_quick_start
More examples can be found in the tests folder of the repository.
The documentation consists of two parts:
User Documentation: Available at dtFFT User Documentation, this section provides instructions on how to compile and use the library, as well as the public API documentation for C, C++, and Fortran. It is generated using Sphinx and hosted on Read the Docs.
Internal Documentation: Available at dtFFT Internal Documentation, this section describes the internal structure of the library, including call graphs and detailed implementation insights. It is generated using FORD.
The following is an ambitious list of features to implement. The items are in no particular order.
- nvSHMEM-based backends
- HIP platform
- zfp compression
- Ghost boundaries and support for halo exchange
- long double/quad precision
We welcome contributions! Report issues or suggest improvements via the issue tracker. For code contributions, please follow standard GitHub workflows (fork, branch, PR).
Licensed under GPL v3. See LICENSE for details.
