dtFFT - DataTyped Fast Fourier Transform

dtFFT is a high-performance library for parallel data transpositions and optional Fast Fourier Transforms (FFTs) in multidimensional computing environments. It minimizes memory overhead in distributed systems by using custom MPI datatypes for CPU clusters and runtime-compiled CUDA kernels for GPU workflows. It integrates with FFT libraries such as FFTW3, MKL, cuFFT, and VkFFT, and can also run in transpose-only mode.

Key benefits: Zero-copy transpositions, GPU acceleration, and seamless MPI/CUDA integration for scientific computing.

dtFFT aims to optimize the following cycles of transformations (forward and backward):

$$\dfrac{X}{P_0} \times \dfrac{Y}{P_1} \to X \times \dfrac{Y}{P_0P_1} \to Y \times \dfrac{X}{P_0P_1} \to \dfrac{Y}{P_1} \times \dfrac{X}{P_0}$$

for the 2D case, and

$$\dfrac{X}{P_0} \times \dfrac{Y}{P_1} \times \dfrac{Z}{P_2} \to X \times \dfrac{Y}{Q_1} \times \dfrac{Z}{Q_2} \to Y \times \dfrac{Z}{Q_2} \times \dfrac{X}{Q_1} \to Z \times \dfrac{X}{Q_1} \times \dfrac{Y}{Q_2} \to \dfrac{Z}{P_2} \times \dfrac{X}{Q_1} \times \dfrac{Y}{Q_2'}$$

for the 3D case. Here $X, Y, Z$ are the spatial dimensions of the data, with $X$ being the fastest varying index, and $P_0, P_1, P_2, Q_1, Q_2, Q_2'$ are the numbers of processes along the corresponding directions.
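
As a worked illustration of the 2D cycle, assume a hypothetical grid with $X = 128$, $Y = 64$ distributed over $P_0 = 2$, $P_1 = 4$ processes, with every dimension dividing evenly. These values are chosen only for illustration and are not taken from the library's documentation. The local block sizes would then be:

$$64 \times 16 \to 128 \times 8 \to 64 \times 16 \to 16 \times 64$$

Under that assumption every stage holds $128 \cdot 64 / 8 = 1024$ elements per process, so only the layout changes, not the local data volume.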

Features

Transform Types: R2C, C2C, and R2R transforms
Precision: Single and double precision support
Interfaces: Fortran, C, and C++ APIs
Decompositions: 2D and 3D transposition plans with support for brick, pencil, and slab decompositions
Transpositions: Custom MPI datatypes enhanced with standard host-based transpositions
GPU Support: CUDA acceleration with runtime kernel compilation
FFT Libraries: built-in support for:
- FFTW3
- MKL DFTI
- cuFFT
- VkFFT
CUDA Backends: MPI, NCCL, cuFFTMp for data exchange

Limitations

Memory is assumed to be contiguous. Ghost boundaries are not allowed.
OpenMP for multicore parallelism is not supported.
The number of elements per process/GPU cannot exceed $2^{31} - 1$, the maximum value of a signed 32-bit integer (int32).
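
To illustrate the element limit with simple arithmetic, consider a hypothetical $2048^3 = 2^{33}$ grid split evenly across 4 or 8 processes. The grid size and the even split are assumptions made for this example:

$$\dfrac{2048^3}{4} = 2^{31} > 2^{31} - 1, \qquad \dfrac{2048^3}{8} = 2^{30} \le 2^{31} - 1$$

Under those assumptions, four processes would each exceed the limit by a single element, while eight processes would fit.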

Requirements

Fortran/C/C++ compilers (GCC, Intel and NVHPC-SDK are tested)
MPI (OpenMPI, MPICH and Intel MPI are tested)

Installation

Clone the repository: git clone https://github.com/ShatrovOA/dtFFT.git
Configure with CMake: mkdir build && cd build && cmake ..
Build: make
Install: make install

For detailed instructions, see Building the Library.

Quick Start

use dtfft
use mpi
use iso_fortran_env
type(dtfft_plan_c2c_t) :: plan
complex(real64), pointer :: real_buffer(:,:)
complex(real64), pointer :: fourier_buffer(:,:)
integer(int32) :: in_counts(2), out_counts(2)
integer(int64) :: alloc_size ! Can be larger than `product(in_counts)` and `product(out_counts)`

! Create plan
call plan%create([100, 100], comm=MPI_COMM_WORLD)
! Get memory requirements
call plan%get_local_sizes(in_counts=in_counts, out_counts=out_counts, alloc_size=alloc_size)
! Allocate memory
call plan%mem_alloc(alloc_size, real_buffer, in_counts)
call plan%mem_alloc(alloc_size, fourier_buffer, out_counts)

! Execute plan
call plan%execute(real_buffer, fourier_buffer, DTFFT_EXECUTE_FORWARD)
! Now data is aligned in Y direction
! Execute backwards
call plan%execute(fourier_buffer, real_buffer, DTFFT_EXECUTE_BACKWARD)
! Free all memory
call plan%mem_free(real_buffer)
call plan%mem_free(fourier_buffer)
! Destroy plan
call plan%destroy()

More examples can be found in the tests folder of the repository.

Documentation

The documentation consists of two parts:

User Documentation: Available at dtFFT User Documentation. It explains how to compile and use the library and contains the public API documentation for C, C++, and Fortran. It is generated with Sphinx and hosted on Read the Docs.

Internal Documentation: Available at dtFFT Internal Documentation. It describes the internal structure of the library, including call graphs and implementation details. It is generated with FORD.

API Reference

Roadmap

The following is an ambitious list of features to implement. The items are in no particular order.

NVSHMEM-based backends
HIP platform
zfp compression
Ghost boundaries and support for halo exchange
long double/quad precision

Contributing

We welcome contributions! Report issues or suggest improvements via the issue tracker. For code contributions, please follow standard GitHub workflows (fork, branch, PR).

Licensing

Licensed under GPL v3. See LICENSE for details.
