forked from rmn64k/cuda3dfd
-
Notifications
You must be signed in to change notification settings - Fork 0
tkol999/cuda3dfd
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Contents
1. Package overview
2. Using the code
2.a Compile time parameters
2.b Needed variables
2.c Needed files
2.d Building
3. Programming guide
3.a Floating point precision
3.b Memory
3.c CUDA kernels
---------------------------------------------------------------------------
1. Package overview
This package contains:
- different CUDA C++ implementations of 6th order finite difference
operators,
- a benchmark program (bench) for the operators,
- a solver program (solver) for Burgers' equation,
- a small example program (sample) computing the gradient.
- basic CUDA Fortran code and test programs for the gradient, divergence
and curl operators.
2. Using the code
This section lists the requirements for using the code in a program.
2.a Compile time parameters
- NX, NY, NZ -- The number of grid points in each dimension.
- NX_TILE, NY_TILE -- The size of the 2D thread blocks used by the
kernels.
Note that NX_TILE should divide NX evenly and the same should hold for
NY_TILE and NY. Setting NX_TILE and NY_TILE to 16 is generally good for
smaller problems and 32 for larger.
2.b Needed variables
A program that uses this code needs to specify a few variables.
- x_0, y_0, z_0 -- The start grid point.
- dx, dy, dz -- The spatial discretization.
2.c Needed files
At least one needs to include common.h and device.h. This will also
include field3.h, which contains memory management code.
The device.o object file contains all GPU code and needs to be linked
with the program.
2.d Building
The easiest way to build a program using this code is to add it to the
included Makefile. That way you can simply use the configure script and
make utility to compile the program. The included programs are of course
built this way.
The compile time parameters and some other options can be set with the
configure script. Run 'configure help' to see the various options.
Below is an example setting up the build for compute version 3.7 (suitable
for the K80) with NX=NY=NZ=256 and 32x32 sized thread blocks.
-------------------------------------------------------------------------
> ./configure dir=build N=256 cm=37 NX_TILE=32 NY_TILE=32
> cd build
> make
-------------------------------------------------------------------------
3. Programming guide
This is short guide on how to use the most important systems in the code.
The sample and solver programs are also good places to look at to see how
the code is used.
3.a Floating point precision
The code uses the user defined type 'real' which is defined as 'double'
by default. It can be set to 'float' by the configure script.
3.a Memory
The memory for the 3D variable fields used in the code is arranged in
normal linear arrays. The field3.h file contains functions for indexing
into the array and for computing different strides and memory sizes. It
also defines thin wrapper classes for allocating and managing the memory
for the 3D vector fields, both on the GPU and host.
The vector fields store (NX+6)*(NY+6)*(NZ+6) values per variable. The 6
extra vales in each dimension are for surrounding ghost zones needed by
the finite difference stencil. The fields actually use more memory since
the memory is padded for better performance in the finite difference
kernels. However, the programmer does not need to account for the exact
memory layout since there are functions for computing the correct index.
Some memory functions:
- vfidx(xi, yi, zi, vi) --
Compute linear array index for variable vi at the grid point.
The spatial indices start at 0 and goes to N[X|Y|Z]+5.
The ghost values are at [xi|yi|zi]=0..2,N[X|Y|Z]+3..N[X|Y|Z]+5.
The variable indices also start at 0.
- vfystride() -- Linear index delta to go from yi to yi + 1
- vfzstride() -- Linear index delta to go from zi to zi + 1
- vfvstride() -- Linear index delta to go from vi to vi + 1
- vfmemsize(nv) -- Memory size of vector field of nv variables
The vf3dgpu class:
- vf3dgpu(nv) -- Construct vector field of nv variables
- mem() -- Get the GPU memory address of the array
- free() -- Free the GPU memory
- varcount() -- Get the number of variables in this field
- subfield(vi, nv) -- Get a field containing nv variables starting at vi
- copy_to_host(dst) -- Copy from GPU to host memory
- copy_from_host(src) -- Copy from host to GPU memory
The vf3dhost class is similar to the vf3dgpu class but of course uses
host memory instead of GPU memory.
Example:
-------------------------------------------------------------------------
// Host memory vector field of 4 variables.
vf3dhost h(4);
// Get array pointer.
real *m = h.mem();
// Middle grid point for 2nd var.
size_t idx = vfidx(NX/2, NY/2, NZ/2, 1);
// Set the grid point to 5.0;
m[idx] = 5.0;
// GPU memory vector field of 2 variables.
vf3dgpu g(2);
// Copy the second and third variables of the host field to the GPU field.
g.copy_from_host(h.subfield(1,2));
// Free memory
h.free();
g.free();
-------------------------------------------------------------------------
3.c CUDA kernels
See device.h for the available CUDA kernels. Almost all operators take
vf3dgpu objects as arguments. It is up to the programmer to make sure
the vector fields are of correct sizes. For instance the gradient takes
one vector field of one variable and one vector field of three variables
as arguments.
In addition to the mathematical and finite difference operators there are
also kernels for memory initialization, boundary conditions and clearing
GPU memory. Look at the sample, solver and bench programs for example use
of different kernels.
The finite difference kernels use the NX_TILE and NY_TILE parameters for
the size of their 2D thread blocks.
About
6th order 3D finite differences in CUDA.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published
Languages
- Cuda 90.3%
- C++ 7.2%
- C 2.0%
- Shell 0.5%