Core FFT API

These are the core FFT classes.

Base FFT class

class pyvkfft.base.VkFFTApp(shape, dtype: type, ndim=None, inplace=True, norm=1, r2c=False, dct=False, dst=False, axes=None, strides=None, r2c_odd=False, **kwargs)

VkFFT application interface implementing an FFT plan. This is the base implementation, handling the functions and parameters common to the CUDA and OpenCL backends.

get_algo_str(vkfft_axes=False)

Return a string indicating the type of algorithm used for each axis, either [r]adix, [B]luestein or [R]ader, or '-' if the axis is skipped.

get_fft_scale()

Return the scale factor by which an array must be multiplied to keep its L2 norm after a forward FT

get_ifft_scale()

Return the scale factor by which an array must be multiplied to keep its L2 norm after a backward FT
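As a rough illustration of what these scale factors compensate for, the sketch below uses a naive pure-Python DFT (not VkFFT) to show that an unnormalised forward transform of N points multiplies the L2 norm by sqrt(N), so that multiplying by 1/sqrt(N) restores it. The helpers `naive_dft` and `l2_norm` are hypothetical names introduced here, and the exact values returned by get_fft_scale() for r2c, DCT or DST transforms may differ from this plain C2C case.

```python
import cmath
import math


def naive_dft(x):
    """Unnormalised forward DFT, O(N^2), for illustration only."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def l2_norm(x):
    """Euclidean (L2) norm of a sequence of complex numbers."""
    return math.sqrt(sum(abs(v) ** 2 for v in x))


x = [1 + 2j, -0.5j, 3.0, 0.25 - 1j, 2j, -1.5, 0.75 + 0.5j, 1.0]
X = naive_dft(x)

# Parseval: the unnormalised DFT scales the L2 norm by sqrt(N)
ratio = l2_norm(X) / l2_norm(x)

# Multiplying the transformed array by 1/sqrt(N) restores the original norm
scale = 1 / math.sqrt(len(x))
rescaled_norm = l2_norm([v * scale for v in X])
```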

get_nb_upload(axis=None, vkfft_axes=False)

Number of uploads for the transform along given axes - ideally 1 so that each transform corresponds to 1 read and 1 write of the array.

Parameters:
  • axis -- the index of one axis. If None, a list of values for all axes is returned

  • vkfft_axes -- True to use the index relative to VkFFT axes, False (the default) otherwise. See _get_vkfft_axes() for details.

get_shape_str(vkfft_axes=False)

Get a string with the shape of the array, including the +1 or +2 for inplace r2c transforms.

Parameters:

vkfft_axes -- True to use the index relative to VkFFT axes, False (the default) otherwise. See _get_vkfft_axes() for details.
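The "+1 or +2" padding for inplace r2c transforms follows from the layout described for the r2c and r2c_odd parameters: the real array must be large enough to hold nx//2+1 complex values along the fast axis. A minimal sketch of that arithmetic, using a hypothetical helper `r2c_inplace_shapes` that is not part of pyvkfft:

```python
def r2c_inplace_shapes(shape):
    """Return (padded_real_shape, complex_shape, r2c_odd) for an inplace R2C.

    `shape` is the logical real shape (..., nx) in numpy order,
    i.e. without any padding along the fast axis.
    """
    *lead, nx = shape
    r2c_odd = nx % 2 == 1        # odd-sized fast axis needs the r2c_odd flag
    n_complex = nx // 2 + 1      # half-Hermitian length along x
    padded = 2 * n_complex       # nx+2 if nx is even, nx+1 if nx is odd
    return (*lead, padded), (*lead, n_complex), r2c_odd
```

For example, both nx=100 and nx=101 give a padded real length of 102 and 51 complex values, which is why a flag is needed to tell the two cases apart.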

get_tmp_buffer_nbytes()

Return the size (in bytes) of the temporary buffer allocated by VkFFT for the transform, if any.

get_tmp_buffer_str()

Get a string with the size of the temporary buffer allocated by VkFFT, e.g. '0', '123kB', '1.2GB', etc. Uses 6 chars.

is_bluestein_transform(axis=None, vkfft_axes=False)

Return True if the transform used along a given axis uses Bluestein's algorithm, or False

Parameters:
  • axis -- the index of one axis. If None, a list of values for all axes is returned

  • vkfft_axes -- True to use the index relative to VkFFT axes, False (the default) otherwise. See _get_vkfft_axes() for details.

is_rader_transform(axis=None, vkfft_axes=False)

Return True if the transform used along a given axis uses Rader's algorithm, or False

Parameters:
  • axis -- the index of one axis. If None, a list of values for all axes is returned

  • vkfft_axes -- True to use the index relative to VkFFT axes, False (the default) otherwise. See _get_vkfft_axes() for details.

is_radix_transform(axis=None, vkfft_axes=False)

Return True if the transform used along a given axis uses a radix algorithm, or False

Parameters:
  • axis -- the index of one axis. If None, a list of values for all axes is returned

  • vkfft_axes -- True to use the index relative to VkFFT axes, False (the default) otherwise. See _get_vkfft_axes() for details.

class pyvkfft.base.VkFFTResult(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

VkFFT error codes from vkFFT.h

pyvkfft.base.calc_transform_axes(shape, axes=None, ndim=None, strides=None)

Compute the final shape of the array to be passed to VkFFT, and the axes for which the transform should be skipped. By collapsing non-transformed consecutive axes and using batch transforms, it is possible to support larger dimensions (the limit is set at compilation time).

Parameters:
  • shape -- the initial shape of the data array. Note that this shape should be in the usual numpy order, i.e. the fastest axis is listed last. e.g. (nz, ny, nx)

  • axes -- the axes to be transformed. if None, all axes are transformed, or up to ndim.

  • ndim -- the number of dimensions for the transform. If None, the number of axes is used

  • strides -- the array strides. If None, a C-order is assumed with the fastest axes along the last dimensions (numpy default)

Returns:

(shape, n_batch, skip_axis, ndim, axes0), where shape is the shape after collapsing consecutive non-transformed axes (padded with ones if necessary, in the order expected by VkFFT, i.e. (nx, ny, nz,...)), n_batch is the batch size (e.g. 5 if shape=(5,16,16) and ndim=2), skip_axis is the list of booleans indicating which axes should be skipped, and ndim is the number of transform axes. Finally, axes0 is the list of transformed axes, before any axis collapsing.
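To make the shape reordering and batch count more concrete, here is a deliberately simplified sketch. It only covers the easiest case (a C-ordered array transformed along its last ndim axes) and ignores strides, padding with ones and skipped axes, all of which the real function handles. The helper name `simple_batch_layout` is hypothetical.

```python
from math import prod


def simple_batch_layout(shape, ndim):
    """Split a numpy-ordered shape into (vkfft_order_shape, n_batch).

    Only handles the simplest case: the transform runs over the last
    `ndim` axes and all leading axes are folded into one batch count.
    """
    if not 1 <= ndim <= len(shape):
        raise ValueError("ndim must be between 1 and len(shape)")
    batch_axes, fft_axes = shape[:-ndim], shape[-ndim:]
    n_batch = prod(batch_axes)  # prod of an empty tuple is 1
    # VkFFT lists the fastest axis first: (nx, ny, ...)
    return tuple(reversed(fft_axes)), n_batch
```

With shape=(5,16,16) and ndim=2 this gives a batch size of 5, matching the example in the return description above.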

pyvkfft.base.check_vkfft_result(res, shape=None, dtype=None, ndim=None, inplace=None, norm=None, r2c=None, dct=None, dst=None, axes=None, backend=None, strides=None, vkfft_shape=None, vkfft_skip=None, vkfft_nbatch=None)

Check VkFFTResult code.

Parameters:
  • res -- the result code from launching a transform.

  • shape -- shape of the array

  • dtype -- data type of the array

  • ndim -- number of transform dimensions

  • inplace -- True or False

  • norm -- 0, 1 or "ortho"

  • r2c -- True or False

  • dct -- False, 1, 2, 3 or 4

  • dst -- False, 1, 2, 3 or 4

  • axes -- transform axes

  • backend -- the backend

  • strides -- the array strides

  • vkfft_shape -- the shape passed to VkFFT

  • vkfft_skip -- the skipped axis list passed to VkFFT

  • vkfft_nbatch -- VkFFT batch parameter

Raises:

RuntimeError -- if res != 0

pyvkfft.base.primes(n)

Return the prime decomposition of n as a list. This is only kept as a utility function: VkFFT accepts sizes with any prime decomposition, although performance can be better when all prime factors are <=13.
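A prime decomposition by trial division can be sketched as follows. This is an illustrative stand-in, not necessarily how pyvkfft implements primes(); the names `prime_factors` and `has_only_small_primes` are hypothetical.

```python
def prime_factors(n):
    """Return the prime factors of n (with multiplicity), in increasing order."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        # Whatever remains is a prime larger than sqrt of the original n
        factors.append(n)
    return factors


def has_only_small_primes(n, max_prime=13):
    """True if every prime factor of n is <= max_prime."""
    return all(p <= max_prime for p in prime_factors(n))
```

Such a check can be used to pick array sizes whose prime factors are all <=13, the range for which the text above notes performance can be better.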

pyvkfft.base.radix_gen(nmax, radix, even=False, exclude_one=True, inverted=False, nmin=None, max_pow=None, r2r=False)

Generate an array of integers which are only products of powers of the base integers, e.g. 2**N1 * 3**N2 * 5**N3 etc...

Parameters:
  • nmax -- the maximum integer to return (included)

  • radix -- the list/tuple of base integers - which don't need to be primes

  • even -- if True, only return even numbers

  • exclude_one -- if True (the default), exclude 1

  • inverted -- if True, the returned array will only include integers which are NOT in the form 2**N1 * 3**N2 * 5**N3...

  • nmin -- if not None, the integer values returned will be >=nmin

  • max_pow -- if not None, the N1, N2, N3... powers (for sizes in the form 2**N1 * 3**N2 * 5**N3) will be at max equal to this value, which allows to reduce the number of generated sizes while testing all base radixes

  • r2r -- if r2r=1, assume we try to generate radix sizes for a DCT1 or DST1 transform, which are computed as a C2C transform of size 2N-2. If r2r=4, assume we try to generate radix sizes for a DCT4 or DST4, which are performed as a C2C transform of size n/2 (if n is even) or n (if odd).

Returns:

the numpy array of integers, sorted
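The core idea (enumerating all products of powers of the base integers up to nmax) can be sketched in pure Python. This hypothetical `radix_sizes` helper returns a plain sorted list rather than a numpy array and only mimics the nmax, radix and exclude_one parameters; it is not the library's implementation.

```python
def radix_sizes(nmax, radix, exclude_one=True):
    """Sorted integers <= nmax of the form r1**N1 * r2**N2 * ... for r in radix."""
    sizes = {1}
    for r in radix:
        if r < 2:
            raise ValueError("base integers must be >= 2")
        extended = set()
        for s in sizes:
            # Multiply each known size by successive powers of r
            v = s
            while v <= nmax:
                extended.add(v)
                v *= r
        sizes = extended
    if exclude_one:
        sizes.discard(1)
    return sorted(sizes)
```

For instance, `radix_sizes(20, (2, 3, 5))` lists every integer up to 20 whose only prime factors are 2, 3 and 5.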

pyvkfft.base.radix_gen_n(nmax, max_size, radix, ndim=None, even=False, exclude_one=True, inverted=False, nmin=None, max_pow=None, range_nd_narrow=None, min_size=0, r2r=False)

Generate a list of array shapes whose dimensions are only products of powers of the base integers, e.g. 2**N1 * 3**N2 * 5**N3 etc..., and with a maximum size. Note that this can generate a large number of shapes.

Parameters:
  • nmax -- the maximum value for the length of each dimension (included)

  • max_size -- the maximum size (number of elements) for the array.

  • radix -- the list/tuple of base integers - which don't need to be primes. If None, all sizes are allowed

  • ndim -- the number of dimensions allowed. If None, 1D, 2D and 3D shapes are mixed.

  • even -- if True, only return even numbers

  • exclude_one -- if True (the default), exclude 1

  • inverted -- if True, the returned array will only include integers which are NOT in the form 2**N1 * 3**N2 * 5**N3...

  • nmin -- if not None, the integer values returned will be >=nmin

  • max_pow -- if not None, the N1, N2, N3... powers (for sizes in the form 2**N1 * 3**N2 * 5**N3) will be at max equal to this value, which allows to reduce the number of generated sizes while testing all base radixes

  • range_nd_narrow -- if a tuple of values (drel, dabs) is given, with drel within [0;1], for dimensions>1, in an array of shape (s0, s1, s2), the difference of lengths with respect to the first dimension cannot be larger than min(drel * s0, dabs). This allows to reduce the number of shapes tested. With drel=dabs=0, all dimensions must have identical lengths.

  • min_size -- the minimum size (number of elements). This can be used to separate large array tests and use a larger number of parallel process for smaller ones.

  • r2r -- if r2r=1, assume we try to generate radix sizes for a DCT1 or DST1 transform, which are computed as a C2C transform of size 2N-2. If r2r=4, assume we try to generate radix sizes for a DCT4 or DST4, which are performed as a C2C transform of size n/2 (if n is even) or n (if odd).

Returns:

the list of array shapes.

pyvkfft.base.strides_nonzero(strides)

Fix the strides of an array: if a stride is zero, it is set to the smallest of the next and previous nonzero strides.

CUDA FFT class

class pyvkfft.cuda.VkFFTApp(shape, dtype: type, ndim=None, inplace=True, stream=None, norm=1, r2c=False, dct=False, dst=False, axes=None, strides=None, tune_config=None, r2c_odd=False, verbose=False, **kwargs)

VkFFT application interface, similar to a cuFFT plan.

fft(src, dest=None)

Compute the forward FFT

Parameters:
  • src -- the source pycuda.gpuarray.GPUArray or cupy.ndarray

  • dest -- the destination GPU array. Should be None for an inplace transform

Raises:

RuntimeError -- in case of a GPU kernel launch error

Returns:

the transformed array. For a R2C inplace transform, the complex view of the array is returned.

ifft(src, dest=None)

Compute the backward FFT

Parameters:
  • src -- the source pycuda.gpuarray.GPUArray or cupy.ndarray

  • dest -- the destination GPU array. Should be None for an inplace transform

Raises:

RuntimeError -- in case of a GPU kernel launch error

Returns:

the transformed array. For a C2R inplace transform, the float view of the array is returned.
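A typical use of the CUDA class might look like the sketch below, which runs a batched 2D inplace forward and backward transform on a cupy array. It assumes a working CUDA device with cupy and pyvkfft installed; the function name `cuda_roundtrip`, the array shape and the tolerance are illustrative choices, and the imports are kept inside the function so that defining it does not require a GPU.

```python
def cuda_roundtrip(shape=(16, 64, 64)):
    """Forward then backward 2D FFT on a batch of images (needs a CUDA GPU)."""
    # Local imports: cupy and pyvkfft are only needed when the function runs
    import numpy as np
    import cupy as cp
    from pyvkfft.cuda import VkFFTApp

    d0 = np.random.uniform(0, 1, shape).astype(np.complex64)
    d = cp.asarray(d0)

    # ndim=2: transform the two fastest axes, batch over the first one
    app = VkFFTApp(d.shape, d.dtype, ndim=2, inplace=True, norm=1)
    d = app.fft(d)   # inplace transform: no destination array is given
    d = app.ifft(d)

    # With norm=1, FFT followed by iFFT should reproduce the input
    # up to single-precision rounding errors
    return np.allclose(cp.asnumpy(d), d0, atol=1e-4)
```

The same plan object can be reused for any array with the same shape, dtype and layout.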

pyvkfft.cuda.cuda_compile_version(raw=False)

Get CUDA version against which pyvkfft was compiled

Parameters:

raw -- if True, return the version as X*1000+Y*10+Z

Returns:

version as X.Y.Z

pyvkfft.cuda.cuda_driver_version(raw=False)

Get CUDA driver version

Parameters:

raw -- if True, return the version as X*1000+Y*10+Z

Returns:

version as X.Y.Z

pyvkfft.cuda.cuda_runtime_version(raw=False)

Get CUDA runtime version

Parameters:

raw -- if True, return the version as X*1000+Y*10+Z

Returns:

version as X.Y.Z
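Taking the raw encoding X*1000+Y*10+Z stated above at face value, the conversion between the raw integer and the X.Y.Z string can be sketched with a small hypothetical helper (`decode_cuda_version` is not part of pyvkfft):

```python
def decode_cuda_version(raw):
    """Convert a raw version X*1000 + Y*10 + Z into the string 'X.Y.Z'."""
    major, rest = divmod(raw, 1000)
    minor, patch = divmod(rest, 10)
    return f"{major}.{minor}.{patch}"
```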

pyvkfft.cuda.vkfft_max_fft_dimensions()

Get the maximum number of dimensions VkFFT can handle. This is set at compile time. VkFFT default is 4, pyvkfft sets this to 8. Note that consecutive non-transformed axes are collapsed into a single axis, reducing the effective number of dimensions.

Returns:

VKFFT_MAX_FFT_DIMENSIONS

pyvkfft.cuda.vkfft_version()

Get VkFFT version

Returns:

version as X.Y.Z

OpenCL FFT class

class pyvkfft.opencl.VkFFTApp(shape, dtype: type, queue: CommandQueue, ndim=None, inplace=True, norm=1, r2c=False, dct=False, dst=False, axes=None, strides=None, tune_config=None, r2c_odd=False, verbose=False, **kwargs)

VkFFT application interface implementing a FFT plan.

fft(src: Array, dest: Array = None, queue: CommandQueue = None)

Compute the forward FFT

Parameters:
  • src -- the source pyopencl Array

  • dest -- the destination pyopencl Array. Should be None for an inplace transform

  • queue -- the pyopencl CommandQueue to use for the transform. If not given, the queue of the source array is used. If the queue does not match the application's, a warning is emitted (see config.WARN_OPENCL_QUEUE_MISMATCH). If a queue is not supplied and the source and destination arrays do not have the same queue, then a RuntimeError is raised.

Raises:

RuntimeError -- in case of a GPU kernel launch error

Returns:

the transformed array. For a R2C inplace transform, the complex view of the array is returned.

ifft(src: Array, dest: Array = None, queue: CommandQueue = None)

Compute the backward FFT

Parameters:
  • src -- the source pyopencl.Array

  • dest -- the destination pyopencl.Array. Can be None for an inplace transform

  • queue -- the pyopencl CommandQueue to use for the transform. If not given, the queue of the source array is used. If the queue does not match the application, a warning is emitted (see config.WARN_OPENCL_QUEUE_MISMATCH). If a queue is not supplied and the source and destination arrays do not have the same queue, then a RuntimeError is raised.

Raises:

RuntimeError -- in case of a GPU kernel launch error

Returns:

the transformed array. For a C2R inplace transform, the float view of the array is returned.
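The OpenCL class is used in much the same way, except that a CommandQueue must be supplied when creating the plan. The sketch below assumes pyopencl, pyvkfft and at least one OpenCL device are available; `opencl_roundtrip`, the shape and the tolerance are illustrative, and the imports are local so that the definition itself needs no GPU.

```python
def opencl_roundtrip(shape=(16, 64, 64)):
    """Forward then backward 2D FFT on a batch of images (needs OpenCL)."""
    # Local imports: pyopencl and pyvkfft are only needed when the function runs
    import numpy as np
    import pyopencl as cl
    import pyopencl.array as cla
    from pyvkfft.opencl import VkFFTApp

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)

    d0 = np.random.uniform(0, 1, shape).astype(np.complex64)
    d = cla.to_device(queue, d0)

    # The plan and the array share the same queue, so no mismatch warning
    app = VkFFTApp(d.shape, d.dtype, queue, ndim=2)
    d = app.fft(d)   # inplace transform: no destination array is given
    d = app.ifft(d)
    queue.finish()

    # With the default norm=1, the round trip should reproduce the input
    return np.allclose(d.get(), d0, atol=1e-4)
```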

pyvkfft.opencl.vkfft_max_fft_dimensions()

Get the maximum number of dimensions VkFFT can handle. This is set at compile time. VkFFT default is 4, pyvkfft sets this to 8. Note that consecutive non-transformed axes are collapsed into a single axis, reducing the effective number of dimensions.

Returns:

VKFFT_MAX_FFT_DIMENSIONS

pyvkfft.opencl.vkfft_version()

Get VkFFT version

Returns:

version as X.Y.Z

Auto-tuning functions

pyvkfft.tune.tune_vkfft(tune, shape, dtype: type, ndim=None, inplace=True, stream=None, queue=None, norm=1, r2c=False, dct=False, dst=False, axes=None, strides=None, verbose=False, r2c_odd=False, **kwargs)

Automatically test different configurations for a VkFFTApp, returning the set of parameters which maximise the FT throughput. The three parameters which are recommended to optimise are aimThreads, warpSize and coalescedMemory. Usually tuning a single one should suffice, but the right one could depend on the backend and GPU brand.

Note that the GPU context must have been initialised before calling this function.

Parameters:
  • tune --

    dictionary including the backend used and the parameter values which will be tested. This is EXPERIMENTAL, as wrong parameters may lead to crashes. Note that this will allocate temporary GPU arrays, unless the arrays to be used have been passed as parameters ('dest' and 'src'). Examples:

    • tune={'backend':'cupy'} - minimal example, will automatically test a small set of parameters (4 to 10 tests). Recommended !

    • tune={'backend':'pycuda', 'warpSize':[8,16,32,64,128]}: this will test 5 possible values for the warpSize.

    • tune={'backend':'pyopencl', 'aimThreads':[32,64,128,256]}: this will test 4 possible values for aimThreads.

    • tune={'backend':'cupy', 'groupedBatch':[[-1,-1,-1],[8,8,8],[4,16,16]]}: this will test 3 possible values for groupedBatch. This one is more tricky to use.

    • tune={'backend':'cupy', 'warpSize':[8,16,32,64,128], 'src':a}: this will test 5 possible values for the warpSize, with a given source GPU array. This would only be valid for an inplace transform as no destination array is given.

  • shape -- the shape of the array to be transformed. The number of dimensions of the array can be larger than the FFT dimensions, but only for 1D and 2D transforms. 3D FFT transforms can only be done on 3D arrays.

  • dtype -- the numpy dtype of the source array (can be complex64 or complex128)

  • ndim -- the number of dimensions to use for the FFT. By default, uses the array dimensions. Can be smaller, e.g. ndim=2 for a 3D array to perform a batched 2D FFT on all the layers. The FFT is always performed along the last axes if the array's number of dimensions is larger than ndim, i.e. on the x-axis for ndim=1, on the x and y axes for ndim=2.

  • inplace -- if True (the default), performs an inplace transform and the destination array should not be given in fft() and ifft().

  • stream -- the pycuda.driver.Stream or cupy.cuda.Stream to use for the transform. This can also be the pointer/handle (int) to the cuda stream object. If None, the default stream will be used.

  • queue -- the pyopencl CommandQueue to use for the transform.

  • norm -- if 0 (unnormalised), every transform multiplies the L2 norm of the array by its size (or the size of the transformed array if ndim<d.ndim). if 1 (the default) or "backward", the inverse transform divides the L2 norm by the array size, so FFT+iFFT will keep the array norm. if "ortho", each transform will keep the L2 norm, but that will involve an extra read & write operation.

  • r2c -- if True, will perform a real->complex transform, where the complex destination is a half-hermitian array. For an inplace transform, if the input data shape is (...,nx), the input float array should have a shape of (..., nx+2), the last two columns being ignored in the input data, and the resulting complex array (using pycuda's GPUArray.view(dtype=np.complex64) to reinterpret the type) will have a shape (..., nx//2 + 1). For an out-of-place transform, if the input (real) shape is (..., nx), the output (complex) shape should be (..., nx//2+1). Note that for C2R transforms with ndim>=2, the source (complex) array is modified.

  • dct -- used to perform a Discrete Cosine Transform (DCT) aka a R2R transform. An integer can be given to specify the type of DCT (1, 2, 3 or 4). If dct=True, the DCT type 2 will be performed, following scipy's convention.

  • dst -- used to perform a Discrete Sine Transform (DST) aka a R2R transform. An integer can be given to specify the type of DST (1, 2, 3 or 4). If dst=True, the DST type 2 will be performed, following scipy's convention.

  • axes -- a list or tuple of axes along which the transform should be made. if None, the transform is done along the ndim fastest axes, or all axes if ndim is None. For R2C transforms, the fast axis must be transformed.

  • strides -- the array strides - needed if not C-ordered.

  • verbose -- if True, print speed for each configuration

  • r2c_odd --

    this should be set to True to perform an inplace r2c/c2r transform with an odd-sized fast (x) axis. Explanation: to perform a 1D inplace transform of an array with 100 elements, the input array should have a 100+2 size, resulting in a half-Hermitian array of size 51. If the input data has a size of 101, the input array should also be padded to 102 (101+1), and the resulting half-Hermitian array also has a size of 51. A flag is thus needed to differentiate the cases of 100+2 or 101+1.

  • **kwargs -- extra parameters passed on to VkFFT

Raises:

RuntimeError -- if the optimisation fails

Returns:

(kw, res) where kw are the optimal kwargs which can be passed to the VkFFTApp creation routine, and res is the full set of results for the different configurations tested.
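Putting the pieces together, a tuning run followed by the creation of a plan could look like the following sketch. It assumes a CUDA device with cupy installed and uses the minimal tune dictionary from the examples above. The function name `tuned_cuda_app` and the shape are illustrative, and the exact content of the returned kw is not specified here, so it is merged with the other keyword arguments rather than inspected.

```python
def tuned_cuda_app(shape=(256, 256)):
    """Auto-tune then create a 2D C2C plan with cupy (needs a CUDA GPU)."""
    # Local imports: cupy and pyvkfft are only needed when the function runs
    import numpy as np
    import cupy as cp
    from pyvkfft.cuda import VkFFTApp
    from pyvkfft.tune import tune_vkfft

    # Allocating an array makes sure the GPU context is initialised
    cp.zeros(1, dtype=np.float32)

    # Minimal tune dictionary: a small set of configurations is tested
    kw, res = tune_vkfft({'backend': 'cupy'}, shape, np.complex64, ndim=2)

    # kw holds the optimal keyword arguments for the plan creation;
    # merging avoids passing the same keyword twice
    app_kwargs = {'ndim': 2, **kw}
    app = VkFFTApp(shape, np.complex64, **app_kwargs)
    return app, kw, res
```

Since tuning allocates temporary arrays and runs several transforms, it is probably best done once per shape, with the resulting kw reused afterwards.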

Configuration module

Global configuration variables. The approach is adapted from Numba's config.py

pyvkfft.config.FFT_CACHE_NB = 32

Number of VkFFTApp objects to cache through the pyvkfft.fft interface. This must be modified before importing pyvkfft.fft.

pyvkfft.config.USE_LUT = None

Force using a LUT for single-precision transforms? If None, this will be activated automatically for some GPUs (Intel).

Use only to improve the accuracy by a factor 3 or 4.

If useLUT is passed directly to a VkFFTApp, this is ignored

Valid values: either None or 1

pyvkfft.config.WARN_OPENCL_QUEUE_MISMATCH = True

Emit a warning if the queue is different in the application and the array
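Because FFT_CACHE_NB has to be changed before pyvkfft.fft is imported, the order of operations matters. One way to arrange this is sketched below, assuming the variables can be set as plain module attributes as their listing above suggests. The helper `configure_pyvkfft` and its default values are hypothetical, and the imports are local so that pyvkfft is only required when the function is called.

```python
def configure_pyvkfft(cache_nb=64, warn_queue_mismatch=False):
    """Set global pyvkfft options, then import the pyvkfft.fft interface."""
    # Local import: requires pyvkfft to be installed when called
    from pyvkfft import config

    config.FFT_CACHE_NB = cache_nb
    config.WARN_OPENCL_QUEUE_MISMATCH = warn_queue_mismatch

    # Import pyvkfft.fft only after the cache size has been set
    import pyvkfft.fft
    return pyvkfft.fft
```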