Benchmark, accuracy and test API
Here are the objects and functions used to test pyvkfft:
accuracy testing vs scipy
benchmark transforms for the OpenCL and CUDA pyvkfft interfaces, also comparing with cuFFT (scikit-cuda) and clFFT (gpyfft)
test module
Accuracy module
- pyvkfft.accuracy.exhaustive_test(backend, vn, ndim, dtype, inplace, norm, use_lut, r2c=False, dct=False, dst=False, nproc=None, verbose=True, return_res=False)
Run tests on a large range of sizes using multiprocessing. Manual function.
- Parameters:
backend -- either 'pyopencl', 'pycuda' or 'cupy'
vn -- the list/iterable of sizes n.
ndim -- the number of dimensions. The array shape will be [n]*ndim
dtype -- either np.complex64 or np.complex128, or np.float32/np.float64 for r2c/dct/dst
inplace -- True or False
norm -- either 0, 1 or "ortho"
use_lut -- if True or 1, will trigger useLUT=1 for VkFFT; if False or 0, useLUT=0. If None, the default VkFFT behaviour is used. The LUT is always enabled by default for double precision, so there is no need to force it.
r2c -- if True, test an r2c transform. If inplace, the last dimension (x, fastest axis) must be even
dct -- either 1, 2, 3 or 4 to test different dct. Only norm=1 can be tested (native scipy normalisation).
dst -- either 1, 2, 3 or 4 to test different dst. Only norm=1 can be tested (native scipy normalisation).
nproc -- the maximum number of parallel processes to use. If None, the number of detected cores will be used (this may use too much memory!)
verbose -- if True, prints 1 line per test
return_res -- if True, return the list of result dictionaries.
- Returns:
True if all tests passed, False otherwise. If return_res is True, return the list of result dictionaries instead.
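As an illustration of the signature above, here is a minimal usage sketch. The helper name run_small_c2c_check is hypothetical, and the parameter values (1D single-precision C2C, sizes 2 to 16, a single worker process) are arbitrary choices. It assumes pyvkfft, numpy and a working OpenCL device are present, and simply returns None when the packages cannot be imported.

```python
def run_small_c2c_check(backend="pyopencl", nmax=16):
    """Run exhaustive_test on sizes 2..nmax for a 1D in-place C2C transform.

    Returns the boolean result of exhaustive_test, or None if pyvkfft or
    numpy cannot be imported (hypothetical helper, for illustration only).
    """
    try:
        import numpy as np
        from pyvkfft.accuracy import exhaustive_test
    except ImportError:
        return None
    # Positional arguments follow the documented order:
    # backend, vn, ndim, dtype, inplace, norm, use_lut
    return exhaustive_test(backend, range(2, nmax + 1), ndim=1,
                           dtype=np.complex64, inplace=True, norm=1,
                           use_lut=None, nproc=1, verbose=False)


if __name__ == "__main__":
    # The guard matters because exhaustive_test uses multiprocessing
    print(run_small_c2c_check())
```

Setting nproc explicitly keeps memory use predictable, since the default uses every detected core.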
- pyvkfft.accuracy.l2(a, b)
L2 norm
- pyvkfft.accuracy.li(a, b)
Linf norm
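The two descriptions above are terse, and the test_accuracy documentation refers to "normalised" norms. The following is a minimal sketch of what such relative norms could look like, assuming the difference is normalised by the first (reference) array. The names l2_rel and li_rel are hypothetical, and this is not necessarily the exact pyvkfft implementation.

```python
import numpy as np


def l2_rel(a, b):
    """Relative L2 distance between a (reference) and b.

    Assumption: normalisation by the L2 norm of a.
    """
    a, b = np.asarray(a), np.asarray(b)
    return np.sqrt((np.abs(a - b) ** 2).sum() / (np.abs(a) ** 2).sum())


def li_rel(a, b):
    """Relative Linf distance between a (reference) and b.

    Assumption: normalisation by the largest magnitude in a.
    """
    a, b = np.asarray(a), np.asarray(b)
    return np.abs(a - b).max() / np.abs(a).max()


ref = np.array([3.0, 4.0])
res = np.array([3.0, 4.5])
print(l2_rel(ref, res))  # sqrt(0.25 / 25), about 0.1
print(li_rel(ref, res))  # 0.5 / 4 = 0.125
```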
- pyvkfft.accuracy.test_accuracy(backend, shape, ndim, axes, dtype, inplace, norm, use_lut, r2c=False, dct=False, dst=False, gpu_name=None, opencl_platform=None, stream=None, queue=None, return_array=False, init_array=None, verbose=False, colour_output=False, ref_long_double=True, order='C')
Measure the FT accuracy by comparing to the result from scipy (if available), or numpy.
- Parameters:
backend -- either 'pyopencl', 'pycuda' or 'cupy'
shape -- the shape of the array to test. If this is an inplace r2c, the fast-axis length must be even, and two extra values will be appended along x, so the actual transform shape is the one supplied
ndim -- the number of FFT dimensions. Can be None if axes is given
axes -- the transform axes. Supersedes ndim
dtype -- either np.complex64 or np.complex128, or np.float32/np.float64 for r2c, dct and dst
inplace -- if True, make an inplace transform. Note that for inplace r2c transforms, the size for the last (x, fastest) axis must be even.
norm -- either 0, 1 or "ortho"
use_lut -- if True or 1, will trigger useLUT=1 for VkFFT; if False or 0, useLUT=0. If None, the default VkFFT behaviour is used.
r2c -- if True, test an r2c transform. If inplace, the last dimension (x, fastest axis) must be even
dct -- either 1, 2, 3 or 4 to test different dct. Only norm=1 can be tested (native scipy normalisation).
dst -- either 1, 2, 3 or 4 to test different dst. Only norm=1 can be tested (native scipy normalisation).
gpu_name -- the name of the gpu to use. If None, the first available for the backend will be used.
opencl_platform -- the name of the OpenCL platform to use. If None, the first available will be used.
stream -- the cuda stream to use, or None
queue -- the opencl queue to use (mandatory for the 'pyopencl' backend)
return_array -- if True, will return the generated random array so it can be re-used for different parameters
init_array -- the initial (numpy) random array to use (should be filled with uniform random numbers between +/-0.5 for both real and imaginary fields), to save time. The correct type will be applied. If None, a random array is generated.
verbose -- if True, print a 1-line info for both fft and ifft results
colour_output -- if True, use some colour to tag the quality of the accuracy
ref_long_double -- if True and scipy is available, long double precision will be used for the reference transform. Otherwise, this is ignored.
order -- either 'C' (default C-contiguous) or 'F' to test a different stride. Note that for the latter, a 3D transform on a 4D array will not be supported as the last transform axis would be on the 4th dimension (once ordered by stride).
- Returns:
a dictionary with (l2_fft, li_fft, l2_ifft, li_ifft, tol, dt_array, dt_app, dt_fft, dt_ifft, src_unchanged_fft, src_unchanged_ifft, tol_test, str): the L2 and Linf normalised norms comparing pyvkfft's result with that of either numpy or scipy, the reference tolerance, and the times spent preparing the initial random array, creating the VkFFT app, and performing the forward and backward transforms (these include the GPU and reference transforms plus the L2 and Linf computations, so don't use them for benchmarking). 'src_unchanged_fft' and 'src_unchanged_ifft' are True if, for an out-of-place transform, the source array is actually unmodified (which is not the case for an r2c ifft with ndim>=2). The last fields are 'tol_test', which is True if both li_fft and li_ifft are smaller than tol, and 'str', the string summarising the results (printed if verbose is True). If return_array is True, the initial random array used is returned as 'd0'. All input parameters are also returned as key/values, except stream, queue, return_array, init_array and verbose.
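The even-length requirement and the "two extra values" mentioned for in-place r2c transforms can be illustrated with a small shape computation. This is a sketch assuming the usual half-Hermitian layout, where a real fast axis of length nx maps to nx/2+1 complex values stored in the same buffer. The helper inplace_r2c_shapes is hypothetical and not part of pyvkfft.

```python
def inplace_r2c_shapes(shape):
    """Return (padded real buffer shape, complex result shape).

    shape is the real transform shape, with the fast (x) axis last.
    Assumption: standard half-Hermitian storage for in-place r2c.
    """
    *lead, nx = shape
    if nx % 2:
        raise ValueError("in-place r2c requires an even fast-axis length")
    # nx real values need room for nx/2+1 complex values, i.e. nx+2 reals
    return tuple(lead) + (nx + 2,), tuple(lead) + (nx // 2 + 1,)


print(inplace_r2c_shapes((16, 32)))  # ((16, 34), (16, 17))
```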
Benchmark module
Benchmark functions. These are implemented using separate processes, one for each test. This involves a fair amount of overhead, but avoids any resource conflict or issue with GPU contexts, deletion of cufft plans (https://github.com/lebedov/scikit-cuda/issues/308), etc.
- pyvkfft.benchmark.run(nmin, nmax, radix_max, ndim, precision='single', nb_repeat=3, nb_loop=1, gpu_name=None, batch=True, opencl_platform=None, figsize=(16, 8), has_pyvkfft_opencl=None, has_pyvkfft_cuda=None, has_gpyfft=None, has_skcuda=None, r2c=False, dct=False, dst=False, inplace=True)
Run the benchmark, measuring the idealised memory throughput (assuming a single read+write operation per axis) for an inplace C2C transform using the different FFT backends available. Note that each test is made in a separate process, so this can take a long time.
- Parameters:
nmin -- smallest size N of the array, e.g. with a shape (batch, N, N) for a 2D transform.
nmax -- largest size N for the array.
radix_max -- maximum radix for the tested sizes. Use a large value (1e7) to test all sizes regardless of the prime decomposition.
precision -- either 'single' or 'double'
nb_repeat -- number of times each fft+ifft cycle is performed, the best timing is kept
gpu_name -- name or substring (case-insensitive) of the GPU to use. If None, the first found will be used.
batch -- if True (the default), all transforms are batched so that the array size is large enough to yield a measurable transform time. Each array takes a shape e.g. (batch, N, N) for a 2D transform.
opencl_platform -- name or substring (case-insensitive) of the OpenCL platform to use. If None, the first found will be used.
figsize -- figure size for plotting. Set to None to disable plotting.
has_pyvkfft_opencl -- if True, will test pyvkfft.opencl. If None, will be automatically detected
has_pyvkfft_cuda -- if True, will test pyvkfft.cuda. If None, will be automatically detected
has_gpyfft -- if True, will test gpyfft (clFFT). If None, will be automatically detected
has_skcuda -- if True, will test scikit-cuda (cuFFT). If None, will be automatically detected
r2c -- if True, test an r2c transform
dct -- test DCT of type 1,2,3 or 4
dst -- test DST of type 1,2,3 or 4
inplace -- if True, test inplace transforms
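Two ideas in the description above lend themselves to a short sketch: selecting sizes by their largest prime factor (one plausible reading of radix_max) and computing an idealised throughput from one read plus one write of the array per transformed axis. The function names, the inclusive nmax bound and the GB/s unit (1e9 bytes) are assumptions made for illustration, not the pyvkfft implementation.

```python
def largest_prime_factor(n):
    """Largest prime factor of n (returns 1 for n=1)."""
    largest, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    # Whatever remains above 1 is a prime larger than all factors found
    return n if n > 1 else largest


def sizes_with_max_radix(nmin, nmax, radix_max):
    """Sizes in [nmin, nmax] whose prime factors are all <= radix_max."""
    return [n for n in range(nmin, nmax + 1)
            if largest_prime_factor(n) <= radix_max]


def ideal_throughput_gb_s(shape, ndim, itemsize, dt):
    """Idealised throughput in GB/s for one transform lasting dt seconds.

    shape includes the batch dimension, ndim is the number of transformed
    axes; one read and one write of the whole array is assumed per axis.
    """
    nbytes = itemsize
    for n in shape:
        nbytes *= n
    return nbytes * 2 * ndim / dt / 1e9


print(sizes_with_max_radix(8, 16, 3))  # [8, 9, 12, 16]
# 2D complex64 transform on a (4, 256, 256) array taking 1 ms
print(ideal_throughput_gb_s((4, 256, 256), ndim=2, itemsize=8, dt=1e-3))
```

With a very large radix_max every size in the range is kept, which matches the advice given for that parameter.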
- pyvkfft.benchmark.test_gpyfft()
Test if gpyfft is available. The test is made in a separate process.
- pyvkfft.benchmark.test_pyvkfft_cuda()
Test if pyvkfft_cuda is available. The test is made in a separate process. Also return the
- pyvkfft.benchmark.test_pyvkfft_opencl()
Test if pyvkfft_opencl is available. The test is made in a separate process.