Simple FFT interface: pyvkfft.fft

Introduction

API

These functions perform transforms directly on GPU arrays, automatically creating the required VkFFTApp and caching it for future re-use.

pyvkfft.fft.clear_vkfftapp_cache()

Remove all cached VkFFTApp instances.

pyvkfft.fft.dctn(src, dest=None, ndim=None, norm=1, dct_type=2, cuda_stream=None, cl_queue=None, tune=False)

Perform a real->real Discrete Cosine Transform (DCT) on a GPU array, automatically creating the VkFFTApp and caching it for future re-use.

Parameters:
  • src -- the source GPU array (pycuda.gpuarray.GPUArray, cupy.ndarray or pyopencl.array.Array)

  • dest -- the destination GPU array. If None, a new GPU array is created and returned, using the source array's allocator (pycuda, pyopencl) if available. If dest is the same array as src, an in-place transform is done.

  • ndim -- the number of dimensions (<=3) to use for the transform. By default, uses the array dimensions. Can be smaller, e.g. ndim=2 for a 3D array to perform a batched 2D DCT on all the layers. The transform is always performed along the last axes if the array's number of dimensions is larger than ndim, i.e. on the x-axis for ndim=1, on the x and y axes for ndim=2.

  • norm -- normalisation mode, either 0 (un-normalised) or 1 (the default, also available as "backward"), which normalises the inverse transform so that DCT+iDCT keeps the array norm.

  • dct_type -- the type of dct desired: 1, 2 (default), 3 or 4

  • cuda_stream -- the pycuda.driver.Stream or cupy.cuda.Stream to use for the transform. If None, the default one will be used

  • cl_queue -- the pyopencl.CommandQueue to be used. If None, the source array default queue will be used

  • tune -- if True, will activate the automatic tuning of VkFFT parameters to maximise the FT throughput. This uses a quick approach testing a few transforms (about 4) before choosing the optimal parameters. This is similar to FFTW's FFTW_MEASURE approach.

Returns:

the destination array.
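
As a rough illustration of the ndim rule described above (the same rule applies to every transform in this module), the following sketch works out which axes are transformed and how many independent transforms are batched. The helper names transformed_axes and batch_count are hypothetical and not part of pyvkfft; the code only restates the documented "last axes" behaviour and the documented limit of three dimensions.

```python
def transformed_axes(array_ndim, ndim=None):
    """Axes covered by the transform under the documented 'last axes' rule.

    Hypothetical helper, not a pyvkfft function.
    """
    if ndim is None:
        # By default the transform uses all the array dimensions.
        ndim = array_ndim
    if not 1 <= ndim <= min(array_ndim, 3):
        raise ValueError("ndim must be between 1 and min(array_ndim, 3)")
    return tuple(range(array_ndim - ndim, array_ndim))


def batch_count(shape, ndim=None):
    """Number of independent transforms performed over the leading axes."""
    first = transformed_axes(len(shape), ndim)[0]
    count = 1
    for size in shape[:first]:
        count *= size
    return count
```

For a (16, 64, 64) array, ndim=2 selects axes (1, 2), so 16 independent 2D transforms are performed, one per layer.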

pyvkfft.fft.dstn(src, dest=None, ndim=None, norm=1, dst_type=2, cuda_stream=None, cl_queue=None, tune=False)

Perform a real->real Discrete Sine Transform (DST) on a GPU array, automatically creating the VkFFTApp and caching it for future re-use.

Parameters:
  • src -- the source GPU array (pycuda.gpuarray.GPUArray, cupy.ndarray or pyopencl.array.Array)

  • dest -- the destination GPU array. If None, a new GPU array is created and returned, using the source array's allocator (pycuda, pyopencl) if available. If dest is the same array as src, an in-place transform is done.

  • ndim -- the number of dimensions (<=3) to use for the transform. By default, uses the array dimensions. Can be smaller, e.g. ndim=2 for a 3D array to perform a batched 2D DST on all the layers. The transform is always performed along the last axes if the array's number of dimensions is larger than ndim, i.e. on the x-axis for ndim=1, on the x and y axes for ndim=2.

  • norm -- normalisation mode, either 0 (un-normalised) or 1 (the default, also available as "backward"), which normalises the inverse transform so that DST+iDST keeps the array norm.

  • dst_type -- the type of dst desired: 1, 2 (default), 3 or 4

  • cuda_stream -- the pycuda.driver.Stream or cupy.cuda.Stream to use for the transform. If None, the default one will be used

  • cl_queue -- the pyopencl.CommandQueue to be used. If None, the source array default queue will be used

  • tune -- if True, will activate the automatic tuning of VkFFT parameters to maximise the FT throughput. This uses a quick approach testing a few transforms (about 4) before choosing the optimal parameters. This is similar to FFTW's FFTW_MEASURE approach.

Returns:

the destination array.

pyvkfft.fft.fftn(src, dest=None, ndim=None, norm=1, axes=None, cuda_stream=None, cl_queue=None, return_scale=False, tune=False)

Perform a FFT on a GPU array, automatically creating the VkFFTApp and caching it for future re-use.

Parameters:
  • src -- the source GPU array (pycuda.gpuarray.GPUArray, cupy.ndarray or pyopencl.array.Array)

  • dest -- the destination GPU array. If None, a new GPU array is created and returned, using the source array's allocator (pycuda, pyopencl) if available. If dest is the same array as src, an in-place transform is done.

  • ndim -- the number of dimensions (<=3) to use for the FFT. By default, uses the array dimensions. Can be smaller, e.g. ndim=2 for a 3D array to perform a batched 2D FFT on all the layers. The FFT is always performed along the last axes if the array's number of dimensions is larger than ndim, i.e. on the x-axis for ndim=1, on the x and y axes for ndim=2.

  • norm -- if 0 (un-normalised), no scaling is applied, so every transform multiplies the L2 norm of the array by the square root of the transform size. If 1 (the default) or "backward", the inverse transform is divided by the transform size, so FFT+iFFT will keep the array norm. If "ortho", each transform will keep the L2 norm, but that will involve an extra read & write operation.

  • axes -- a list or tuple of axes along which the transform is made. If None, the transform is done along the ndim fastest axes, or all axes if ndim is None. Not allowed for R2C transforms.

  • cuda_stream -- the pycuda.driver.Stream or cupy.cuda.Stream to use for the transform. If None, the default one will be used

  • cl_queue -- the pyopencl.CommandQueue to be used. If None, the source array default queue will be used

  • return_scale -- if True, return the scale factor by which the result must be multiplied to keep its L2 norm after the transform

  • tune -- if True, will activate the automatic tuning of VkFFT parameters to maximise the FT throughput. This uses a quick approach testing a few transforms (about 4) before choosing the optimal parameters. This is similar to FFTW's FFTW_MEASURE approach.

Returns:

the destination array if return_scale is False, or (dest, scale)
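
A minimal usage sketch, assuming cupy, pyvkfft and a CUDA device are available (none of which is guaranteed by this page). It only uses the functions and keyword arguments documented in this section; the wrapper name fft_roundtrip_error is hypothetical. The imports are kept inside the function so that defining it does not require a GPU.

```python
def fft_roundtrip_error(shape=(16, 64, 64)):
    """Batched 2D FFT followed by the inverse FFT; returns the max abs error.

    Hypothetical wrapper. Assumes cupy, pyvkfft and a CUDA device.
    """
    import cupy as cp
    from pyvkfft.fft import (clear_vkfftapp_cache, fftn, ifftn,
                             vkfft_version)

    print("VkFFT version:", vkfft_version())

    # Complex single-precision input; the last two axes are transformed.
    a = cp.random.uniform(-1, 1, shape).astype(cp.complex64)

    # dest=None: a new array is created and returned (out-of-place).
    b = fftn(a, ndim=2)
    # With the default norm=1 the inverse transform is normalised,
    # so c should match a up to rounding errors.
    c = ifftn(b, ndim=2)
    err = float(cp.abs(c - a).max())

    # Drop the VkFFTApp objects cached by the two calls above.
    clear_vkfftapp_cache()
    return err
```

Passing dest=a instead of leaving it as None would, per the dest description above, perform the transform in-place.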

pyvkfft.fft.idctn(src, dest=None, ndim=None, norm=1, dct_type=2, cuda_stream=None, cl_queue=None, tune=False)

Perform a real->real inverse Discrete Cosine Transform (DCT) on a GPU array, automatically creating the VkFFTApp and caching it for future re-use.

Parameters:
  • src -- the source GPU array (pycuda.gpuarray.GPUArray, cupy.ndarray or pyopencl.array.Array)

  • dest -- the destination GPU array. If None, a new GPU array is created and returned, using the source array's allocator (pycuda, pyopencl) if available. If dest is the same array as src, an in-place transform is done.

  • ndim -- the number of dimensions (<=3) to use for the transform. By default, uses the array dimensions. Can be smaller, e.g. ndim=2 for a 3D array to perform a batched 2D inverse DCT on all the layers. The transform is always performed along the last axes if the array's number of dimensions is larger than ndim, i.e. on the x-axis for ndim=1, on the x and y axes for ndim=2.

  • norm -- normalisation mode, either 0 (un-normalised) or 1 (the default, also available as "backward"), which normalises the inverse transform so that DCT+iDCT keeps the array norm.

  • dct_type -- the type of dct desired: 2 (default), 3 or 4

  • cuda_stream -- the pycuda.driver.Stream or cupy.cuda.Stream to use for the transform. If None, the default one will be used

  • cl_queue -- the pyopencl.CommandQueue to be used. If None, the source array default queue will be used

  • tune -- if True, will activate the automatic tuning of VkFFT parameters to maximise the FT throughput. This uses a quick approach testing a few transforms (about 4) before choosing the optimal parameters. This is similar to FFTW's FFTW_MEASURE approach.

Returns:

the destination array.
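
To make the relation between a type-2 DCT and its inverse concrete, here is a small pure-Python sketch using the textbook un-normalised definitions (the same convention as scipy.fft.dct with norm=None). It is only a CPU reference for the mathematics: whether VkFFT applies exactly the same scaling internally is an assumption that has not been verified here, and the names dct2 and dct3 are hypothetical.

```python
import math


def dct2(x):
    """Un-normalised type-2 DCT (textbook definition)."""
    n = len(x)
    return [
        2 * sum(x[m] * math.cos(math.pi * k * (2 * m + 1) / (2 * n))
                for m in range(n))
        for k in range(n)
    ]


def dct3(x):
    """Un-normalised type-3 DCT, the inverse of dct2 up to a factor 2*n."""
    n = len(x)
    return [
        x[0] + 2 * sum(x[m] * math.cos(math.pi * m * (2 * k + 1) / (2 * n))
                       for m in range(1, n))
        for k in range(n)
    ]


x = [1.0, -2.0, 0.5, 4.0]
# Dividing by 2*n plays the role of the normalisation of the inverse.
roundtrip = [v / (2 * len(x)) for v in dct3(dct2(x))]
```

Without the division by 2*n, the round trip returns the input scaled by 2*n, which corresponds to the un-normalised case.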

pyvkfft.fft.idstn(src, dest=None, ndim=None, norm=1, dst_type=2, cuda_stream=None, cl_queue=None, tune=False)

Perform a real->real inverse Discrete Sine Transform (DST) on a GPU array, automatically creating the VkFFTApp and caching it for future re-use.

Parameters:
  • src -- the source GPU array (pycuda.gpuarray.GPUArray, cupy.ndarray or pyopencl.array.Array)

  • dest -- the destination GPU array. If None, a new GPU array is created and returned, using the source array's allocator (pycuda, pyopencl) if available. If dest is the same array as src, an in-place transform is done.

  • ndim -- the number of dimensions (<=3) to use for the transform. By default, uses the array dimensions. Can be smaller, e.g. ndim=2 for a 3D array to perform a batched 2D inverse DST on all the layers. The transform is always performed along the last axes if the array's number of dimensions is larger than ndim, i.e. on the x-axis for ndim=1, on the x and y axes for ndim=2.

  • norm -- normalisation mode, either 0 (un-normalised) or 1 (the default, also available as "backward"), which normalises the inverse transform so that DST+iDST keeps the array norm.

  • dst_type -- the type of dst desired: 2 (default), 3 or 4

  • cuda_stream -- the pycuda.driver.Stream or cupy.cuda.Stream to use for the transform. If None, the default one will be used

  • cl_queue -- the pyopencl.CommandQueue to be used. If None, the source array default queue will be used

  • tune -- if True, will activate the automatic tuning of VkFFT parameters to maximise the FT throughput. This uses a quick approach testing a few transforms (about 4) before choosing the optimal parameters. This is similar to FFTW's FFTW_MEASURE approach.

Returns:

the destination array.

pyvkfft.fft.ifftn(src, dest=None, ndim=None, norm=1, axes=None, cuda_stream=None, cl_queue=None, return_scale=False, tune=False)

Perform an inverse FFT on a GPU array, automatically creating the VkFFTApp and caching it for future re-use.

Parameters:
  • src -- the source GPU array (pycuda.gpuarray.GPUArray, cupy.ndarray or pyopencl.array.Array)

  • dest -- the destination GPU array. If None, a new GPU array is created and returned, using the source array's allocator (pycuda, pyopencl) if available. If dest is the same array as src, an in-place transform is done.

  • ndim -- the number of dimensions (<=3) to use for the FFT. By default, uses the array dimensions. Can be smaller, e.g. ndim=2 for a 3D array to perform a batched 2D FFT on all the layers. The FFT is always performed along the last axes if the array's number of dimensions is larger than ndim, i.e. on the x-axis for ndim=1, on the x and y axes for ndim=2.

  • norm -- if 0 (un-normalised), no scaling is applied, so every transform multiplies the L2 norm of the array by the square root of the transform size. If 1 (the default) or "backward", the inverse transform is divided by the transform size, so FFT+iFFT will keep the array norm. If "ortho", each transform will keep the L2 norm, but that will involve an extra read & write operation.

  • axes -- a list or tuple of axes along which the transform is made. If None, the transform is done along the ndim fastest axes, or all axes if ndim is None. Not allowed for R2C transforms.

  • cuda_stream -- the pycuda.driver.Stream or cupy.cuda.Stream to use for the transform. If None, the default one will be used

  • cl_queue -- the pyopencl.CommandQueue to be used. If None, the source array default queue will be used

  • return_scale -- if True, return the scale factor by which the result must be multiplied to keep its L2 norm after the transform

  • tune -- if True, will activate the automatic tuning of VkFFT parameters to maximise the FT throughput. This uses a quick approach testing a few transforms (about 4) before choosing the optimal parameters. This is similar to FFTW's FFTW_MEASURE approach.

Returns:

the destination array if return_scale is False, or (dest, scale)
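
The effect of the norm modes on the L2 norm can be checked on the CPU with a naive DFT. The sketch below is a pure-Python reference written for illustration, not pyvkfft code: the function names dft and l2_norm are hypothetical, and the scaling rules simply follow the usual conventions (no scaling for 0, division by n on the inverse for 1 or "backward", division by sqrt(n) in both directions for "ortho"). By Parseval's theorem, an unscaled transform of length n multiplies the L2 norm by sqrt(n).

```python
import cmath
import math


def dft(x, inverse=False, norm=1):
    """Naive O(n^2) DFT with the three normalisation conventions."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
            for m in range(n))
        for k in range(n)
    ]
    if norm == "ortho":
        scale = 1 / math.sqrt(n)      # both directions keep the L2 norm
    elif inverse and norm in (1, "backward"):
        scale = 1 / n                 # only the inverse is normalised
    else:
        scale = 1.0                   # un-normalised
    return [v * scale for v in out]


def l2_norm(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x))


x = [1.0, 2.0j, -3.0, 0.5]
ratio_unnormalised = l2_norm(dft(x, norm=0)) / l2_norm(x)   # sqrt(len(x))
ratio_ortho = l2_norm(dft(x, norm="ortho")) / l2_norm(x)    # 1
roundtrip = dft(dft(x, norm=1), inverse=True, norm=1)       # back to x
```

Under these conventions, the factor needed to restore the L2 norm after an un-normalised transform is 1/sqrt(n), which is the kind of value the return_scale option is described as providing.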

pyvkfft.fft.irfftn(src, dest=None, ndim=None, norm=1, cuda_stream=None, cl_queue=None, return_scale=False, tune=False, r2c_odd=False)

Perform a complex->real transform on a GPU array, automatically creating the VkFFTApp and caching it for future re-use. For an out-of-place transform, the length of the destination last axis will be (src.shape[-1]-1)*2. For an in-place transform, if the src complex array has a shape (..., nx), the destination (real) array will have a shape of (..., nx*2), but the last one (if r2c_odd=True) or two values along the x-axis are used as buffer: the size of the transform is thus either nx*2-1 (odd size, r2c_odd=True) or nx*2-2 (even size).

Parameters:
  • src -- the source GPU array (pycuda.gpuarray.GPUArray, cupy.ndarray or pyopencl.array.Array)

  • dest -- the destination GPU array. If None, a new GPU array is created and returned, using the source array's allocator (pycuda, pyopencl) if available. If dest is the same array as src, an in-place transform is done.

  • ndim -- the number of dimensions (<=3) to use for the FFT. By default, uses the array dimensions. Can be smaller, e.g. ndim=2 for a 3D array to perform a batched 2D FFT on all the layers. The FFT is always performed along the last axes if the array's number of dimensions is larger than ndim, i.e. on the x-axis for ndim=1, on the x and y axes for ndim=2.

  • norm -- if 0 (un-normalised), no scaling is applied, so every transform multiplies the L2 norm of the array by the square root of the transform size. If 1 (the default) or "backward", the inverse transform is divided by the transform size, so FFT+iFFT will keep the array norm. If "ortho", each transform will keep the L2 norm, but that will involve an extra read & write operation.

  • cuda_stream -- the pycuda.driver.Stream or cupy.cuda.Stream to use for the transform. If None, the default one will be used

  • cl_queue -- the pyopencl.CommandQueue to be used. If None, the source array default queue will be used

  • return_scale -- if True, return the scale factor by which the result must be multiplied to keep its L2 norm after the transform

  • tune -- if True, will activate the automatic tuning of VkFFT parameters to maximise the FT throughput. This uses a quick approach testing a few transforms (about 4) before choosing the optimal parameters. This is similar to FFTW's FFTW_MEASURE approach.

  • r2c_odd -- should be set to True for an in-place c2r transform where the actual data length (in the real array) is odd along the fast axis. This parameter is ignored otherwise.

Returns:

the destination array if return_scale is False, or (dest, scale). For an in-place transform, the returned value is a view of the array with the appropriate type.
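
The shape rules for the complex->real transform can be summarised in a few lines of Python. This is a sketch that only restates the arithmetic described above; irfftn_real_layout is a hypothetical helper name, not a pyvkfft function, and it does not touch any GPU array.

```python
def irfftn_real_layout(complex_shape, inplace=False, r2c_odd=False):
    """Return (real_shape, transform_length) along the fast (x) axis.

    Hypothetical helper restating the documented shape rules.
    """
    nx = complex_shape[-1]
    lead = tuple(complex_shape[:-1])
    if not inplace:
        # Out-of-place: the destination last axis has (nx - 1) * 2 values.
        n = (nx - 1) * 2
        return lead + (n,), n
    # In-place: the real view has 2 * nx values along x, of which the
    # last one (odd size) or two (even size) are only used as buffer.
    n = 2 * nx - 1 if r2c_odd else 2 * nx - 2
    return lead + (2 * nx,), n
```

For a complex array of shape (4, 6), the out-of-place result has shape (4, 10); in-place, the real view has shape (4, 12) and holds a transform of length 10, or 11 with r2c_odd=True.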

pyvkfft.fft.rfftn(src, dest=None, ndim=None, norm=1, cuda_stream=None, cl_queue=None, return_scale=False, tune=False, r2c_odd=False)

Perform a real->complex transform on a GPU array, automatically creating the VkFFTApp and caching it for future re-use. For an out-of-place transform, the length of the destination last axis will be src.shape[-1]//2+1. For an in-place transform with an even [respectively odd]-sized fast (x) axis of length nx, the src array should have a shape (..., nx+2) [respectively (..., nx+1)]; the last two [respectively one] values along the fast (x) axis are ignored, and the destination array will have a shape of (..., nx//2+1). An in-place transform with an odd-sized x-axis requires r2c_odd=True.

Parameters:
  • src -- the source GPU array (pycuda.gpuarray.GPUArray, cupy.ndarray or pyopencl.array.Array)

  • dest -- the destination GPU array. If None, a new GPU array is created and returned, using the source array's allocator (pycuda, pyopencl) if available. If dest is the same array as src, an in-place transform is done.

  • ndim -- the number of dimensions (<=3) to use for the FFT. By default, uses the array dimensions. Can be smaller, e.g. ndim=2 for a 3D array to perform a batched 2D FFT on all the layers. The FFT is always performed along the last axes if the array's number of dimensions is larger than ndim, i.e. on the x-axis for ndim=1, on the x and y axes for ndim=2.

  • norm -- if 0 (un-normalised), no scaling is applied, so every transform multiplies the L2 norm of the array by the square root of the transform size. If 1 (the default) or "backward", the inverse transform is divided by the transform size, so FFT+iFFT will keep the array norm. If "ortho", each transform will keep the L2 norm, but that will involve an extra read & write operation.

  • cuda_stream -- the pycuda.driver.Stream or cupy.cuda.Stream to use for the transform. If None, the default one will be used

  • cl_queue -- the pyopencl.CommandQueue to be used. If None, the source array default queue will be used

  • return_scale -- if True, return the scale factor by which the result must be multiplied to keep its L2 norm after the transform

  • tune -- if True, will activate the automatic tuning of VkFFT parameters to maximise the FT throughput. This uses a quick approach testing a few transforms (about 4) before choosing the optimal parameters. This is similar to FFTW's FFTW_MEASURE approach.

  • r2c_odd -- should be set to True for an in-place r2c transform where the actual data length is odd along the fast axis. This parameter is ignored otherwise.

Returns:

the destination array if return_scale is False, or (dest, scale). For an in-place transform, the returned value is a view of the array with the appropriate type.
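
The corresponding shape arithmetic for the real->complex transform is sketched below. As before, rfftn_complex_shape is a hypothetical helper that only restates the rules given above (including the padding of one or two values for in-place transforms); it is not part of pyvkfft.

```python
def rfftn_complex_shape(real_shape, inplace=False, r2c_odd=False):
    """Shape of the complex result of a real->complex transform.

    Hypothetical helper restating the documented shape rules.
    """
    last = real_shape[-1]
    lead = tuple(real_shape[:-1])
    if not inplace:
        # Out-of-place: the last axis becomes last // 2 + 1.
        return lead + (last // 2 + 1,)
    # In-place: the source holds nx real values plus 2 (even nx) or
    # 1 (odd nx, r2c_odd=True) ignored values along the fast axis.
    nx = last - (1 if r2c_odd else 2)
    if nx < 1 or (nx % 2 == 1) != r2c_odd:
        raise ValueError("padded length inconsistent with r2c_odd")
    return lead + (nx // 2 + 1,)
```

A padded real array of shape (4, 10) therefore maps in-place to a complex array of shape (4, 5) in both cases: it holds 8 real values per row when r2c_odd=False and 9 when r2c_odd=True.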

pyvkfft.fft.vkfft_version()

Get VkFFT version

Returns:

version as X.Y.Z