MATLAB/octave interfaces

Note

See the installation page for how to build these interfaces, or look here.

Quick-start examples

To demo a single 1D transform of type 1 (nonuniform points to uniform Fourier coefficients), we set up random data then do the transform as follows:

M = 1e5;                            % number of NU source points
x = 2*pi*rand(M,1);                 % points in a 2pi-periodic domain
c = randn(M,1)+1i*randn(M,1);       % iid random complex data (row or col vec)
N = 2e5;                            % how many desired Fourier modes?
f = finufft1d1(x,c,+1,1e-12,N);     % do it (takes around 0.02 sec)

The column vector output f should be interpreted as the Fourier coefficients with frequency indices k = -N/2:N/2-1. (This is because N is even; otherwise k = -(N-1)/2:(N-1)/2.) The values in f are accurate (relative to this vector’s 2-norm) to roughly 12 digits, as requested by the tolerance argument 1e-12. Choosing a larger (ie, worse) tolerance leads to faster transforms. The +1 controls the sign in the exponential; recall equation (1). All options maybe be changed from their defaults, for instance:

o.modeord = 1;                      % choose FFT-style output mode ordering
f = finufft1d1(x,c,+1,1e-12,N,o);   % do it

The above usage we call the “simple” interface. There is also a “vectorized” interface which does the transform for multiple stacked strength vectors, using the same nonuniform points each time. We demo this, reusing x and N from above:

ntr = 1e2;                          % number of vectors (transforms to do)
C = randn(M,ntr)+1i*randn(M,ntr);   % iid random complex data (matrix)
F = finufft1d1(x,C,+1,1e-12,N);     % do them (takes around 1.2 sec)

Here this is nearly twice as fast as doing 100 separate calls to the simple interface. For smaller transform sizes the acceleration factor of this vectorized call can be much higher.

If you want yet more control, consider using the “guru” interface. This can be faster than fresh calls to the simple or vectorized interfaces for the same number of transforms, for reasons such as this: the nonuniform points can be changed between transforms, without forcing FFTW to look up a previously stored plan. Usually, such an acceleration is only important when doing repeated small transforms, where “small” means each transform takes of order 0.01 sec or less. Here we use the guru interface to repeat the first demo above:

type = 1; ntr = 1; o.modeord = 1;   % transform type, #transforms, opts
N = 2e5;                            % how many desired Fourier modes?
plan = finufft_plan(1,N,+1,ntr,1e-12,o);      % plan for N output modes
M = 1e5;                            % number of NU source points
x = 2*pi*rand(M,1);                 % array of NU source points
plan.setpts(x,[],[]);               % pass pointer to this array (M inferred)
% (note: the x array should now not be altered until all executes are done!)
c = randn(M,1)+1i*randn(M,1);       % iid random complex data (row or col vec)
f = plan.execute(c);                % do the transform (0.008 sec, ie, faster)
% ...one could now change the points with setpts, and/or do new transforms
% with new c data...
delete(plan);                       % don't forget to clean up

Warning

If an existing array is passed to setpts, then this array must not be altered before execute is called! This is because, in order to save RAM (allowing larger problems to be solved), internally FINUFFT stores only pointers to x (etc), rather than unnecessarily duplicating this data. This is not true if an expression such as -x or 2*pi*rand(M,1) is passed to setpts, since in those cases the plan object does make internal copies, as per MATLAB’s usual shallow-copy argument passing.

Finally, we demo a 2D type 1 transform using the simple interface. Let’s request a rectangular Fourier mode array of 1000 modes in the x direction but 500 in the y direction. The source points are in the square of side length \(2\pi\):

M = 1e6; x = 2*pi*rand(1,M); y = 2*pi*rand(1,M);     % points in [0,2pi]^2
c = randn(M,1)+1i*randn(M,1);       % iid random complex data (row or col vec)
N1 = 1000; N2 = 500;                % desired Fourier mode array sizes
f = finufft2d1(x,y,c,+1,1e-9,N1,N2);          % do it (takes around 0.08 sec)

The resulting output f is indeed size 1000 by 500. The first dimension (number of rows) corresponds to the x input coordinate, and the second to y.

If you need to change the definition of the period from \(2\pi\), simply linearly rescale your points before sending them to FINUFFT.

Note

Under the hood FINUFFT has double- and single-precision libraries. The simple and vectorized MATLAB/octave interfaces infer which to call by checking the class of its input arrays, which must all match (ie, all must be double or all must be single). Since by default MATLAB arrays are double-precision, this is the precision that all of the above examples run in. To perform single-precision transforms, send in single-precision data. In contrast, precision in the guru interface is set with the finufft_plan option string o.floatprec, either 'double' (the default), or 'single'.

See tests and examples in the repo and tutorials and demos for plenty more MATLAB examples.

Full documentation

Here are the help documentation strings for all MATLAB/octave interfaces. They only abbreviate the options (for full documentation see Options parameters). Informative warnings and errors are raised in MATLAB style with unique codes (see ../matlab/errhandler.m, ../matlab/finufft.mw, and ../valid_*.m). The low-level error number codes are not used.

If you have added the matlab directory of FINUFFT correctly to your MATLAB path via something like addpath FINUFFT/matlab, then help finufft/matlab will give the summary of all commands:

% FINUFFT: Flatiron Institute Nonuniform Fast Fourier Transform
% Version 2.2.0
%
% Basic and many-vector interfaces
%   finufft1d1 - 1D complex nonuniform FFT of type 1 (nonuniform to uniform).
%   finufft1d2 - 1D complex nonuniform FFT of type 2 (uniform to nonuniform).
%   finufft1d3 - 1D complex nonuniform FFT of type 3 (nonuniform to nonuniform).
%   finufft2d1 - 2D complex nonuniform FFT of type 1 (nonuniform to uniform).
%   finufft2d2 - 2D complex nonuniform FFT of type 2 (uniform to nonuniform).
%   finufft2d3 - 2D complex nonuniform FFT of type 3 (nonuniform to nonuniform).
%   finufft3d1 - 3D complex nonuniform FFT of type 1 (nonuniform to uniform).
%   finufft3d2 - 3D complex nonuniform FFT of type 2 (uniform to nonuniform).
%   finufft3d3 - 3D complex nonuniform FFT of type 3 (nonuniform to nonuniform).
%
% Guru interface
%   finufft_plan - create guru plan object for one/many general nonuniform FFTs.
%   finufft_plan.setpts   - process nonuniform points for general transform(s).
%   finufft_plan.execute  - single or many-vector transforms in a plan.

The individual commands have the following help documentation:

FINUFFT1D1   1D complex nonuniform FFT of type 1 (nonuniform to uniform).

f = finufft1d1(x,c,isign,eps,ms)
f = finufft1d1(x,c,isign,eps,ms,opts)

This computes, to relative precision eps, via a fast algorithm:

              nj
    f(k1) =  SUM c[j] exp(+/-i k1 x(j))  for -ms/2 <= k1 <= (ms-1)/2
             j=1
  Inputs:
    x     locations of nonuniform sources on interval [-3pi,3pi), length nj
    c     length-nj complex vector of source strengths. If numel(c)>nj,
          expects a stack of vectors (eg, a nj*ntrans matrix) each of which is
          transformed with the same source locations.
    isign if >=0, uses + sign in exponential, otherwise - sign.
    eps   relative precision requested (generally between 1e-15 and 1e-1)
    ms    number of Fourier modes computed, may be even or odd;
          in either case, mode range is integers lying in [-ms/2, (ms-1)/2]
    opts   optional struct with optional fields controlling the following:
    opts.debug:   0 (silent, default), 1 (timing breakdown), 2 (debug info).
    opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots)
    opts.spread_sort:  0 (don't sort NU pts), 1 (do), 2 (auto, default)
    opts.spread_kerevalmeth:  0: exp(sqrt()), 1: Horner ppval (faster)
    opts.spread_kerpad: (iff kerevalmeth=0)  0: don't pad to mult of 4, 1: do
    opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc
    opts.upsampfac:   sigma.  2.0 (default), or 1.25 (low RAM, smaller FFT)
    opts.spread_thread:   for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc
    opts.maxbatchsize:  for ntrans>1 only. max blocking size, or 0 for auto.
    opts.nthreads:   number of threads, or 0: use all available (default)
    opts.modeord: 0 (CMCL increasing mode ordering, default), 1 (FFT ordering)
    opts.chkbnds: 0 (don't check NU points valid), 1 (do, default)
  Outputs:
    f     size-ms complex column vector of Fourier coefficients, or, if
          ntrans>1, a matrix of size (ms,ntrans).

Notes:
 * The vectorized (many vector) interface, ie ntrans>1, can be much faster
   than repeated calls with the same nonuniform points. Note that here the I/O
   data ordering is stacked rather than interleaved. See ../docs/matlab.rst
 * The class of input x (double vs single) controls whether the double or
   single precision library are called; precisions of all data should match.
 * For more details about the opts fields, see ../docs/opts.rst
 * See ERRHANDLER, VALID_* and FINUFFT_PLAN for possible warning/error IDs.
 * Full documentation is online at http://finufft.readthedocs.io
FINUFFT1D2   1D complex nonuniform FFT of type 2 (uniform to nonuniform).

c = finufft1d2(x,isign,eps,f)
c = finufft1d2(x,isign,eps,f,opts)

This computes, to relative precision eps, via a fast algorithm:

   c[j] = SUM   f[k1] exp(+/-i k1 x[j])      for j = 1,...,nj
           k1
    where sum is over -ms/2 <= k1 <= (ms-1)/2.

 Inputs:
    x     location of nonuniform targets on interval [-3pi,3pi), length nj
    f     complex Fourier coefficients. If a vector, length sets ms
          (with mode ordering given by opts.modeord). If a matrix, each
          of ntrans columns is transformed with the same nonuniform targets.
    isign if >=0, uses + sign in exponential, otherwise - sign.
    eps   relative precision requested (generally between 1e-15 and 1e-1)
    opts   optional struct with optional fields controlling the following:
    opts.debug:   0 (silent, default), 1 (timing breakdown), 2 (debug info).
    opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots)
    opts.spread_sort:  0 (don't sort NU pts), 1 (do), 2 (auto, default)
    opts.spread_kerevalmeth:  0: exp(sqrt()), 1: Horner ppval (faster)
    opts.spread_kerpad: (iff kerevalmeth=0)  0: don't pad to mult of 4, 1: do
    opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc
    opts.upsampfac:   sigma.  2.0 (default), or 1.25 (low RAM, smaller FFT)
    opts.spread_thread:   for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc
    opts.maxbatchsize:  for ntrans>1 only. max blocking size, or 0 for auto.
    opts.nthreads:   number of threads, or 0: use all available (default)
    opts.modeord: 0 (CMCL increasing mode ordering, default), 1 (FFT ordering)
    opts.chkbnds: 0 (don't check NU points valid), 1 (do, default)
 Outputs:
    c     complex column vector of nj answers at targets, or,
          if ntrans>1, matrix of size (nj,ntrans).

Notes:
 * The vectorized (many vector) interface, ie ntrans>1, can be much faster
   than repeated calls with the same nonuniform points. Note that here the I/O
   data ordering is stacked rather than interleaved. See ../docs/matlab.rst
 * The class of input x (double vs single) controls whether the double or
   single precision library are called; precisions of all data should match.
 * For more details about the opts fields, see ../docs/opts.rst
 * See ERRHANDLER, VALID_* and FINUFFT_PLAN for possible warning/error IDs.
 * Full documentation is online at http://finufft.readthedocs.io
FINUFFT1D3   1D complex nonuniform FFT of type 3 (nonuniform to nonuniform).

f = finufft1d3(x,c,isign,eps,s)
f = finufft1d3(x,c,isign,eps,s,opts)

This computes, to relative precision eps, via a fast algorithm:

             nj
    f[k]  =  SUM   c[j] exp(+-i s[k] x[j]),      for k = 1, ..., nk
             j=1
  Inputs:
    x     locations of nonuniform sources on R (real line), length-nj vector.
    c     length-nj complex vector of source strengths. If numel(c)>nj,
          expects a stack of vectors (eg, a nj*ntrans matrix) each of which is
          transformed with the same source and target locations.
    isign if >=0, uses + sign in exponential, otherwise - sign.
    eps   relative precision requested (generally between 1e-15 and 1e-1)
    s     frequency locations of nonuniform targets on R, length-nk vector.
    opts   optional struct with optional fields controlling the following:
    opts.debug:   0 (silent, default), 1 (timing breakdown), 2 (debug info).
    opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots)
    opts.spread_sort:  0 (don't sort NU pts), 1 (do), 2 (auto, default)
    opts.spread_kerevalmeth:  0: exp(sqrt()), 1: Horner ppval (faster)
    opts.spread_kerpad: (iff kerevalmeth=0)  0: don't pad to mult of 4, 1: do
    opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc
    opts.upsampfac:   sigma.  2.0 (default), or 1.25 (low RAM, smaller FFT)
    opts.spread_thread:   for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc
    opts.maxbatchsize:  for ntrans>1 only. max blocking size, or 0 for auto.
    opts.nthreads:   number of threads, or 0: use all available (default)
  Outputs:
    f     length-nk complex vector of values at targets, or, if ntrans>1,
          a matrix of size (nk,ntrans)

Notes:
 * The vectorized (many vector) interface, ie ntrans>1, can be much faster
   than repeated calls with the same nonuniform points. Note that here the I/O
   data ordering is stacked rather than interleaved. See ../docs/matlab.rst
 * The class of input x (double vs single) controls whether the double or
   single precision library are called; precisions of all data should match.
 * For more details about the opts fields, see ../docs/opts.rst
 * See ERRHANDLER, VALID_* and FINUFFT_PLAN for possible warning/error IDs.
 * Full documentation is online at http://finufft.readthedocs.io
FINUFFT2D1   2D complex nonuniform FFT of type 1 (nonuniform to uniform).

f = finufft2d1(x,y,c,isign,eps,ms,mt)
f = finufft2d1(x,y,c,isign,eps,ms,mt,opts)

This computes, to relative precision eps, via a fast algorithm:

                  nj
    f[k1,k2] =   SUM  c[j] exp(+-i (k1 x[j] + k2 y[j]))
                 j=1

    for -ms/2 <= k1 <= (ms-1)/2,  -mt/2 <= k2 <= (mt-1)/2.

  Inputs:
    x,y   coordinates of nonuniform sources on the square [-3pi,3pi)^2,
          each a length-nj vector
    c     length-nj complex vector of source strengths. If numel(c)>nj,
          expects a stack of vectors (eg, a nj*ntrans matrix) each of which is
          transformed with the same source locations.
    isign if >=0, uses + sign in exponential, otherwise - sign.
    eps   relative precision requested (generally between 1e-15 and 1e-1)
    ms,mt  number of Fourier modes requested in x & y; each may be even or odd.
           In either case the mode range is integers lying in [-m/2, (m-1)/2]
    opts   optional struct with optional fields controlling the following:
    opts.debug:   0 (silent, default), 1 (timing breakdown), 2 (debug info).
    opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots)
    opts.spread_sort:  0 (don't sort NU pts), 1 (do), 2 (auto, default)
    opts.spread_kerevalmeth:  0: exp(sqrt()), 1: Horner ppval (faster)
    opts.spread_kerpad: (iff kerevalmeth=0)  0: don't pad to mult of 4, 1: do
    opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc
    opts.upsampfac:   sigma.  2.0 (default), or 1.25 (low RAM, smaller FFT)
    opts.spread_thread:   for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc
    opts.maxbatchsize:  for ntrans>1 only. max blocking size, or 0 for auto.
    opts.nthreads:   number of threads, or 0: use all available (default)
    opts.modeord: 0 (CMCL increasing mode ordering, default), 1 (FFT ordering)
    opts.chkbnds: 0 (don't check NU points valid), 1 (do, default)
  Outputs:
    f     size (ms,mt) complex matrix of Fourier coefficients
          (ordering given by opts.modeord in each dimension; ms fast, mt slow),
          or, if ntrans>1, a 3D array of size (ms,mt,ntrans).

Notes:
 * The vectorized (many vector) interface, ie ntrans>1, can be much faster
   than repeated calls with the same nonuniform points. Note that here the I/O
   data ordering is stacked rather than interleaved. See ../docs/matlab.rst
 * The class of input x (double vs single) controls whether the double or
   single precision library are called; precisions of all data should match.
 * For more details about the opts fields, see ../docs/opts.rst
 * See ERRHANDLER, VALID_* and FINUFFT_PLAN for possible warning/error IDs.
 * Full documentation is online at http://finufft.readthedocs.io
FINUFFT2D2   2D complex nonuniform FFT of type 2 (uniform to nonuniform).

c = finufft2d2(x,y,isign,eps,f)
c = finufft2d2(x,y,isign,eps,f,opts)

This computes, to relative precision eps, via a fast algorithm:

   c[j] =  SUM   f[k1,k2] exp(+/-i (k1 x[j] + k2 y[j]))  for j = 1,..,nj
          k1,k2
    where sum is over -ms/2 <= k1 <= (ms-1)/2, -mt/2 <= k2 <= (mt-1)/2,

 Inputs:
    x,y   coordinates of nonuniform targets on the square [-3pi,3pi)^2,
          each a vector of length nj
    f     complex Fourier coefficient matrix, whose size determines (ms,mt).
          (Mode ordering given by opts.modeord, in each dimension.)
          If a 3D array, 3rd dimension sets ntrans, and each of ntrans
          matrices is transformed with the same nonuniform targets.
    isign if >=0, uses + sign in exponential, otherwise - sign.
    eps   relative precision requested (generally between 1e-15 and 1e-1)
    opts   optional struct with optional fields controlling the following:
    opts.debug:   0 (silent, default), 1 (timing breakdown), 2 (debug info).
    opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots)
    opts.spread_sort:  0 (don't sort NU pts), 1 (do), 2 (auto, default)
    opts.spread_kerevalmeth:  0: exp(sqrt()), 1: Horner ppval (faster)
    opts.spread_kerpad: (iff kerevalmeth=0)  0: don't pad to mult of 4, 1: do
    opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc
    opts.upsampfac:   sigma.  2.0 (default), or 1.25 (low RAM, smaller FFT)
    opts.spread_thread:   for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc
    opts.maxbatchsize:  for ntrans>1 only. max blocking size, or 0 for auto.
    opts.nthreads:   number of threads, or 0: use all available (default)
    opts.modeord: 0 (CMCL increasing mode ordering, default), 1 (FFT ordering)
    opts.chkbnds: 0 (don't check NU points valid), 1 (do, default)
 Outputs:
    c     complex column vector of nj answers at targets, or,
          if ntrans>1, matrix of size (nj,ntrans).

Notes:
 * The vectorized (many vector) interface, ie ntrans>1, can be much faster
   than repeated calls with the same nonuniform points. Note that here the I/O
   data ordering is stacked rather than interleaved. See ../docs/matlab.rst
 * The class of input x (double vs single) controls whether the double or
   single precision library are called; precisions of all data should match.
 * For more details about the opts fields, see ../docs/opts.rst
 * See ERRHANDLER, VALID_* and FINUFFT_PLAN for possible warning/error IDs.
 * Full documentation is online at http://finufft.readthedocs.io
FINUFFT2D3   2D complex nonuniform FFT of type 3 (nonuniform to nonuniform).

f = finufft2d3(x,y,c,isign,eps,s,t)
f = finufft2d3(x,y,c,isign,eps,s,t,opts)

This computes, to relative precision eps, via a fast algorithm:

             nj
    f[k]  =  SUM   c[j] exp(+-i (s[k] x[j] + t[k] y[j])),  for k = 1, ..., nk
             j=1
  Inputs:
    x,y   coordinates of nonuniform sources in R^2, each a length-nj vector.
    c     length-nj complex vector of source strengths. If numel(c)>nj,
          expects a stack of vectors (eg, a nj*ntrans matrix) each of which is
          transformed with the same source and target locations.
    isign if >=0, uses + sign in exponential, otherwise - sign.
    eps   relative precision requested (generally between 1e-15 and 1e-1)
    s,t   frequency coordinates of nonuniform targets in R^2,
          each a length-nk vector.
    opts   optional struct with optional fields controlling the following:
    opts.debug:   0 (silent, default), 1 (timing breakdown), 2 (debug info).
    opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots)
    opts.spread_sort:  0 (don't sort NU pts), 1 (do), 2 (auto, default)
    opts.spread_kerevalmeth:  0: exp(sqrt()), 1: Horner ppval (faster)
    opts.spread_kerpad: (iff kerevalmeth=0)  0: don't pad to mult of 4, 1: do
    opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc
    opts.upsampfac:   sigma.  2.0 (default), or 1.25 (low RAM, smaller FFT)
    opts.spread_thread:   for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc
    opts.maxbatchsize:  for ntrans>1 only. max blocking size, or 0 for auto.
    opts.nthreads:   number of threads, or 0: use all available (default)
  Outputs:
    f     length-nk complex vector of values at targets, or, if ntrans>1,
          a matrix of size (nk,ntrans)

Notes:
 * The vectorized (many vector) interface, ie ntrans>1, can be much faster
   than repeated calls with the same nonuniform points. Note that here the I/O
   data ordering is stacked rather than interleaved. See ../docs/matlab.rst
 * The class of input x (double vs single) controls whether the double or
   single precision library are called; precisions of all data should match.
 * For more details about the opts fields, see ../docs/opts.rst
 * See ERRHANDLER, VALID_* and FINUFFT_PLAN for possible warning/error IDs.
 * Full documentation is online at http://finufft.readthedocs.io
FINUFFT3D1   3D complex nonuniform FFT of type 1 (nonuniform to uniform).

f = finufft3d1(x,y,z,c,isign,eps,ms,mt,mu)
f = finufft3d1(x,y,z,c,isign,eps,ms,mt,mu,opts)

This computes, to relative precision eps, via a fast algorithm:

                      nj
    f[k1,k2,k3] =    SUM  c[j] exp(+-i (k1 x[j] + k2 y[j] + k3 z[j]))
                     j=1

    for -ms/2 <= k1 <= (ms-1)/2,  -mt/2 <= k2 <= (mt-1)/2,
        -mu/2 <= k3 <= (mu-1)/2.

  Inputs:
    x,y,z  coordinates of nonuniform sources on the cube [-3pi,3pi)^3,
           each a length-nj vector
    c     length-nj complex vector of source strengths. If numel(c)>nj,
          expects a stack of vectors (eg, a nj*ntrans matrix) each of which is
          transformed with the same source locations.
    isign if >=0, uses + sign in exponential, otherwise - sign.
    eps   relative precision requested (generally between 1e-15 and 1e-1)
    ms,mt,mu  number of Fourier modes requested in x,y and z; each may be
          even or odd.
          In either case the mode range is integers lying in [-m/2, (m-1)/2]
    opts   optional struct with optional fields controlling the following:
    opts.debug:   0 (silent, default), 1 (timing breakdown), 2 (debug info).
    opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots)
    opts.spread_sort:  0 (don't sort NU pts), 1 (do), 2 (auto, default)
    opts.spread_kerevalmeth:  0: exp(sqrt()), 1: Horner ppval (faster)
    opts.spread_kerpad: (iff kerevalmeth=0)  0: don't pad to mult of 4, 1: do
    opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc
    opts.upsampfac:   sigma.  2.0 (default), or 1.25 (low RAM, smaller FFT)
    opts.spread_thread:   for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc
    opts.maxbatchsize:  for ntrans>1 only. max blocking size, or 0 for auto.
    opts.nthreads:   number of threads, or 0: use all available (default)
    opts.modeord: 0 (CMCL increasing mode ordering, default), 1 (FFT ordering)
    opts.chkbnds: 0 (don't check NU points valid), 1 (do, default)
  Outputs:
    f     size (ms,mt,mu) complex array of Fourier coefficients
          (ordering given by opts.modeord in each dimension; ms fastest, mu
          slowest), or, if ntrans>1, a 4D array of size (ms,mt,mu,ntrans).

Notes:
 * The vectorized (many vector) interface, ie ntrans>1, can be much faster
   than repeated calls with the same nonuniform points. Note that here the I/O
   data ordering is stacked rather than interleaved. See ../docs/matlab.rst
 * The class of input x (double vs single) controls whether the double or
   single precision library are called; precisions of all data should match.
 * For more details about the opts fields, see ../docs/opts.rst
 * See ERRHANDLER, VALID_* and FINUFFT_PLAN for possible warning/error IDs.
 * Full documentation is online at http://finufft.readthedocs.io
FINUFFT3D2   3D complex nonuniform FFT of type 2 (uniform to nonuniform).

c = finufft3d2(x,y,z,isign,eps,f)
c = finufft3d2(x,y,z,isign,eps,f,opts)

This computes, to relative precision eps, via a fast algorithm:

   c[j] =   SUM   f[k1,k2,k3] exp(+/-i (k1 x[j] + k2 y[j] + k3 z[j]))
          k1,k2,k3
                           for j = 1,..,nj
    where sum is over -ms/2 <= k1 <= (ms-1)/2, -mt/2 <= k2 <= (mt-1)/2,
                      -mu/2 <= k3 <= (mu-1)/2.

 Inputs:
    x,y,z coordinates of nonuniform targets on the cube [-3pi,3pi)^3,
          each a vector of length nj
    f     complex Fourier coefficient array, whose size sets (ms,mt,mu).
          (Mode ordering given by opts.modeord, in each dimension.)
          If a 4D array, 4th dimension sets ntrans, and each of ntrans
          3D arrays is transformed with the same nonuniform targets.
    isign if >=0, uses + sign in exponential, otherwise - sign.
    eps   relative precision requested (generally between 1e-15 and 1e-1)
    opts   optional struct with optional fields controlling the following:
    opts.debug:   0 (silent, default), 1 (timing breakdown), 2 (debug info).
    opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots)
    opts.spread_sort:  0 (don't sort NU pts), 1 (do), 2 (auto, default)
    opts.spread_kerevalmeth:  0: exp(sqrt()), 1: Horner ppval (faster)
    opts.spread_kerpad: (iff kerevalmeth=0)  0: don't pad to mult of 4, 1: do
    opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc
    opts.upsampfac:   sigma.  2.0 (default), or 1.25 (low RAM, smaller FFT)
    opts.spread_thread:   for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc
    opts.maxbatchsize:  for ntrans>1 only. max blocking size, or 0 for auto.
    opts.nthreads:   number of threads, or 0: use all available (default)
    opts.modeord: 0 (CMCL increasing mode ordering, default), 1 (FFT ordering)
    opts.chkbnds: 0 (don't check NU points valid), 1 (do, default)
 Outputs:
    c     complex column vector of nj answers at targets, or,
          if ntrans>1, matrix of size (nj,ntrans).

Notes:
 * The vectorized (many vector) interface, ie ntrans>1, can be much faster
   than repeated calls with the same nonuniform points. Note that here the I/O
   data ordering is stacked rather than interleaved. See ../docs/matlab.rst
 * The class of input x (double vs single) controls whether the double or
   single precision library are called; precisions of all data should match.
 * For more details about the opts fields, see ../docs/opts.rst
 * See ERRHANDLER, VALID_* and FINUFFT_PLAN for possible warning/error IDs.
 * Full documentation is online at http://finufft.readthedocs.io
FINUFFT3D3   3D complex nonuniform FFT of type 3 (nonuniform to nonuniform).

f = finufft3d3(x,y,z,c,isign,eps,s,t,u)
f = finufft3d3(x,y,z,c,isign,eps,s,t,u,opts)

This computes, to relative precision eps, via a fast algorithm:

             nj
    f[k]  =  SUM   c[j] exp(+-i (s[k] x[j] + t[k] y[j] + u[k] z[j])),
             j=1
                             for k = 1, ..., nk
  Inputs:
    x,y,z  coordinates of nonuniform sources in R^3, each a length-nj vector.
    c     length-nj complex vector of source strengths. If numel(c)>nj,
          expects a stack of vectors (eg, a nj*ntrans matrix) each of which is
          transformed with the same source and target locations.
    isign if >=0, uses + sign in exponential, otherwise - sign.
    eps   relative precision requested (generally between 1e-15 and 1e-1)
    s,t,u  frequency coordinates of nonuniform targets in R^3,
          each a length-nk vector.
    opts   optional struct with optional fields controlling the following:
    opts.debug:   0 (silent, default), 1 (timing breakdown), 2 (debug info).
    opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots)
    opts.spread_sort:  0 (don't sort NU pts), 1 (do), 2 (auto, default)
    opts.spread_kerevalmeth:  0: exp(sqrt()), 1: Horner ppval (faster)
    opts.spread_kerpad: (iff kerevalmeth=0)  0: don't pad to mult of 4, 1: do
    opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc
    opts.upsampfac:   sigma.  2.0 (default), or 1.25 (low RAM, smaller FFT)
    opts.spread_thread:   for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc
    opts.maxbatchsize:  for ntrans>1 only. max blocking size, or 0 for auto.
    opts.nthreads:   number of threads, or 0: use all available (default)
  Outputs:
    f     length-nk complex vector of values at targets, or, if ntrans>1,
          a matrix of size (nk,ntrans)

Notes:
 * The vectorized (many vector) interface, ie ntrans>1, can be much faster
   than repeated calls with the same nonuniform points. Note that here the I/O
   data ordering is stacked rather than interleaved. See ../docs/matlab.rst
 * The class of input x (double vs single) controls whether the double or
   single precision library are called; precisions of all data should match.
 * For more details about the opts fields, see ../docs/opts.rst
 * See ERRHANDLER, VALID_* and FINUFFT_PLAN for possible warning/error IDs.
 * Full documentation is online at http://finufft.readthedocs.io
FINUFFT_PLAN   is a class which wraps the guru interface to FINUFFT.

 Full documentation is given in ../finufft-manual.pdf and online at
 http://finufft.readthedocs.io
 Also see examples in the matlab/examples and matlab/test directories.

PROPERTIES
  mwptr - opaque pointer to a C++ finufft_plan object (see MWrap manual),
          whose properties cannot be accessed directly
  floatprec - either 'double' or 'single', tracks what precision of C++
          library is being called
  type, dim, n_modes, n_trans, nj, nk - other plan parameters
 Note: the user should never alter these plan properties directly! Rather,
 the below methods should be used to create, use, and destroy plans.

METHODS
  finufft_plan - create guru plan object for one/many general nonuniform FFTs.
  setpts       - process nonuniform points for general FINUFFT transform(s).
  execute      - execute single or many-vector FINUFFT transforms in a plan.

General notes:
 * use delete(plan) to remove a plan after use.
 * See ERRHANDLER, VALID_*, and this code for warning/error IDs.



=========== Detailed description of guru methods ==========================

1) FINUFFT_PLAN create guru plan object for one/many general nonuniform FFTs.

plan = finufft_plan(type, n_modes_or_dim, isign, ntrans, eps)
plan = finufft_plan(type, n_modes_or_dim, isign, ntrans, eps, opts)

Creates a finufft_plan MATLAB object in the guru interface to FINUFFT, of
 type 1, 2 or 3, and with given numbers of Fourier modes (unless type 3).

Inputs:
    type            transform type: 1, 2, or 3
    n_modes_or_dim  if type is 1 or 2, the number of Fourier modes in each
                    dimension: [ms] in 1D, [ms mt] in 2D, or [ms mt mu] in 3D.
                    Its length sets the dimension, which must be 1, 2 or 3.
                    If type is 3, in contrast, its *value* fixes the dimension
    isign if >=0, uses + sign in exponential, otherwise - sign.
    eps   relative precision requested (generally between 1e-15 and 1e-1)
    opts   optional struct with optional fields controlling the following:
    opts.debug:   0 (silent, default), 1 (timing breakdown), 2 (debug info).
    opts.spread_debug: spreader: 0 (no text, default), 1 (some), or 2 (lots)
    opts.spread_sort:  0 (don't sort NU pts), 1 (do), 2 (auto, default)
    opts.spread_kerevalmeth:  0: exp(sqrt()), 1: Horner ppval (faster)
    opts.spread_kerpad: (iff kerevalmeth=0)  0: don't pad to mult of 4, 1: do
    opts.fftw: FFTW plan mode, 64=FFTW_ESTIMATE (default), 0=FFTW_MEASURE, etc
    opts.upsampfac:   sigma.  2.0 (default), or 1.25 (low RAM, smaller FFT)
    opts.spread_thread:   for ntrans>1 only. 0:auto, 1:seq multi, 2:par, etc
    opts.maxbatchsize:  for ntrans>1 only. max blocking size, or 0 for auto.
    opts.nthreads:   number of threads, or 0: use all available (default)
    opts.floatprec: library precision to use, 'double' (default) or 'single'.
    for type 1 and 2 only, the following opts fields are also relevant:
    opts.modeord: 0 (CMCL increasing mode ordering, default), 1 (FFT ordering)
    opts.chkbnds: 0 (don't check NU points valid), 1 (do, default)
Outputs:
    plan            finufft_plan object (opaque pointer)

Notes:
 * For type 1 and 2, this does the FFTW planning and kernel-FT precomputation.
 * For type 3, this does very little, since the FFT sizes are not yet known.
 * Be default all threads are planned; control how many with opts.nthreads.
 * The vectorized (many vector) plan, ie ntrans>1, can be much faster
   than repeated calls with the same nonuniform points. Note that here the I/O
   data ordering is stacked rather than interleaved. See ../docs/matlab.rst
 * For more details about the opts fields, see ../docs/opts.rst


2) SETPTS   process nonuniform points for general FINUFFT transform(s).

plan.setpts(xj)
plan.setpts(xj, yj)
plan.setpts(xj, yj, zj)
plan.setpts(xj, [], [], s)
plan.setpts(xj, yj, [], s, t)
plan.setpts(xj, yj, zj, s, t, u)

 When plan is a finufft_plan MATLAB object, brings in nonuniform point
 coordinates (xj,yj,zj), and additionally in the type 3 case, nonuniform
 frequency target points (s,t,u). Empty arrays may be passed in the case of
 unused dimensions. For all types, sorting is done to internally store a
 reindexing of points, and for type 3 the spreading and FFTs are planned.
 The nonuniform points may be used for multiple transforms.

Inputs:
    xj     vector of x-coords of all nonuniform points
    yj     empty (if dim<2), or vector of y-coords of all nonuniform points
    zj     empty (if dim<3), or vector of z-coords of all nonuniform points
    s      vector of x-coords of all nonuniform frequency targets
    t      empty (if dim<2), or vector of y-coords of all frequency targets
    u      empty (if dim<3), or vector of z-coords of all frequency targets
Input/Outputs:
    plan   finufft_plan object

Notes:
 * For type 1 and 2, the values in xj (and if nonempty, yj and zj) must
   lie in the interval [-3pi,3pi). For type 1 they are "sources", but for
   type 2, "targets". In contrast, for type 3 there are no restrictions other
   than the resulting size of the internal fine grids.
 * s (and t and u) are only relevant for type 3, and may be omitted otherwise
 * The matlab vectors xj,... and s,... should not be changed before calling
   future execute calls, because the plan stores only pointers to the
   arrays (they are not duplicated internally).
 * The precision (double/single) of all inputs must match that chosen at the
   plan stage using opts.floatprec, otherwise an error is raised.


3) EXECUTE   execute single or many-vector FINUFFT transforms in a plan.

result = plan.execute(data_in);

 For plan a previously created finufft_plan object also containing all
 needed nonuniform point coordinates, do a single (or if ntrans>1 in the
 plan stage, multiple) NUFFT transform(s), with the strengths or Fourier
 coefficient inputs vector(s) from data_in. The result of the transform(s)
 is returned as a (possibly multidimensional) array.

Inputs:
    plan     finufft_plan object
    data_in  strengths (types 1 or 3) or Fourier coefficients (type 2)
             vector, matrix, or array of appropriate size. For type 1 and 3,
             this is either a length-M vector (where M is the length of xj),
             or an (M,ntrans) matrix when ntrans>1. For type 2, in 1D this is
             length-ms, in 2D size (ms,mt), or in 3D size (ms,mt,mu), or
             each of these with an extra last dimension ntrans if ntrans>1.
Outputs:
    result   vector of output strengths at targets (types 2 or 3), or array
             of Fourier coefficients (type 1), or, if ntrans>1, a stack of
             such vectors or arrays, of appropriate size.
             Specifically, if ntrans=1, for type 1, in 1D
             this is a length-ms column vector, in 2D a matrix of size
             (ms,mt), or in 3D an array of size (ms,mt,mu); for types 2 and 3
             it is a column vector of length M (the length of xj in type 2),
             or nk (the length of s in type 3). If ntrans>1 its is a stack
             of such objects, ie, it has an extra last dimension ntrans.

Notes:
 * The precision (double/single) of all inputs must match that chosen at the
   plan stage using opts.floatprec, otherwise an error is raised.


4) To deallocate (delete) a nonuniform FFT plan, use delete(plan)

This deallocates all stored FFTW plans, nonuniform point sorting arrays,
 kernel Fourier transforms arrays, etc.