List of features / changes made / release notes, in reverse chronological order
V 2.2.0 (not yet released... planning April 2023):
* MERGE OF CUFINUFFT (GPU CODE) INTO FINUFFT SOURCE TREE:
- combined cmake build system via FINUFFT_USE_CUDA flag
- python wrapper for GPU code included
- GPU documentation (improving on cufinufft) added {install,c,python)_gpu.rst
- CI includes GPU smoke test
- common spread_opts.h header; other code not yet made common.
- the GPU interface will be broken to match finufft
- cufinufft repo will be obsoleted.
- coding leads on this: Robert Blackwell and Joakim Anden
* cmake build structure (thanks: Wenda Zhou, Marco Barbone, Libin Lu)
- Note: the plan is to make the makefiles/make.inc.* obsolete this year.
* Fortran90 example via a new FINUFFT fortran module, thanks Reinhard Neder.
* Better Python interface type checking (PR #237).
* made the C++ plan object (finufft_plan_s) private; only opaque pointer
remains public, as should be (PR #233). Allows plan to have C++ constructs.
* fixed single-thread (OMP=OFF) build which failed due to fftw_defs.h/defs.h
V 2.1.0 (6/10/22)
* BREAKING INTERFACE CHANGE: nufft_opts is now called finufft_opts.
This is needed for consistency and fixes a historical problem.
We have compile-time warning, and backwards-compatibility for now.
* Professionalized the public-facing interface:
- safe lib (.so, .a) symbols via hierarchical namespacing of private funcs
that do not already begin with finufft{f}, in finufft:: namespace.
This fixes, eg, clash with linking against cufinufft (their Issue #138).
- public headers (finufft.h) has all macro names safe (ie FINUFFT suffix).
Headers both public and private rationalized/simplified.
- private headers are in include/finufft/, so not exposed by -Iinclude
- spread_opts renamed finufft_spread_opts, since publicly exposed and name
must respect library naming.
* change nj and nk in plan to BIGINT (int64_t), new big2d2f perftest, fixing
Issue #215.
* PDF manual moved from local to readthedocs.io hosting, Issue #221.
* Py doc for dtype fixed, Issue #216.
* spreadinterp evaluate_kernel_vector uses single arith when FLT=single.
* spread_opts.h fix duplication for FLT=single/double by making FLT->double.
* examples/simulplans1d1 demos ability to to wield independent plans.
* sped up float32 1d type 3 by 20% by using float32 cos()... thanks Wenda Zhou.
V 2.0.4 (1/13/22)
* makefile now appends (not replaces by) environment {C,F,CXX}FLAGS (PR #199).
* fixed MATLAB Contents.m and guru help strings.
* fortran examples: avoided clash with keywords "type" and "null", and correct
creation of null ptr for default opts (issues #195-196, Jiri Kulda).
* various fixes to python wheels CI.
* various docs improvements.
* fixed modeord=1 failure for type 3 even though should never be used anyway
(issue #194).
* fixed spreadcheck NaN failure to detect bug introduced in 2.0.3 (9566511).
* Dan Fortunato found and fixed MATLAB setpts temporary array loss, issue #185.
V 2.0.3 (4/22/21)
* finufft (plan) now thread-safe via OMP lock (if nthr=1 and -DFFTW_PLAN_SAFE)
+ new example/threadsafe*.cpp demos. Needs FFTW>=3.3.6 (Issues #72 #180 #183)
* fixed bug in checkbounds that falsely reported NU pt as invalid if exactly 1
ULP below +pi, for certain N values only, egad! (Issue #181)
* GH workflows continuous integration (CI) in four setups (linux, osx*2, mingw)
* fixed memory leak in type 3.
* corrected C guru execute documentation.
V 2.0.2 (12/5/20)
* fixed spreader segfault in obscure use case: single-precision N1>1e7, where
rounding error is O(1) anyway. Now uses consistent int(ceil()) grid index.
* Improved large-thread scaling of type-1 (spreading) via transition from OMP
critical to atomic add_wrapped_subgrid() operations; thanks Rob Blackwell.
* Increased heuristic t1 spreader max_subproblem_size, faster in 2D, 3D, and
allowed this and the above atomic threshold to be controlled as nufft_opts.
* Removed MAX_USEFUL_NTHREADS from defs.h and all code, for simplicity, since
large thread number now scales better.
* multithreaded one-mode accuracy test in C++ tests, t1 & t3, for faster tests.
V 2.0.1 (10/6/20)
* python (under-the-hood) interfacing changed from pybind11 to cleaner ctypes.
* non-stochastic test/*.cpp routines, zeroing small chance of incorrect failure
* Windows compatible makefile
* mac OSX improved installation instructions and make.inc.*
V 2.0.0 (8/28/20)
* major changes to code, internally, and major improvements to operation and
language interfaces.
WARNING!: Here are all the interface compatibility changes from 1.1.2:
- opts (nufft_opts) is now always passed as a pointer in C++/C, not
pass-by-reference as in v1.1.2 or earlier.
- Fortran simple calls are now finufft?d?(..) not finufft?d?_f(..), and
they add a penultimate opts argument.
- Python module name is now finufft not finufftpy, and the interface has
been completely changed (allowing major improvements, see below).
- ier=1 is now a warning not an error; this indicates requested tol
was too small, but that a transform *was* done at the best possible
accuracy.
- opts.fftw directly controls the FFTW plan mode consistently in all
language interfaces (thus changing the meaning of fftw=0 in MATLAB).
- Octave now needs version >= 4.4, since OO features used by guru.
These changes were deemed necessary to rationalize and improve FINUFFT
for the long term.
There are also many other new interface options (many-vector, guru)
added; see docs.
* the C++ library is now dual-precision, with distinct function interfaces for
double vs single precision operation, that are C and C++ compatible. Under
the hood this is achieved via simple C macros. All language interfaces now
have dual precision options.
* completely new (although backward compatible) MATLAB/octave interface,
including object-style wrapper around the guru interface, dual precision.
* completely new Fortran interface, allowing >2^31 sized (int64) arrays,
all simple, many-vector and guru interface, with full options control,
and dual precisions.
* all simple and many-vector interfaces now call guru interface, for much
better maintainability and less code repetition.
* new guru interface, by Andrea Malleo and Alex Barnett, allowing easier
language wrapping and control of point-setting, reuse of sorting and FFTW
plans. This finally bypasses the 0.1ms/thread cost of FFTW looking up previous
wisdom, which slowed down performance for many small problems.
* removed obsolete -DNEED_EXTERN_C flag.
* major rewrite of documentation, plus tutorial application examples in MATLAB.
* numdiff dependency is removed for pass-fail library validation.
* new (professional!) logo for FINUFFT. Sphinx HTML and PDF aesthetics.
V 1.1.2 (1/31/20)
* Ludvig's padding of Horner loop to w=4n, speeds up kernel, esp for GCC5.4.
* Bansal's Mingw32 python patches.
V 1.1.1 (11/2/18)
* Mac OSX installation on clang and gcc-8, clearer install docs.
* LIBSOMP split off in makefile.
* printf(...%lld..) w/ long long typecast
* new basic passfail tester
* precompiled binaries
V 1.1 (9/24/18)
* NOTE TO USERS: changed interface for setting default opts in C++ and C, from
pass by reference to pass by value of a pointer (see docs/). Unifies C++/C
interfaces in a clean way.
* fftw3_omp instead of fftw3_threads (on linux), is faster.
* rationalized header files.
V 1.0.1 (9/14/18)
* Ludvig's removal of omp chunksize in dir=2, another 20%+ speedup.
* Matlab doesn't change omp internal state.
V 1.0 (8/20/18)
* repo transferred to flatironinstitute
* usage doc simpler
* 2d1many and 2d2many interfaces by Melody Shih, for multiple vectors with same
nonuniform points. All tests and docs for these interfaces.
* horner optimized kernel for sigma=5/4 (low upsampling), to go along with the
default sigma=2. Cmdline arg to change sigma in finufft?d_test.
* simplified various int types: only BIGINT remains.
* clearer docs.
* remaining C interfaces, with opts control.
V 0.99 (4/24/18)
* piecewise polynomial kernel evaluation by Horner, for faster spreading esp at
low accuracy and 1d or 2d.
* various heuristic decisions re whether to sort, and if sorting is single or
multi-threaded.
* single-precision libs get an "f" suffix so can coexist with double-prec.
V 0.98 (3/1/18)
* makefile includes make.inc for OS-specific defs.
* decided that, since Legendre nodes code of GLR alg by Hale/Burkhardt is LGPL
licensed, our use (not changing source) is not a "derived work", therefore
ok under our Apache v2 license. See:
https://tldrlegal.com/license/gnu-lesser-general-public-license-v3-(lgpl-3)
https://www.apache.org/licenses/GPL-compatibility.html
https://softwareengineering.stackexchange.com/questions/233571/
open-source-what-is-the-definition-of-derivative-work-and-how-does-it-impact
* fixed MATLAB FFTW incompat alloc crash, by hack of Joakim, calling fft()
first.
* python tests fixed, brought into makefile.
* brought in af Klinteberg spreader optimizations & SSE tricks.
* logo
V 0.97 (12/6/17)
* tidied all docs -> readthedocs.io host. README.md now a stub. TODO tidied.
* made sort=1 in tests for xeon (used to be 0)
* removed mcwrap and python dirs
* changed name of py routines to nufft* from finufft*
* python interfaces doc, up-to-date. Removed ms,.. from type-2 interfaces.
* removed RESCALEs from lower dims in bin_sort, speeds up a few % in 1D.
* allowed NU pts to be currectly folded from +-1 periods from central box, as
per David Stein request. Adds 5% to time at 1e-2 accuracy, less at higher acc.
* corrected dynamic C++ array allocs in spreader (some made static, 5% speedup)
* removed all C++11 dependencies, mainly that opts structs are all explicitly
initialized.
* fixed python interface to have chkbnds.
* tidied MEX interface
* removed memory leaks (!)
* opts.modeord implemented and exposed to matlab/python interfaces. Also removes
looping backwards in RAM in deconvolveshuffle.
V 0.96 (10/15/17)
* apache v2 license, exposed flags in python interface.
V 0.95 (10/2/17)
* brought in JFM's in-package python wrapper & doc, create lib-static dir,
removed devel dir.
V 0.9: (6/17/17)
* adapted adv4 into main code, inner loops separate by dim, kill
the current spreader. Incorporate old ideas such as: checkerboard
per-thread grid cuboids, compare speed in 2d and 3d against
current 1d slicing. See cnufftspread:set_thread_index_box()
* added FFTW_MEAS vs FFTW_EST (set as default) opts flag in nufft_opts, and
matlab/python interfaces
* removed opts.maxnalloc in favor of #defined MAX_NF
* fixed the 1-target case in type-3, all dims, to avoid nan; clarified logic
for handling X=0 and/or S=0. 6/12/17
* changed arraywidcen to snap to C=0 if relative shift is <0.1, avoids cexps in
type-3.
* t3: if C1 < X1/10 and D1 < S1/10 then don't rephase. Same for d=2,3.
* removed the 1/M type-1 prefactor, also in all test routines. 6/6/17
* removed timestamp-based make decision whether to rebuild matlab/finufft.cpp,
since git clone creates files with random timestamp order!
* theory work on exp(sqrt) being close to PSWF. Analysis.
* fix issue w/ needing mwrap when do make matlab.
* makefile has variables customizing openmp and precision, non-omp tested
* fortran single-prec demos (required all direct ft's in single prec too!)
* examples changed to err rel to max F.
* matlab interface control of opts.spread_sort.
* matlab interface using doubles as big ints w/ correct typecasting.
* twopispread removed, used flag in spread_opts for [-pi,pi] input instead.
* testfinufft* use same integer type INT as for interfaces, typecast all %ld in
printf warnings, use omp rand array filling
* INT64 for necessary size-setting arrays, removed all %lf printf warnings in
finufft*
* all internal array indexing is BIGINT, switchable from long long to int via
SMALLINT compile flag (default off, see utils.h)
* all integers in interfaces are type INT, default 64-bit, switchable to 32 bit
by INTERGER32 compile flag (see utils.h)
* test big probs (speed, crashing) and decide if BIGINT is long long or int?
slows any array access down, or spreading? allows I/O sizes
(M, N1*N2*N3) > 2^31. Note June-Yub int*8 in nufft-1.3.x slowed things by
factor 2-3.
* tidy up spreader to be BIGINT = long long compatible and test > 2^31.
* spreadtest parallel rand()
* sort flag passed to spreader via finufft, and test scripts check if Xeon
(-> sort=0)
* opts in the manual
* removed all xk2, dNU2 sorted arrays, and not-needed dims y,z; halved RAM usage
V 0.8: (3/27/17)
* bnderr checking done in dir=1,2 main loops, not before.
* all kx2, dNU2 arrays removed, just done by permutation index when needed.
* MAC OSX test, makefile, instructions.
* matlab wrappers in 3D
* matlab wrappers, mcwrap issue w/ openmp, mex, and subdirs. Ship mex
executables for linux. Link to .a
* matlab wrappers need ier output? yes, and internal omp numthreads control
(since matlab's is poor)
* wrappers for MEX octave, instructions. Ship .mex for octave.
* python wrappers - Dan Foreman-Mackey starting to add something similar to
https://github.com/dfm/python-nufft
* check is done before attempting to malloc ridiculous array sizes, eg if a
large product of x width and k width is requested in type 3 transforms.
* draft make python
* basic manual (txt)
V. 0.7:
* build static & shared lib
* fixed bug when Nth>Ntop
* fortran drivers use dynamic malloc to prevent stack segfaults that CMCL had
* bugs found in fortran drivers, removed
* split out devel text files (TODO, etc)
* made pass-fail test script counting crashes and numdiff fails.
* finufft?d_test have a no-timings option, and exit with ier.
* global error codes
* made finufft routines & testers return error codes rather than exit().
* dumbinput test executable
* found nan returned error for nj=0 in type-1, fixed so returns the zero array.
* fixed type 2 to not segfault when ms,mt, or mu=0, doing dir=2 0-padding right
* array utils use pointers to make which vars they write to explicit.
* don't do final type-3 rephase if C1 nan or 0.
* finished all dumbinputs, all dims
* fortran compilation fixed
* makefile self-documents
* nf1 (etc) size check before alloc, exit gracefully if exceeds RAM
* integrate into nufft_comparison, esp vs NFFT - jfm did
* simple examples, simpler than the test drivers
* fortran link via gfortran, better fortran docs
* boilerplate stuff as in CMCL page
pre-V. 0.7: (Jan-Feb 2017)
* efficient modulo in spreader, done by conditionals
* removed data-zeroing bug in t-II spreader, slowness of large arrays in t-I.
* clean dir tree
* spreader dir=1,2 math tests in 3d, then nd.
* Jeremy's request re only computing kernel vals needed (actually
was vital for efficiency in dir=1 openmp version), Ie fix KB kereval in
spreader so doesn't wdo 3d fill when 1 or 2 will do.
* spreader removed modulo altogether in favor of ifs
* OpenMP spreader, all dims
* multidim spreader test, command line args and bash driver
* cnufft->finufft names, except spreader still called cnufft
* make ier report accuracy out of range, malloc size errors, etc
* moved wrappers to own directories so the basic lib is clean
* fortran wrapper added ier argument
* types 1,2 in all dims, using 1d kernel for all dims.
* fix twopispread so doesn't create dummy ky,kz, and fix sort so doesn't ever
access unused ky,kz dims.
* cleaner spread and nufft test scripts
* build universal ndim Fourier coeff copiers in C and use for finufft
* makefile opts and compiler directives to link against FFTW.
* t-I, t-II convergence params test: R=M/N and KB params
* overall scale factor understand in KB
* check J's bessel10 approx is ok. - became irrelevant
* meas speed of I_0 for KB kernel eval - became irrelevant
* understand origin of dfftpack (netlib fftpack is real*4) - not needed
* [spreader: make compute_sort_indices sensible for 1d and 2d. not needed]
* next235even for nf's
* switched pre/post-amp correction from DFT of kernel to F series (FT) of
kernel, more accurate
* Gauss-Legendre quadrature for direct eval of kernel FT, openmp since cexp slow
* optimize q (# G-L nodes) for kernel FT eval on reg and irreg grids
(common.cpp). Needs q a bit bigger than like (2-3x the PTR, when 1.57x is
expected). Why?
* type 3 segfault in dumb case of nj=1 (SX product = 0). By keeping gam>1/S
* optimize that phi(z) kernel support is only +-(nspread-1)/2, so w/ prob 1 you
only use nspread-1 pts in the support. Could gain several % speed for same acc
* new simpler kernel entirely
* cleaned up set_nf calls and removed params from within core libs
* test isign=-1 works
* type 3 in 2d, 3d
* style: headers should only include other headers needed to compile the .h;
all other headers go in .cpp, even if that involves repetition I guess.
* changed library interface and twopispread to dcomplex
* fortran wrappers (rmdir greengard_work, merge needed into fortran)
Started: mid-January 2017.