Performance change

This page compares the performance of the FINUFFT releases and the latest commit on the master branch. The graphs show how the average duration of each step (makeplan, setpts, and execute) changes across versions. The parameter sets represent input variations mentioned by FINUFFT users in this GitHub discussion.
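As a rough illustration of what "average duration of each step" means, the sketch below times three named stages over several repeats and averages each one. It is not the benchmarking code used by the CI: the helper name average_stage_times and the stand-in workloads are hypothetical, and a real measurement would call the library's planning, point-setting, and execution routines in their place.

```python
import time
from statistics import mean


def average_stage_times(stages, repeats=5):
    """Run the named stages in order `repeats` times; return mean wall time per stage."""
    samples = {name: [] for name, _ in stages}
    for _ in range(repeats):
        for name, fn in stages:
            start = time.perf_counter()
            fn()
            samples[name].append(time.perf_counter() - start)
    return {name: mean(times) for name, times in samples.items()}


# Stand-in workloads (hypothetical); replace them with the real library calls.
stages = [
    ("makeplan", lambda: sum(range(1_000))),
    ("setpts", lambda: sorted(range(1_000, 0, -1))),
    ("execute", lambda: [i * i for i in range(1_000)]),
]

averages = average_stage_times(stages, repeats=3)
for name, seconds in averages.items():
    print(f"{name}: {seconds:.6f} s")
```

Timing each stage separately, rather than the whole transform, makes it possible to see which step a given release sped up or slowed down.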

The results are generated automatically by a GitHub Action. Every library version is recompiled with the CMake flags -DFINUFFT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release, and all versions are tested in a single run. This ensures that all versions are optimized by the same compiler and executed on the same CPU. However, the exact CPU and compiler version depend on the runner.

FINUFFT can use two backend libraries for the fast Fourier transform: FFTW and DUCC. Although the plots for both backends are generated with the same parameters, they should not be compared directly, because a separate runner executes the tests for each backend. The header of each backend section summarizes the CPU characteristics and compiler flags of its runner. Within each backend, plots are grouped by transform type and dimensionality.
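Each plot on this page is preceded by a parameter line such as "prec:f N1:320 N2:320 N3:1 ...". The sketch below shows one way such a line could be parsed into typed values. The helper names parse_parameters and dimensionality are hypothetical, and the rule used to infer dimensionality (counting the mode sizes N1 to N3 that exceed 1) is an assumption that merely matches the groupings shown on this page.

```python
def parse_parameters(line):
    """Parse 'key:value' tokens; numbers become int when integral, else float."""
    # Accept the line with or without its leading "Parameters:" label.
    body = line.split("Parameters:", 1)[-1]
    params = {}
    for token in body.split():
        key, _, raw = token.partition(":")
        try:
            number = float(raw)
        except ValueError:
            params[key] = raw  # non-numeric value, e.g. prec:f or prec:d
            continue
        params[key] = int(number) if number.is_integer() else number
    return params


def dimensionality(params):
    """Assumed rule: count the mode sizes among N1, N2, N3 that are larger than 1."""
    return sum(1 for key in ("N1", "N2", "N3") if params.get(key, 1) > 1)


line = "Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.0001"
params = parse_parameters(line)
print(params)
print("dimensions:", dimensionality(params))
```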

The reported speedup label (e.g. 1.10x) gives the factor by which a version is faster than the baseline. The baseline is the leftmost bar (the oldest version, or master in PR comparisons).
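A minimal sketch of how such a label could be computed, assuming it is the ratio of the baseline duration to the duration of the bar in question (so values above 1 mean faster and values below 1 mean slower). The function name speedup_labels and the sample durations are illustrative only.

```python
def speedup_labels(durations):
    """Label each bar with baseline_time / bar_time; the first entry is the baseline."""
    baseline = durations[0]
    return [f"{baseline / d:.2f}x" for d in durations]


# Durations in seconds, ordered left to right as in a plot (made-up values).
labels = speedup_labels([2.2, 2.0, 4.4])
print(labels)
```

With these sample values the baseline is labeled 1.00x, the second bar 1.10x (faster), and the third 0.50x (slower).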

Performance (FFTW backend)

CPU name: AMD EPYC 9V74 80-Core Processor.

Arch: X86_64.

Core count: 2.

ISA extensions present: 3dnowext, 3dnowprefetch, abm, adx, aes, aperfmperf, apic, arat, avx, avx2, bmi1, bmi2, clflush, clflushopt, clwb, clzero, cmov, cmp_legacy, constant_tsc, cpuid, cr8_legacy, cx16, cx8, de, decodeassists, erms, extd_apicid, f16c, flushbyasid, fma, fpu, fsgsbase, fsrm, fxsr, fxsr_opt, ht, hypervisor, invpcid, lahf_lm, lm, mca, mce, misalignsse, mmx, mmxext, movbe, msr, mtrr, nonstop_tsc, nopl, npt, nrip_save, nx, osvw, osxsave, pae, pat, pausefilter, pcid, pclmulqdq, pdpe1gb, pfthreshold, pge, pni, popcnt, pse, pse36, rdpid, rdpru, rdrand, rdrnd, rdseed, rdtscp, rep_good, sep, sha, sha_ni, smap, smep, sse, sse2, sse4_1, sse4_2, sse4a, ssse3, svm, syscall, topoext, tsc, tsc_known_freq, tsc_reliable, tsc_scale, umip, user_shstk, v_vmsave_vmload, vaes, vmcb_clean, vme, vmmcall, vpclmulqdq, xgetbv1, xsave, xsavec, xsaveerptr, xsaveopt, xsaves.

Compiler version: c++ (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0.

Compiler flags: -march=native.

1D Transforms

Type 1

Parameters: prec:f N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.002

pics/perftestci_03fddf32ebd6d1ad.png

Parameters: prec:d N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_1a97e74700df8edf.png

Type 2

Parameters: prec:f N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.002

pics/perftestci_32470af6745a29e6.png

Parameters: prec:d N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_eab5fef5bbe3e744.png

Type 3

Parameters: prec:f N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.002

pics/perftestci_0699064210be24b8.png

Parameters: prec:d N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_9db50339e1439054.png

2D Transforms

Type 1

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.0001

pics/perftestci_77ba4ad1680da735.png

Parameters: prec:d N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_33c98f79a3ac719a.png

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:0 M:10000000.0 tol:0.0001

pics/perftestci_67ffb025e2b53ea8.png

Type 2

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.0001

pics/perftestci_eb3a21326e672e21.png

Parameters: prec:d N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_da952c8b20d25f6d.png

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:0 M:10000000.0 tol:0.0001

pics/perftestci_5619385b568e0607.png

Type 3

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.0001

pics/perftestci_5c3bb7a151aa63c1.png

Parameters: prec:d N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_c8548db3f2e8e4bd.png

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:0 M:10000000.0 tol:0.0001

pics/perftestci_0f3c8c701daefe8f.png

3D Transforms

Type 1

Parameters: prec:d N1:192 N2:192 N3:128 ntransf:1 threads:0 M:10000000.0 tol:1e-07

pics/perftestci_104410173c9ad53d.png

Type 2

Parameters: prec:d N1:192 N2:192 N3:128 ntransf:1 threads:0 M:10000000.0 tol:1e-07

pics/perftestci_bbb3acaae6845214.png

Type 3

Parameters: prec:d N1:192 N2:192 N3:128 ntransf:1 threads:0 M:10000000.0 tol:1e-07

pics/perftestci_92ddccfd887044cd.png

Performance (DUCC backend)

CPU name: AMD EPYC 9V74 80-Core Processor.

Arch: X86_64.

Core count: 2.

ISA extensions present: 3dnowext, 3dnowprefetch, abm, adx, aes, aperfmperf, apic, arat, avx, avx2, bmi1, bmi2, clflush, clflushopt, clwb, clzero, cmov, cmp_legacy, constant_tsc, cpuid, cr8_legacy, cx16, cx8, de, decodeassists, erms, extd_apicid, f16c, flushbyasid, fma, fpu, fsgsbase, fsrm, fxsr, fxsr_opt, ht, hypervisor, invpcid, lahf_lm, lm, mca, mce, misalignsse, mmx, mmxext, movbe, msr, mtrr, nonstop_tsc, nopl, npt, nrip_save, nx, osvw, osxsave, pae, pat, pausefilter, pcid, pclmulqdq, pdpe1gb, pfthreshold, pge, pni, popcnt, pse, pse36, rdpid, rdpru, rdrand, rdrnd, rdseed, rdtscp, rep_good, sep, sha, sha_ni, smap, smep, sse, sse2, sse4_1, sse4_2, sse4a, ssse3, svm, syscall, topoext, tsc, tsc_known_freq, tsc_reliable, tsc_scale, umip, user_shstk, v_vmsave_vmload, vaes, vmcb_clean, vme, vmmcall, vpclmulqdq, xgetbv1, xsave, xsavec, xsaveerptr, xsaveopt, xsaves.

Compiler version: c++ (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0.

Compiler flags: -march=native.

1D Transforms

Type 1

Parameters: prec:f N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.002

pics/perftestci_aa552f73f4b4949b.png

Parameters: prec:d N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_8b7238c4c23685d2.png

Type 2

Parameters: prec:f N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.002

pics/perftestci_02faa6e342257b82.png

Parameters: prec:d N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_67000b762fce7ccd.png

Type 3

Parameters: prec:f N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.002

pics/perftestci_a731766cdd9c953d.png

Parameters: prec:d N1:10000.0 N2:1 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_7e8bc5fc24054781.png

2D Transforms

Type 1

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.0001

pics/perftestci_776af2ee5f010040.png

Parameters: prec:d N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_6f4b53fdcb30c836.png

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:0 M:10000000.0 tol:0.0001

pics/perftestci_ee718d995b33296f.png

Type 2

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.0001

pics/perftestci_be36c150e3dcc402.png

Parameters: prec:d N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_6956b981d30ffa82.png

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:0 M:10000000.0 tol:0.0001

pics/perftestci_7910725fdb67a523.png

Type 3

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:0.0001

pics/perftestci_e84a28c7935892f9.png

Parameters: prec:d N1:320 N2:320 N3:1 ntransf:1 threads:1 M:10000000.0 tol:1e-09

pics/perftestci_c6bc9ac2b5ee9235.png

Parameters: prec:f N1:320 N2:320 N3:1 ntransf:1 threads:0 M:10000000.0 tol:0.0001

pics/perftestci_22c37ffb4e10ccd9.png

3D Transforms

Type 1

Parameters: prec:d N1:192 N2:192 N3:128 ntransf:1 threads:0 M:10000000.0 tol:1e-07

pics/perftestci_a080599c040d31de.png

Type 2

Parameters: prec:d N1:192 N2:192 N3:128 ntransf:1 threads:0 M:10000000.0 tol:1e-07

pics/perftestci_8e1f3642ff048e42.png

Type 3

Parameters: prec:d N1:192 N2:192 N3:128 ntransf:1 threads:0 M:10000000.0 tol:1e-07

pics/perftestci_5d5bb2f2d062d45b.png