Description
Similar to the FastDense optimizations in SciML/DiffEqFlux.jl#671, this library can definitely benefit from having pre-cached versions of the operations, since the neural networks are generally small. In addition, the `plan_fft` result could be cached and reused for subsequent calls. Given the amount of reuse, direct control of the planning could be helpful:
> The `flags` argument is a bitwise-or of FFTW planner flags, defaulting to `FFTW.ESTIMATE`. e.g. passing `FFTW.MEASURE` or `FFTW.PATIENT` will instead spend several seconds (or more) benchmarking different possible FFT algorithms and picking the fastest one; see the FFTW manual for more information on planner flags. The optional `timelimit` argument specifies a rough upper bound on the allowed planning time, in seconds. Passing `FFTW.MEASURE` or `FFTW.PATIENT` may cause the input array `A` to be overwritten with zeros during plan creation.
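
A minimal sketch of what the plan caching could look like, assuming complex inputs with FFTs along the first dimension; the array shape and variable names are illustrative, not this library's API:

```julia
using FFTW, LinearAlgebra

# Illustrative input layout: (modes, channels, batch) — an assumption for this sketch.
x = rand(ComplexF32, 64, 16, 128)

# Plan once on a scratch array, since FFTW.MEASURE may overwrite it with zeros during
# planning; spend planning time up front and bound it to roughly 10 seconds.
scratch = similar(x)
p = plan_fft(scratch, 1; flags = FFTW.MEASURE, timelimit = 10)

y = p * x         # allocating application of the cached plan

buf = similar(x)  # pre-allocated output buffer
mul!(buf, p, x)   # in-place application; repeated calls reuse both the plan and the buffer
```

The inverse plan (e.g. `inv(p)`) could be cached the same way for the reverse pass.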
Note that the precaching only removes allocations in cases with a single forward pass before the reverse pass. A separate pointer-bumping method would be necessary to precache a whole batch of test inputs if multiple batches are used in one loss equation.
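
One hypothetical way the pointer-bumping idea could look: pre-allocate a pool of buffers up front, hand out the next one on each forward pass within a loss evaluation, and reset the counter between evaluations. None of these names (`BumpCache`, `next_buffer!`, `model!`) exist in the library; this is only a sketch of the concept:

```julia
# Hypothetical pointer-bumping cache: a pool of pre-allocated buffers plus an index
# that is "bumped" on every forward pass and reset between loss evaluations.
mutable struct BumpCache{A<:AbstractArray}
    buffers::Vector{A}  # one pre-allocated buffer per expected forward pass
    idx::Int            # pointer into the pool
end

BumpCache(template::AbstractArray, n::Integer) =
    BumpCache([similar(template) for _ in 1:n], 0)

# Hand out the next pre-allocated buffer; errors if more forwards occur than planned for.
function next_buffer!(c::BumpCache)
    c.idx += 1
    c.idx <= length(c.buffers) || error("cache exhausted; pre-allocate more buffers")
    return c.buffers[c.idx]
end

reset!(c::BumpCache) = (c.idx = 0; c)

# Usage inside one loss evaluation that runs a (hypothetical) in-place model on several batches:
# reset!(cache)
# loss = sum(sum(abs2, model!(next_buffer!(cache), θ, batch)) for batch in batches)
```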