Refactor FFTs
Prior profiling has shown that memory allocations in CUDA are slow. The memory overhead was significantly reduced in !97 (merged), but the FFTs still create temporary views for heFFTe. These views should instead be stored as member variables so that the memory can be reused. Only the dimensions matter, so the same view can be reused even for different input fields, as long as the dimensions match. This would drastically reduce the time lost to cudaFree calls (on the order of 10 ms per FFT transform).
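A minimal sketch of the reuse idea, under stated assumptions: `std::vector` stands in for the device view (a `Kokkos::View` in IPPL), and `TempBuffer` and `FFTWithCachedTemp` are hypothetical names, not existing IPPL or heFFTe classes. The buffer is cached as a member and is reallocated only when the requested dimensions change, so repeated transforms on fields of the same shape allocate once.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Dims    = std::array<std::size_t, 3>;

// Hypothetical cache for a temporary buffer. std::vector stands in for the
// device view that would be allocated on the GPU.
class TempBuffer {
public:
    // Return a buffer with the requested extents. A new allocation is made
    // only on first use or when the extents differ from the cached ones.
    std::vector<Complex>& get(const Dims& dims) {
        if (!allocated_ || dims != dims_) {
            dims_      = dims;
            buffer_    = std::vector<Complex>(dims[0] * dims[1] * dims[2]);
            allocated_ = true;
            ++allocations_;
        }
        return buffer_;
    }

    std::size_t allocations() const { return allocations_; }

private:
    bool allocated_ = false;
    Dims dims_{};
    std::vector<Complex> buffer_;
    std::size_t allocations_ = 0;
};

// Hypothetical FFT wrapper that owns its temporary buffer as a member,
// so the buffer outlives individual transform calls.
class FFTWithCachedTemp {
public:
    void transform(const std::vector<Complex>& field, const Dims& dims) {
        std::vector<Complex>& tmp = temp_.get(dims);
        assert(field.size() == tmp.size());
        std::copy(field.begin(), field.end(), tmp.begin());
        // The backend transform (heFFTe in IPPL) would operate on tmp here.
    }

    std::size_t allocations() const { return temp_.allocations(); }

private:
    TempBuffer temp_;
};
```

Calling `transform` twice with different fields of extent `{4, 4, 4}` leaves `allocations()` at 1; a call with different extents triggers a second allocation. On a GPU, every avoided reallocation also avoids freeing the previous buffer.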
There is a lot of repeated code across the FFT specializations. They should inherit from a common base class (templated on the FFT backend) to greatly reduce code duplication. The sine and cosine transformations share even more code with each other and can be unified further than the other transformations.
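One possible shape for the class hierarchy, sketched with standard C++ only. `FFTBase`, `TrigFFTBase`, `CCTransform`, the `SineTransform`/`CosTransform` aliases and the backend tags `FFTWBackend`/`CuFFTBackend` are hypothetical names chosen for illustration; in IPPL the template parameter would select a heFFTe backend. Code shared by all transforms lives once in the base, and sine and cosine differ only in a template parameter, so a single intermediate class covers both.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical backend tags; real code would map these to heFFTe backends.
struct FFTWBackend  { static const char* name() { return "fftw"; } };
struct CuFFTBackend { static const char* name() { return "cufft"; } };

using Extents = std::array<std::size_t, 3>;

// Common base: code shared by every transform type is written once here.
template <typename Backend>
class FFTBase {
public:
    explicit FFTBase(const Extents& extents) : extents_(extents) {
        // Extents must be non-zero in every dimension.
        assert(extents[0] > 0 && extents[1] > 0 && extents[2] > 0);
    }
    virtual ~FFTBase() = default;

    std::size_t size() const { return extents_[0] * extents_[1] * extents_[2]; }
    std::string backendName() const { return Backend::name(); }
    virtual std::string kind() const = 0;

protected:
    Extents extents_;
};

// Sine and cosine transforms differ only in the trigonometric kind, so one
// intermediate base parameterized on that kind holds everything else.
enum class TrigKind { Sine, Cosine };

template <typename Backend, TrigKind K>
class TrigFFTBase : public FFTBase<Backend> {
public:
    using FFTBase<Backend>::FFTBase;  // inherit the constructor
    std::string kind() const override {
        return K == TrigKind::Sine ? "sine" : "cosine";
    }
};

template <typename Backend>
using SineTransform = TrigFFTBase<Backend, TrigKind::Sine>;
template <typename Backend>
using CosTransform = TrigFFTBase<Backend, TrigKind::Cosine>;

// Other transforms derive from the common base directly.
template <typename Backend>
class CCTransform : public FFTBase<Backend> {
public:
    using FFTBase<Backend>::FFTBase;
    std::string kind() const override { return "complex-to-complex"; }
};
```

Making the sine/cosine distinction a non-type template parameter keeps the two transforms as distinct types while generating their shared code from a single definition.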
The ±1 integer literals used for the transformation direction should be replaced with enum constants to enforce valid values at compile time and make the intent explicit.
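A sketch of an enum-based direction, assuming a scoped enum named `FFTDirection` (a hypothetical name). A naive O(N^2) DFT stands in for the heFFTe call so the example runs without dependencies; it assumes the common convention in which the forward transform uses a negative exponent and the backward transform is unnormalized. No implicit conversion from `int` to an enumeration exists, so a stray literal such as `2` is rejected at compile time (the `static_assert` checks this), and the scoped enum additionally keeps a direction from silently converting back into an integer.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical replacement for the +1/-1 direction arguments.
enum class FFTDirection { Forward, Backward };

// Compile-time check: an integer can no longer stand in for a direction.
static_assert(!std::is_convertible<int, FFTDirection>::value,
              "FFTDirection must not be implicitly constructible from int");

// Naive 1D DFT used only to show where the direction enters.
// Assumed convention: Forward uses exp(-2*pi*i*k*j/N), Backward uses
// exp(+2*pi*i*k*j/N), and neither direction normalizes.
std::vector<std::complex<double>> dft(FFTDirection dir,
                                      const std::vector<std::complex<double>>& in) {
    const double pi   = 3.14159265358979323846;
    const double sign = (dir == FFTDirection::Forward) ? -1.0 : 1.0;
    const std::size_t n = in.size();
    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum{0.0, 0.0};
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi * double(k) * double(j) / double(n);
            sum += in[j] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}
```

A call then reads `dft(FFTDirection::Forward, x)` instead of passing a bare `+1` or `-1`, and the meaning of the argument is visible at the call site.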

- Reuse temporary fields
- Reduce code duplication
  - Common base functions for all FFTs
  - Common base class for sine and cosine transformations
  - Introduce additional utility functions in the base class (introduced in !192 (merged)) for copying to temporary fields and performing the transformation
- Replace ±1 literals for FFT transforms with enums
- Reuse FFT object in solvers where possible