Refactor FFTs

Prior profiling has shown that memory allocations in CUDA are slow. This overhead was significantly reduced in !97 (merged), but the FFTs still create temporary views for heFFTe. These views should become member variables so that the memory can be reused. Only the dimensions matter, so the same view can serve different input fields as long as their dimensions match. This would drastically reduce the time lost to cudaFree calls (on the order of 10 ms per transform).
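
A minimal sketch of the reuse idea, assuming a host-side `std::vector` as a stand-in for the device view. `TempBuffer`, `acquire` and `resizeCount` are hypothetical names for illustration and are not part of the code base or of heFFTe. The storage is resized only when the requested extents change:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the temporary view handed to the FFT backend.
class TempBuffer {
public:
    // Returns storage for the given extents. The storage is resized only
    // when the extents differ from those of the previous call.
    std::vector<double>& acquire(const std::array<std::size_t, 3>& extents) {
        assert(extents[0] > 0 && extents[1] > 0 && extents[2] > 0);
        if (extents != extents_) {
            data_.assign(extents[0] * extents[1] * extents[2], 0.0);
            extents_ = extents;
            ++resizeCount_;
        }
        return data_;
    }

    // Number of times the storage had to be resized.
    std::size_t resizeCount() const { return resizeCount_; }

private:
    std::array<std::size_t, 3> extents_{{0, 0, 0}};
    std::vector<double> data_;
    std::size_t resizeCount_ = 0;
};
```

Holding such a buffer as a member of the FFT object means that repeated transforms on fields of equal dimensions touch the allocator once rather than once per call.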

There is a lot of repeated code across the FFT specializations. They should inherit from a common base class (templated on the FFT backend) to drastically reduce code duplication. The sine and cosine transformations share even more code and can be unified further than the other transformations.
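
One possible shape for such a base class is sketched below. All names (`FFTBase`, `DoublingBackend`, `copyToTemp`, `copyFromTemp`) are assumptions made for illustration, and the backend merely doubles each value in place of a real FFT call:

```cpp
#include <cassert>
#include <vector>

// Hypothetical backend: stands in for a real FFT library and simply
// doubles every value so that the effect is observable.
struct DoublingBackend {
    static void execute(std::vector<double>& data) {
        for (double& x : data) x *= 2.0;
    }
};

// Shared logic for all transforms, parameterized on the backend.
template <class Backend>
class FFTBase {
public:
    // Copies the field into the temporary, runs the backend on it, and
    // copies the result back into the field.
    void transform(std::vector<double>& field) {
        assert(!field.empty());
        copyToTemp(field);
        Backend::execute(temp_);
        copyFromTemp(field);
    }

protected:
    void copyToTemp(const std::vector<double>& field) {
        temp_.assign(field.begin(), field.end());
    }
    void copyFromTemp(std::vector<double>& field) const { field = temp_; }

    std::vector<double> temp_;  // member, so its storage persists across calls
};

// Specializations only add what differs; in this sketch nothing does.
class SineTransform : public FFTBase<DoublingBackend> {};
class CosineTransform : public FFTBase<DoublingBackend> {};
```

With the copy and execute steps in one place, each specialization shrinks to whatever genuinely differs between transform types.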

The transformation directions should be replaced with enum constants to statically enforce their validity and provide semantic clarity.
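
As an illustration, a scoped enum could look like the sketch below. The names `TransformDirection`, `toSign`, `reverse` and `fromSign` are hypothetical, and the mapping of +1 to the forward direction is an assumption, not something taken from the existing code:

```cpp
#include <cassert>

// Assumed mapping: +1 is forward, -1 is backward.
enum class TransformDirection : int { Forward = 1, Backward = -1 };

// Recovers the sign for code that still needs the raw integer.
constexpr int toSign(TransformDirection d) { return static_cast<int>(d); }

// Returns the opposite direction.
constexpr TransformDirection reverse(TransformDirection d) {
    return d == TransformDirection::Forward ? TransformDirection::Backward
                                            : TransformDirection::Forward;
}

// Migration helper for call sites that still pass a literal; any value
// other than +1 or -1 trips the assertion in debug builds.
inline TransformDirection fromSign(int sign) {
    assert(sign == 1 || sign == -1);
    return sign == 1 ? TransformDirection::Forward : TransformDirection::Backward;
}

static_assert(toSign(TransformDirection::Backward) == -1, "sign mapping");
```

Because the enum is scoped, passing a bare integer such as 2 to a function that takes `TransformDirection` fails to compile, which is the static guarantee the literals cannot give.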

- Reuse temporary fields
- Reduce code duplication
  - Common base functions for all FFTs
  - ~~Common base class for sine and cosine transformations~~
  - Introduce additional utility functions in the base class (introduced in !192 (merged)) for copying to temporary fields and performing the transformation
- Replace ±1 literals for FFT transforms with enums
- Reuse FFT object in solvers where possible
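
For the last item, a rough sketch of a solver that keeps its FFT object between calls is shown here. `FFT`, `Solver` and `constructionCount` are hypothetical names, and the assumption is that constructing the FFT object is the expensive step:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical FFT object; construction is assumed to be expensive.
class FFT {
public:
    explicit FFT(std::size_t n) : n_(n) { ++constructionCount(); }

    // Placeholder for the real transform; only checks the field size.
    void transform(std::vector<double>& field) const {
        assert(field.size() == n_);
    }

    std::size_t size() const { return n_; }

    // Counts how many FFT objects have been constructed so far.
    static std::size_t& constructionCount() {
        static std::size_t count = 0;
        return count;
    }

private:
    std::size_t n_;
};

class Solver {
public:
    // Builds the FFT object on first use and rebuilds it only when the
    // field size changes.
    void solve(std::vector<double>& field) {
        if (!fft_ || fft_->size() != field.size()) {
            fft_ = std::make_unique<FFT>(field.size());
        }
        fft_->transform(field);  // forward pass
        fft_->transform(field);  // backward pass
    }

private:
    std::unique_ptr<FFT> fft_;
};
```

Repeated calls to `solve` on fields of the same size then share one FFT object, along with any temporary storage it holds.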

Edited Aug 28, 2023 by vinciguerra_a