Buffer Preallocation and other Optimizations
The `buffer-factory` branch mainly improves performance by reducing the number of memory allocations performed during MPI communication, which is especially important for GPU performance. New tests are added and various other optimizations are also made.
Major changes:
- Added a globally accessible interface for requesting buffers to be used for MPI communication
- Overallocate memory for communication buffers the first time and reuse the same buffers to avoid reallocation calls
- Particle deletion re-implemented as a partitioning algorithm
- Field layouts and particle regions now use the same mesh [1]
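The buffer interface and overallocation strategy listed above can be pictured with a minimal sketch. It is not the branch's actual code: `Buffer`, `BufferFactory`, `get`, `setOverallocation`, and the integer buffer ids are hypothetical names, and a `std::vector<char>` stands in for whatever host or device memory the real buffers hold.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for a communication buffer.
class Buffer {
public:
    explicit Buffer(std::size_t bytes) : data_(bytes) {}
    std::size_t capacity() const { return data_.size(); }
    char* data() { return data_.data(); }
    // Replace the storage without copying the old contents, which are
    // stale once the previous message has been sent or unpacked.
    void reallocate(std::size_t bytes) { data_ = std::vector<char>(bytes); }

private:
    std::vector<char> data_;
};

// Hypothetical globally accessible factory that hands out reusable buffers.
class BufferFactory {
public:
    static BufferFactory& instance() {
        static BufferFactory factory;  // constructed once, on first use
        return factory;
    }

    // Extra headroom added to every allocation; 0.5 means "request * 1.5".
    void setOverallocation(double factor) {
        assert(factor >= 0.0);
        overalloc_ = factor;
    }

    // Return the buffer registered under `id`. Memory is allocated only if
    // the buffer does not exist yet or is too small for this request.
    Buffer& get(int id, std::size_t bytes) {
        auto it = buffers_.find(id);
        if (it == buffers_.end()) {
            ++allocations_;
            it = buffers_.emplace(id, std::make_unique<Buffer>(padded(bytes))).first;
        } else if (it->second->capacity() < bytes) {
            ++allocations_;
            it->second->reallocate(padded(bytes));
        }
        return *it->second;
    }

    // Number of allocations performed so far (for illustration only).
    std::size_t allocations() const { return allocations_; }

private:
    BufferFactory() = default;

    std::size_t padded(std::size_t bytes) const {
        return static_cast<std::size_t>(bytes * (1.0 + overalloc_));
    }

    std::map<int, std::unique_ptr<Buffer>> buffers_;
    double overalloc_ = 0.5;
    std::size_t allocations_ = 0;
};
```

Under these assumptions, a caller would ask for `BufferFactory::instance().get(id, nbytes)` before each send or receive. As long as message sizes stay within the padded capacity, repeated requests return the same buffer and trigger no further allocation.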
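Particle deletion as a partitioning step can be sketched as follows. This is only an illustration of the general idea, assuming one flag per particle and attributes stored as separate arrays; `DeletionPlan`, `planDeletion`, and `applyPlan` are hypothetical names, and the branch's actual implementation may differ. Surviving particles from the tail are moved into the holes left by deleted particles near the front, so the survivors end up contiguous without shifting every element.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One relocation: the survivor at index `from` fills the hole at index `to`.
struct Move {
    std::size_t from;
    std::size_t to;
};

struct DeletionPlan {
    std::vector<Move> moves;
    std::size_t survivors = 0;
};

// Two-pointer partition over the deletion flags. After the plan is applied,
// the first `survivors` slots hold exactly the particles not flagged invalid.
// The relative order of survivors is not preserved.
DeletionPlan planDeletion(const std::vector<bool>& invalid) {
    DeletionPlan plan;
    std::size_t lo = 0;
    std::size_t hi = invalid.size();
    while (true) {
        while (lo < hi && !invalid[lo]) ++lo;     // first hole from the front
        while (lo < hi && invalid[hi - 1]) --hi;  // last survivor from the back
        if (lo >= hi) break;
        plan.moves.push_back({hi - 1, lo});
        ++lo;
        --hi;
    }
    plan.survivors = lo;
    return plan;
}

// Apply the same plan to every attribute array so they stay consistent.
template <typename T>
void applyPlan(std::vector<T>& attr, const DeletionPlan& plan) {
    assert(plan.survivors <= attr.size());
    for (const Move& m : plan.moves) {
        attr[m.to] = std::move(attr[m.from]);
    }
    // Shrinking a std::vector keeps its capacity, so no reallocation occurs.
    attr.resize(plan.survivors);
}
```

Computing the plan once and applying it to each attribute keeps the per-attribute work to one move per deleted particle at most.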
Optimizations:
- Kokkos `resize` calls replaced with `realloc`, which does not preserve old memory contents [2]
- Unnecessary barriers and if-conditions removed
- Added a non-blocking receive function to the `Communicate` class, but it is not currently used
- Reduced host-space memory reallocations by allocating vectors of MPI requests just once
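The difference between the two Kokkos calls mentioned above is that `Kokkos::resize` keeps the old contents of a view while `Kokkos::realloc` does not, so the latter can skip the copy. The plain C++ analogue below illustrates only that semantic difference; it does not use Kokkos, and `RawArray`, `growPreserving`, and `growDiscarding` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical heap array standing in for a Kokkos view.
struct RawArray {
    std::unique_ptr<double[]> data;
    std::size_t size = 0;
};

// resize-like behavior: allocate new storage, then copy the old contents.
void growPreserving(RawArray& a, std::size_t n) {
    assert(a.data || a.size == 0);
    auto fresh = std::make_unique<double[]>(n);  // zero-initialized
    std::copy_n(a.data.get(), std::min(a.size, n), fresh.get());
    a.data = std::move(fresh);
    a.size = n;
}

// realloc-like behavior: allocate new storage and drop the old contents.
// No copy is needed, which is the saving for buffers that are about to be
// overwritten anyway.
void growDiscarding(RawArray& a, std::size_t n) {
    a.data = std::make_unique<double[]>(n);  // zero-initialized
    a.size = n;
}
```

The discarding variant is only safe where the old data is no longer needed, which matches the exception noted for `ParticleAttrib`.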
Structural/repository changes:
- Introduced plasma mini-apps
- Updated the `Solvers` module to point to the branch with the FFT solver
- Redundant test programs deleted and replaced with equivalent unit tests where appropriate
Piz Daint:
- Various type changes introduced to ensure successful compilation on Piz Daint
Aesthetic changes:
- Introduced a new header file with type aliases
Possible additional changes to be made:
- Some more cleanup in FFT files and test programs
- Expand use of new type aliases
Also closes #79 (closed).
[1] Introduces charge conservation errors in the Penning trap test. Fixed in the `periodic-bcs-for-scatter-gather` branch.
[2] Except in `ParticleAttrib`, where the `unpack` call still resizes to preserve old particle data.
Edited by vinciguerra_a