Optimize ALPINE functions

Address suggested optimizations from !157 (merged).

Make res a vector of bool.
Use the expression dev > loadbalancethreshold_m directly.
Use std::any_of instead of the explicit for-loop.
Perform the MPI gather over booleans.
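
A minimal sketch of the pattern the four points above describe, not the actual ALPINE code. To stay self-contained, a serial loop over per-rank particle counts stands in for the MPI gather. The function name anyRankUnbalanced, the parameters localCounts and threshold, and the deviation formula are assumptions made for illustration; only res, dev, and std::any_of come from the list above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: localCounts[i] stands in for the particle count on rank i,
// and threshold stands in for loadbalancethreshold_m.
bool anyRankUnbalanced(const std::vector<std::size_t>& localCounts, double threshold) {
    const std::size_t totalP =
        std::accumulate(localCounts.begin(), localCounts.end(), std::size_t{0});
    if (totalP == 0) {
        return false;  // nothing to balance
    }
    const double equalPart = static_cast<double>(totalP) / localCounts.size();

    // res is a vector of bool; in the real code it would be filled by the MPI gather.
    std::vector<bool> res(localCounts.size());
    for (std::size_t i = 0; i < localCounts.size(); ++i) {
        // Assumed deviation measure: distance from an equal share, relative to the total.
        const double dev =
            std::abs(static_cast<double>(localCounts[i]) - equalPart) / totalP;
        res[i] = dev > threshold;  // use the comparison directly, no intermediate flag
    }

    // std::any_of replaces the explicit for-loop over res.
    return std::any_of(res.begin(), res.end(), [](bool flag) { return flag; });
}
```

In an MPI setting, std::vector<bool> has no data() member, so the gather would likely need a contiguous receive buffer of another boolean-like type; that detail is outside this sketch.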

The same optimization can be made for PICnd.

In addition, the two reduction kernels for the energy in ChargedParticles can be merged into a single kernel with two reducers. The merged kernel must be followed by a Kokkos fence, since reduction kernels are non-blocking when there are multiple reducers.

Edited May 03, 2023 by vinciguerra_a