Known Issues
- Error: "Resource already allocated". This error appears when running on GPUs on Perlmutter with code compiled with Cray MPI, and only when more than one GPU per node is used (up to 4 GPUs per node are available). For reasons not yet understood, CUDA allocates the "communication window" between GPUs multiple times instead of once, which triggers the error. The error does not appear if we run with a large enough overallocation factor, such that Kokkos realloc is never called. The branch "perlmutter-reproducer" contains a simplified reproducer of this error (test/serialize/sendRecvIPPL.cpp). As a workaround, compile with OpenMPI instead.
- Error: "Failed to register memory". This error appears consistently when running on GPUs on Piz Daint with code compiled with Cray MPI. Other codes on Piz Daint hit the same error, so it is not an IPPL-related issue, and a ticket has been opened with Cray. To circumvent it, we compile with OpenMPI.
- Error: "Bus error". This error appears intermittently with the Intel compiler on Piz Daint. It occurs at the very beginning of the simulation, and its cause is not yet fully understood.