OPAL job aborts with "Attempt to free memory that is still in use by an ongoing MPI communication"
I tried to run the sampler with 10,000 samples and the job aborted with the following error message:
Ippl{0}> CommMPI: Initialization complete.
Ippl{0}> CommMPI: Parent process waiting for children ...
Ippl{0}> CommMPI: Child 1 ready.
Ippl{0}> CommMPI: Child 2 ready.
Ippl{0}> CommMPI: Child 3 ready.
Ippl{0}> CommMPI: Child 4 ready.
Ippl{0}> CommMPI: Child 5 ready.
Ippl{0}> CommMPI: Child 6 ready.
Ippl{0}> CommMPI: Child 7 ready.
Ippl{0}> CommMPI: Initialization complete.
[merlin-c-002:10963] Attempt to free memory that is still in use by an ongoing MPI communication (buffer 0x7f6a000, size 6098944). MPI job will now abort.
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[63717,1],42]
Exit code: 1
--------------------------------------------------------------------------
I think this might be a bug. The following configuration file was used:
OPTION, INFO=TRUE;
REAL Nsamples = 10000;
// Design variables
p1: DVAR, VARIABLE="p1", LOWERBOUND=-30., UPPERBOUND=0;
// Sampling methods
sp1: SAMPLING, VARIABLE="p1", RANDOM=true, TYPE="UNIFORM", SEED=122, N = Nsamples;
tdc1: DVAR, VARIABLE="PHASE", LOWERBOUND=0.4856, UPPERBOUND=0.7284;
// ---- OPTIMIZER SECTION -------
dv0: DVAR, VARIABLE="IBF", LOWERBOUND=400, UPPERBOUND=500;
dv1: DVAR, VARIABLE="IM", LOWERBOUND=250, UPPERBOUND=440;
dv2: DVAR, VARIABLE="GPHASE", LOWERBOUND=-30.0, UPPERBOUND=0.0;
dv3: DVAR, VARIABLE="FWHM", LOWERBOUND=1.5e-12, UPPERBOUND=10.0e-12;
//Quad values
dv4: DVAR, VARIABLE="KQ1", LOWERBOUND=-8.0, UPPERBOUND=8.0;
dv5: DVAR, VARIABLE="KQ2", LOWERBOUND=-8.0, UPPERBOUND=8.0;
dv6: DVAR, VARIABLE="KQ3", LOWERBOUND=-8.0, UPPERBOUND=8.0;
dv7: DVAR, VARIABLE="KQ4", LOWERBOUND=-8.0, UPPERBOUND=8.0;
stdc1: SAMPLING, VARIABLE="PHASE", TYPE="UNIFORM", SEED=329, N = Nsamples;
sdv0: SAMPLING, VARIABLE="IBF", RANDOM=true, TYPE="UNIFORM", SEED=5979, N = Nsamples;
sdv1: SAMPLING, VARIABLE="IM", RANDOM=true, TYPE="UNIFORM", SEED=2840, N = Nsamples;
sdv2: SAMPLING, VARIABLE="GPHASE", RANDOM=true, TYPE="UNIFORM", SEED=68921, N = Nsamples;
sdv3: SAMPLING, VARIABLE="FWHM", RANDOM=true, TYPE="UNIFORM", SEED=580972, N = Nsamples;
sdv4: SAMPLING, VARIABLE="KQ1", RANDOM=true, TYPE="UNIFORM", SEED=1169, N = Nsamples;
sdv5: SAMPLING, VARIABLE="KQ2", RANDOM=true, TYPE="UNIFORM", SEED=435831, N = Nsamples;
sdv6: SAMPLING, VARIABLE="KQ3", RANDOM=true, TYPE="UNIFORM", SEED=183246, N = Nsamples;
sdv7: SAMPLING, VARIABLE="KQ4", RANDOM=true, TYPE="UNIFORM", SEED=12548, N = Nsamples;
SAMPLE,
RASTER = false,
DVARS = {p1, tdc1, dv0, dv1, dv2, dv3, dv4, dv5, dv6, dv7},
SAMPLINGS = {sp1, stdc1, sdv0, sdv1, sdv2, sdv3, sdv4, sdv5, sdv6, sdv7},
INPUT = "awa.tmpl",
OUTPUT = "awa",
OUTDIR = "output_5k",
TEMPLATEDIR = "tmpl",
FIELDMAPDIR = "fieldmaps",
NUM_MASTERS = 1,
NUM_COWORKERS = 1;
QUIT;
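A reduced run might help narrow down whether the abort depends on the number of samples. As an untested sketch, only the sample count in the file above would change; the value 1000 is an arbitrary illustration and was not part of the failing run:

```
// Hypothetical reduced sample count (assumption, not from the failing run)
REAL Nsamples = 1000;
```

All SAMPLING statements already take N = Nsamples, so nothing else in the file would need to be edited for such a test.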
Please send me an email if anyone is interested and needs more information.