The stat files should be stored in a single file such that all data from all runs can be evaluated in post-processing. The data should be supplemented with the values of the design variables. File formats that come into consideration are HDF5, SDDS, a NoSQL database, zip/tar with gzip, etc.
I guess that your estimates are the sizes of the raw, uncompressed data? I just checked with a stat file of 100 MB raw: compressed it is 20 MB (gzip) and 16 MB (bzip2), respectively. I guess this is still too big.
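As a side note, HDF5 can apply the same zlib compression internally on chunked datasets, so moving to a single file would not cost us the gzip-level sizes. A minimal sketch, assuming the plain HDF5 C API (the dataset name and chunk size are placeholders):

```cpp
#include <hdf5.h>
#include <vector>

// Store one stat column as a chunked, deflate-compressed 1-D dataset.
// "rms_x" and the chunk size are placeholders; level 6 is roughly what
// gzip uses by default. Assumes a non-empty column.
void writeCompressedColumn(hid_t file, const std::vector<double>& column) {
    hsize_t dims[1]  = { column.size() };
    hsize_t chunk[1] = { dims[0] < 4096 ? dims[0] : hsize_t(4096) };

    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);   // chunking is required for filters
    H5Pset_deflate(dcpl, 6);        // zlib compression inside the file

    hid_t dset = H5Dcreate2(file, "rms_x", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT,
             column.data());

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
}
```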
@gsell: can each core write independently of all others? And can they write to separate data structures?
@adelmann: I've already implemented the functionality to extract columns from the stat file of each run. Now it remains to write the data to a file.
I do not like the idea of introducing another file format or database or whatever. If HDF5 can be used, then we should use it. @kraus: yes, each core can write independently. @adelmann: performance should be sufficient with the right chunking and alignment.
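To make the chunking/alignment point concrete, a minimal sketch of the relevant property lists, assuming an MPI-enabled HDF5 build (the alignment values are placeholders to be tuned; dataset creation stays collective, only the writes themselves are independent):

```cpp
#include <hdf5.h>
#include <mpi.h>

// Open one shared result file for parallel access with tuned alignment.
hid_t openAlignedFile(const char* name, MPI_Comm comm) {
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    // Align objects >= 64 kB on 1 MB boundaries (values to be tuned).
    H5Pset_alignment(fapl, 64 * 1024, 1024 * 1024);
    hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);
    return file;
}

// Transfer property list for independent (per-core) H5Dwrite calls.
hid_t independentTransfer() {
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);  // no collective sync on write
    return dxpl;
}
```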
Not sure whether H5hut fulfills the requirements. It might require some changes. But maybe this is a good opportunity to fix some design flaws in the H5hut format and design/develop an H5hut V2 format.
Basically I agree with @gsell; however, if there is a real need to add a new package, then we have to do so. I think we can use HDF5 (H5hut with maybe small additional features).
Before fixing H5hut we should look into the new standard being developed by LBL and DESY. At ICAP, it seemed that this project is alive. @gsell, you are monitoring this.
I propose that we first find out whether we can use what we already have.
We need to store the following information: m samples, where each sample has k vectors of length n.
When the sampler starts, q samples are written, each sample coming from a p-way parallel OPAL simulation.
@adelmann we need to store more information: the design variables and their values with each sample. Furthermore, the lengths of the arrays don't have to be exactly the same in all runs (e.g. due to early termination, differences in path length (chicane), or whatever other reason).
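To make this concrete, one possible layout would be a group per run, the design variables stored as attributes of that group, and one 1-D dataset per stat column, so the lengths can differ between runs. A rough sketch, assuming the plain HDF5 C API (group and dataset names are placeholders, not a decided format):

```cpp
#include <hdf5.h>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Write one run: design variables as scalar attributes on the run group,
// each stat column as its own 1-D dataset of arbitrary length.
void writeRun(hid_t file, int runId,
              const std::map<std::string, double>& designVariables,
              const std::map<std::string, std::vector<double>>& columns) {
    char groupName[32];
    std::snprintf(groupName, sizeof(groupName), "/Run_%05d", runId);
    hid_t group = H5Gcreate2(file, groupName, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    // design variables and their values attached to the run group
    hid_t scalar = H5Screate(H5S_SCALAR);
    for (const auto& dvar : designVariables) {
        hid_t attr = H5Acreate2(group, dvar.first.c_str(), H5T_NATIVE_DOUBLE,
                                scalar, H5P_DEFAULT, H5P_DEFAULT);
        H5Awrite(attr, H5T_NATIVE_DOUBLE, &dvar.second);
        H5Aclose(attr);
    }
    H5Sclose(scalar);

    // one dataset per stat column; length may differ from run to run
    for (const auto& col : columns) {
        hsize_t dims[1] = { col.second.size() };
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset  = H5Dcreate2(group, col.first.c_str(), H5T_NATIVE_DOUBLE,
                                 space, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                 col.second.data());
        H5Dclose(dset);
        H5Sclose(space);
    }
    H5Gclose(group);
}
```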
i is the sum of the number of design variables and the number of objectives, and j is the number of columns from the stat file to be stored. I have updated the page accordingly.
I guess (@adelmann knows the realistic scenarios better than I do) at most 15 design variables and as many objectives, in general far fewer.
Interesting columns from the stat file will be s, time, rms_x, rms_y, rms_s, emit_x, emit_y, emit_z, rms_px, rms_py, rms_pz, energy, ...? The length of the arrays depends on the path length, the size of the time step, and the option STATDUMPFREQ.
The maximum number of objectives can be much bigger than this: an expression can be a combination of any two or more columns and can be evaluated at any position. So the theoretical maximum number of objectives is infinite.
Yes, I agree, but for the foreseeable future and the real projects at hand my estimate is somewhat bounded by reality.
For the AWA optimisation, we currently run the OPTIMIZER for one day and the post-processing for half a day, i.e. using Linux tools to concatenate the results. So I would love to have a prototype, and why not in H5hut; from this we can learn about performance etc.
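For the post-processing side, reading everything back from a single file would then replace the concatenation step. A rough sketch, assuming the per-run layout proposed above (file, group and column names are placeholders):

```cpp
#include <hdf5.h>
#include <string>
#include <vector>

// Pull one stat column of one run out of the combined result file.
std::vector<double> readColumn(const char* fileName, const std::string& run,
                               const std::string& column) {
    hid_t file = H5Fopen(fileName, H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, (run + "/" + column).c_str(), H5P_DEFAULT);

    hid_t space = H5Dget_space(dset);
    hssize_t n = H5Sget_simple_extent_npoints(space);
    std::vector<double> data(static_cast<size_t>(n));
    H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data.data());

    H5Sclose(space);
    H5Dclose(dset);
    H5Fclose(file);
    return data;
}
// e.g. readColumn("optimizer-runs.h5", "/Run_00001", "emit_x")
```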
First of all, it was you, @adelmann, who requested this feature. And second, I have already implemented a lot towards this. I am still waiting for @gsell to provide me with a proper file format in HDF5.