Right now the output JSON file is structured such that, when read by Python using the json module, it is converted to an array of dictionaries, where each dictionary contains a key "ID" whose value is the corresponding ID, plus additional keys such as dvars containing the design variables.
For finding an ID it would, in my eyes, be easier if the file were structured so that the resulting array contains only one dictionary whose keys are something like ID+"dvars", where ID is the actual ID, since this allows easy access to individual elements.
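To make the current layout and the lookup problem concrete, here is a minimal sketch (the "samples" tag and the variable names are made up for illustration):

```python
import json

# Hypothetical generation file in the current layout: "samples" is an
# array of dictionaries, and the ID is stored as a value.
text = """
{
    "samples": [
        {"ID": "1", "dvars": {"x": "0.5"}},
        {"ID": "2", "dvars": {"x": "0.7"}}
    ]
}
"""
data = json.loads(text)

# Finding a specific ID means a linear search over the whole array.
sample = next(s for s in data["samples"] if s["ID"] == "2")
print(sample["dvars"])  # {'x': '0.7'}
```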
So in the proposal, the array in samples is not needed anymore?
But in the proposal with more categories, dvar and objvar, not all information for a single ID is kept together. How about using the ID directly as the key, with an additional dictionary level underneath?
Not really, since right now samples is an array of dictionaries.
Also, if I understood correctly, @huber_y proposes the change because he would like to find a specific ID, which is complicated in the current scheme. He is right that the ID number should be a key rather than a value.
The problem I have with splitting over several tags is that regathering all pieces of a sample needs

- pre-knowledge of all possible tags,
- splitting the ID (1) from the tag (dvar1) and accessing the new tags (objvar1).

This will make post-processing somewhat artificial and needlessly complicated.
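To illustrate, a sketch of what regathering would look like under a split-tag scheme (the flat layout below is made up to match the dvar1/objvar1 example):

```python
import re

# Hypothetical split-tag layout: one top-level tag per (ID, category)
# pair, e.g. "dvar1" and "objvar1" for the sample with ID 1.
flat = {
    "dvar1":   {"x": 0.5},
    "objvar1": {"f": 1.2},
    "dvar2":   {"x": 0.7},
    "objvar2": {"f": 0.9},
}

# Regathering a sample requires knowing every possible prefix up front
# and parsing the ID back out of each tag name.
samples = {}
for tag, values in flat.items():
    category, sample_id = re.match(r"(dvar|objvar)(\d+)$", tag).groups()
    samples.setdefault(sample_id, {})[category] = values

print(samples["1"])  # {'dvar': {'x': 0.5}, 'objvar': {'f': 1.2}}
```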
The previous parsing followed some plotting scripts that, if I remember correctly, depended on that data structure. Those are now deprecated anyway thanks to pyOPALTools.
I think what @snuverink_j suggests is better than my suggestion.
We would then get a dictionary of dictionaries, which allows easy access to IDs: the IDs are the keys on the outer level, and each value then corresponds (almost) to what is now an element of the array.
I don't really see the problem of "pre-knowledge of all tags", since you can get all keys of a dictionary as a list.
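To make that concrete, a minimal sketch of the dictionary-of-dictionaries layout, reusing the made-up sample data from above:

```python
import json

# Hypothetical generation file with the ID used directly as a key and
# an additional dictionary level per sample.
text = """
{
    "samples": {
        "1": {"dvar": {"x": "0.5"}, "obj": {"f": "1.2"}},
        "2": {"dvar": {"x": "0.7"}, "obj": {"f": "0.9"}}
    }
}
"""
data = json.loads(text)

# Direct access by ID, no linear search needed ...
print(data["samples"]["2"]["dvar"])       # {'x': '0.7'}

# ... and no pre-knowledge of tags is required: the keys of each
# dictionary level are available as a list.
print(list(data["samples"].keys()))       # ['1', '2']
print(list(data["samples"]["1"].keys()))  # ['dvar', 'obj']
```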
I also added the git revision. I will now do the same for the optimizer. However, I will replace the hand-written JSON output with boost::property_tree, as in the case of the sampler.
I've updated the optimizer as well. I changed the tag solutions, where all individuals are stored, to population, because I think it's a better name for it. I also opened an issue, pyOPALTools#17 (closed), so that we fix the parsers. With the version tag we will then support both structures.
(I also put this over in pyOPALTools#17 (closed); it seemed like it should be there but I wasn't sure).
I just tried to read a JSON file in the new format, with a fresh pull of pyOPALTools, using the mldb.py script, and I get the following error:
File "/Users/auralee/ML-PSI/pyOPALTools/db/mldb.py", line 156, in build optjson = jsonreader.OptPilotJsonReader(path + '/') File "/Users/auralee/ML-PSI/pyOPALTools/optPilot/OptPilotJsonReader.py", line 229, in __init__ self.__buildNameToColumnMap(self.__directory + testfile) File "/Users/auralee/ML-PSI/pyOPALTools/optPilot/OptPilotJsonReader.py", line 533, in __buildNameToColumnMap for idx, name in enumerate(data["solutions"][0].keys()): KeyError: 'solutions'
It looks like it is still looking for 'solutions' rather than 'population', and it does not handle the change in how the ID is stored. What's the best way to fix this?
I think I've fixed this by rewriting the last two functions in OptPilotJsonReader.py. (Based on the earlier fix over at pyOPALTools#17 (closed), it looks like there's possibly a better way to do this using opal.parser.OptimizerParser instead of what I did here, but I wasn't sure.)
This still does not handle the old format, only the new one, so a check of the OPAL version would still have to go in there (or should mldb.py just be changed to use the opal parser?).
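For what it's worth, a minimal sketch of one way to support both layouts, keying on which top-level tag is present instead of on the OPAL version (load_individuals is a hypothetical helper, not an existing pyOPALTools function):

```python
import json

def load_individuals(filename):
    """Return a dict mapping ID -> individual, for either file layout."""
    with open(filename) as f:
        data = json.load(f)

    if "population" in data:
        # New format: already a dictionary keyed by ID.
        return data["population"]
    if "solutions" in data:
        # Old format: an array with the ID stored as a value;
        # rebuild the ID -> record mapping.
        return {str(s["ID"]): {k: v for k, v in s.items() if k != "ID"}
                for s in data["solutions"]}
    raise KeyError("Neither 'population' nor 'solutions' found.")
```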
I've also put the new and old code here:
New:
```python
def __buildNameToColumnMap(self, filename):
    """ Build data structures

    Parameters
    ----------
    filename : a generation file

    Returns
    -------
    None

    Notes
    -----
    Assumes the ordering: [u'dvar', u'obj', u'ID']

    Examples
    --------
    None
    """
    if not os.path.isfile(filename):
        raise IOError("File '" + filename + "' does not exist.")

    data = json.load(open(filename))

    # check validation
    if "dvar-bounds" not in data:
        raise KeyError("Error in JSON format: " \
                       "Design variable bounds are not present.")

    if "constraints" not in data:
        raise KeyError("Error in JSON format: " \
                       "Constraints are not present.")

    population = data["population"]
    first_indv_key = list(population.keys())[0]

    for idx, name in enumerate(data["population"][first_indv_key].keys()):
        name = name.replace(" ", "")
        #self.__nameToColumnMap[name] = idx

        if name == 'dvar':
            for jdx, dvars in enumerate(sorted(data["population"][first_indv_key][name].keys())):
                self.__dvarNameToColumnMap[dvars] = jdx
        elif name == 'obj':
            for jdx, objs in enumerate(sorted(data["population"][first_indv_key][name].keys())):
                self.__objNameToColumnMap[objs] = jdx
        else:
            raise KeyError("Error in JSON format: " \
                           "Unexpected keys in population.")

    self.__nDvars = len(self.__dvarNameToColumnMap)
    self.__nObjs = len(self.__objNameToColumnMap)
    self.__nColumns = self.__nDvars + self.__nObjs + 1  # + ID

##

def __readJSONData(self, filename):
    """ Read in data

    Parameters
    ----------
    filename : a generation file

    Returns
    -------
    None

    Notes
    -----
    Storage of columns: dvars, objs, ID

    Examples
    --------
    None
    """
    data = json.load(open(filename))

    self.__dvarBounds = data["dvar-bounds"]
    self.__constraints = data["constraints"]

    population = data["population"]
    self.__nIndividuals = len(data["population"])

    table = np.zeros((self.__nIndividuals, self.__nColumns))

    for i, pop in enumerate(population):
        for j, key in enumerate(population[str(pop)]):
            if key == 'dvar':
                k = 0
                for dvar, value in sorted(population[str(pop)][key].items()):
                    table[i, k] = float(value)
                    k += 1
            elif key == 'obj':
                k = self.__nDvars
                for obj, value in sorted(population[str(pop)][key].items()):
                    #print(obj,value)
                    table[i, k] = float(value)
                    k += 1
            else:
                pass
        k = self.__nDvars + self.__nObjs
        table[i, k] = int(pop)

    return table
```
Original:
```python
def __buildNameToColumnMap(self, filename):
    """ Build data structures

    Parameters
    ----------
    filename : a generation file

    Returns
    -------
    None

    Notes
    -----
    Assumes the ordering: [u'dvar', u'obj', u'ID']

    Examples
    --------
    None
    """
    if not os.path.isfile(filename):
        raise IOError("File '" + filename + "' does not exist.")

    data = json.load(open(filename))

    # check validation
    if "dvar-bounds" not in data:
        raise KeyError("Error in JSON format: " \
                       "Design variable bounds are not present.")

    if "constraints" not in data:
        raise KeyError("Error in JSON format: " \
                       "Constraints are not present.")

    for idx, name in enumerate(data["solutions"][0].keys()):
        name = name.replace(" ", "")
        #self.__nameToColumnMap[name] = idx

        if name == 'dvar':
            for jdx, dvars in enumerate(sorted(data["solutions"][0][name].keys())):
                self.__dvarNameToColumnMap[dvars] = jdx
        elif name == 'obj':
            for jdx, objs in enumerate(sorted(data["solutions"][0][name].keys())):
                self.__objNameToColumnMap[objs] = jdx
        elif name == 'ID':
            pass
        else:
            raise RuntimeError("Not expected json format.")

    self.__nDvars = len(self.__dvarNameToColumnMap)
    self.__nObjs = len(self.__objNameToColumnMap)
    self.__nColumns = self.__nDvars + self.__nObjs + 1  # + ID

##

def __readJSONData(self, filename):
    """ Read in data

    Parameters
    ----------
    filename : a generation file

    Returns
    -------
    None

    Notes
    -----
    Storage of columns: dvars, objs, ID

    Examples
    --------
    None
    """
    data = json.load(open(filename))

    self.__dvarBounds = data["dvar-bounds"]
    self.__constraints = data["constraints"]

    solutions = data["solutions"]
    self.__nIndividuals = len(data["solutions"])

    table = np.zeros((self.__nIndividuals, self.__nColumns))

    for i, solution in enumerate(solutions):
        for j, key in enumerate(solution):
            if key == 'dvar':
                k = 0
                for dvar, value in sorted(solution[key].items()):
                    table[i, k] = float(value)
                    k += 1
            elif key == 'obj':
                k = self.__nDvars
                for obj, value in sorted(solution[key].items()):
                    table[i, k] = float(value)
                    k += 1
            elif key == 'ID':
                k = self.__nDvars + self.__nObjs
                table[i, k] = int(solution[key])

    return table
```
Yes, I fixed the reading in opal.parser.OptimizerParser. I was not aware that OptPilotJsonReader was still in use because I thought it was only used in the deprecated plotting scripts. I wanted to remove OptPilotJsonReader.