Right now the output JSON file is structured such that, when read by Python using the json module, it is converted to an array of dictionaries, where each dictionary contains a key "ID" whose value is the corresponding ID, plus additional keys such as dvars containing the design variables.
For finding an ID it would, in my eyes, be easier if the file were structured so that the resulting array contains only one dictionary whose keys are something like ID+"dvars", where ID is the actual ID, since this allows easy access to individual elements.
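To make the current layout and the lookup problem concrete, here is a minimal sketch (the "samples" tag and the variable names are made up for illustration):

```python
import json

# Hypothetical generation file in the current layout: "samples" is an
# array of dictionaries, and the ID is stored as a value.
text = """
{
    "samples": [
        {"ID": "1", "dvars": {"x": "0.5"}},
        {"ID": "2", "dvars": {"x": "0.7"}}
    ]
}
"""
data = json.loads(text)

# Finding a specific ID means a linear search over the whole array.
sample = next(s for s in data["samples"] if s["ID"] == "2")
print(sample["dvars"])  # {'x': '0.7'}
```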
So in the proposal, the array in samples is not needed anymore?
But in the proposal with more categories, dvar and objvar, not all information for a single ID is kept together. How about using the ID directly as the key, with an additional dictionary level underneath?
Not really, since right now samples is an array of dictionaries.
Also, if I understood correctly, @huber_y proposes the change because he would like to find a specific ID, which is complicated in the current scheme. He is right that the ID number should be a key rather than a value.
The problem I have with splitting over several tags is that regathering all pieces of a sample needs

- pre-knowledge of all possible tags,
- splitting the ID (1) from the tag (dvar1) and accessing the new tags (objvar1).

This will make post-processing somewhat artificial and needlessly complicated.
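To illustrate, a sketch of what regathering would look like under a split-tag scheme (the flat layout below is made up to match the dvar1/objvar1 example):

```python
import re

# Hypothetical split-tag layout: one top-level tag per (ID, category)
# pair, e.g. "dvar1" and "objvar1" for the sample with ID 1.
flat = {
    "dvar1":   {"x": 0.5},
    "objvar1": {"f": 1.2},
    "dvar2":   {"x": 0.7},
    "objvar2": {"f": 0.9},
}

# Regathering a sample requires knowing every possible prefix up front
# and parsing the ID back out of each tag name.
samples = {}
for tag, values in flat.items():
    category, sample_id = re.match(r"(dvar|objvar)(\d+)$", tag).groups()
    samples.setdefault(sample_id, {})[category] = values

print(samples["1"])  # {'dvar': {'x': 0.5}, 'objvar': {'f': 1.2}}
```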
The previous parsing followed some plotting scripts that, if I remember correctly, depended on that data structure. Those are now deprecated anyway thanks to pyOPALTools.
I think what @snuverink_j suggests is better than my suggestion.
We would then get a dictionary of dictionaries, which allows easy access to IDs: the IDs are the keys on the outer level, and each value then corresponds (almost) to what is now an element of the array.
I don't really see the problem of "pre-knowledge of all tags", since you can get all keys of a dictionary as a list.
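To make that concrete, a minimal sketch of the dictionary-of-dictionaries layout, reusing the made-up sample data from above:

```python
import json

# Hypothetical generation file with the ID used directly as a key and
# an additional dictionary level per sample.
text = """
{
    "samples": {
        "1": {"dvar": {"x": "0.5"}, "obj": {"f": "1.2"}},
        "2": {"dvar": {"x": "0.7"}, "obj": {"f": "0.9"}}
    }
}
"""
data = json.loads(text)

# Direct access by ID, no linear search needed ...
print(data["samples"]["2"]["dvar"])       # {'x': '0.7'}

# ... and no pre-knowledge of tags is required: the keys of each
# dictionary level are available as a list.
print(list(data["samples"].keys()))       # ['1', '2']
print(list(data["samples"]["1"].keys()))  # ['dvar', 'obj']
```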
I also added the git revision. I will now do the same for the optimizer. However, I will replace the hand-written JSON output with boost::property_tree, as in the case of the sampler.
I've updated the optimizer as well. I changed the tag solutions, where all individuals are stored, to population, because I think it's a better name for it. I also opened an issue, pyOPALTools#17 (closed), so that we fix the parsers. With the version tag we will then support both structures.
(I also put this over in pyOPALTools#17 (closed); it seemed like it should be there but I wasn't sure).
I just tried to read a JSON file in the new format, with a fresh pull of pyOPALTools, using the mldb.py script, and I get the following error:
File "/Users/auralee/ML-PSI/pyOPALTools/db/mldb.py", line 156, in build optjson = jsonreader.OptPilotJsonReader(path + '/') File "/Users/auralee/ML-PSI/pyOPALTools/optPilot/OptPilotJsonReader.py", line 229, in __init__ self.__buildNameToColumnMap(self.__directory + testfile) File "/Users/auralee/ML-PSI/pyOPALTools/optPilot/OptPilotJsonReader.py", line 533, in __buildNameToColumnMap for idx, name in enumerate(data["solutions"][0].keys()): KeyError: 'solutions'
It looks like it is still looking for 'solutions' rather than 'population', and it does not handle the change in how the ID is stored. What's the best way to fix this?
I think I've fixed this by rewriting the last two functions in OptPilotJsonReader.py. (Based on the earlier fix over at pyOPALTools#17 (closed), it looks like there's possibly a better way to do this using opal.parser.OptimizerParser instead of what I did here, but I wasn't sure.)
This still does not handle the old format, only the new one, so a check of the OPAL version would still have to go in there (or should mldb.py just be changed to use the opal parser?).
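For what it's worth, a minimal sketch of one way to support both layouts, keying on which top-level tag is present instead of on the OPAL version (load_individuals is a hypothetical helper, not an existing pyOPALTools function):

```python
import json

def load_individuals(filename):
    """Return a dict mapping ID -> individual, for either file layout."""
    with open(filename) as f:
        data = json.load(f)

    if "population" in data:
        # New format: already a dictionary keyed by ID.
        return data["population"]
    if "solutions" in data:
        # Old format: an array with the ID stored as a value;
        # rebuild the ID -> record mapping.
        return {str(s["ID"]): {k: v for k, v in s.items() if k != "ID"}
                for s in data["solutions"]}
    raise KeyError("Neither 'population' nor 'solutions' found.")
```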
I've also put the new and old code here:
New:
```python
def __buildNameToColumnMap(self, filename):
    """ Build data structures

    Parameters
    ----------
    filename : a generation file

    Returns
    -------
    None

    Notes
    -----
    Assumes the ordering: [u'dvar', u'obj', u'ID']

    Examples
    --------
    None
    """
    if not os.path.isfile(filename):
        raise IOError("File '" + filename + "' does not exist.")

    data = json.load(open(filename))

    # check validation
    if "dvar-bounds" not in data:
        raise KeyError("Error in JSON format: " \
                       "Design variable bounds are not present.")

    if "constraints" not in data:
        raise KeyError("Error in JSON format: " \
                       "Constraints are not present.")

    population = data["population"]
    first_indv_key = list(population.keys())[0]

    for idx, name in enumerate(data["population"][first_indv_key].keys()):
        name = name.replace(" ", "")
        #self.__nameToColumnMap[name] = idx

        if name == 'dvar':
            for jdx, dvars in enumerate(sorted(data["population"][first_indv_key][name].keys())):
                self.__dvarNameToColumnMap[dvars] = jdx
        elif name == 'obj':
            for jdx, objs in enumerate(sorted(data["population"][first_indv_key][name].keys())):
                self.__objNameToColumnMap[objs] = jdx
        else:
            raise KeyError("Error in JSON format: " \
                           "Unexpected keys in population.")

    self.__nDvars = len(self.__dvarNameToColumnMap)
    self.__nObjs = len(self.__objNameToColumnMap)
    self.__nColumns = self.__nDvars + self.__nObjs + 1  # + ID

##

def __readJSONData(self, filename):
    """ Read in data

    Parameters
    ----------
    filename : a generation file

    Returns
    -------
    None

    Notes
    -----
    Storage of columns: dvars, objs, ID

    Examples
    --------
    None
    """
    data = json.load(open(filename))

    self.__dvarBounds = data["dvar-bounds"]
    self.__constraints = data["constraints"]

    population = data["population"]
    self.__nIndividuals = len(data["population"])

    table = np.zeros((self.__nIndividuals, self.__nColumns))

    for i, pop in enumerate(population):
        for j, key in enumerate(population[str(pop)]):
            if key == 'dvar':
                k = 0
                for dvar, value in sorted(population[str(pop)][key].items()):
                    table[i, k] = float(value)
                    k += 1
            elif key == 'obj':
                k = self.__nDvars
                for obj, value in sorted(population[str(pop)][key].items()):
                    #print(obj,value)
                    table[i, k] = float(value)
                    k += 1
            else:
                pass
        k = self.__nDvars + self.__nObjs
        table[i, k] = int(pop)

    return table
```
Original:
```python
def __buildNameToColumnMap(self, filename):
    """ Build data structures

    Parameters
    ----------
    filename : a generation file

    Returns
    -------
    None

    Notes
    -----
    Assumes the ordering: [u'dvar', u'obj', u'ID']

    Examples
    --------
    None
    """
    if not os.path.isfile(filename):
        raise IOError("File '" + filename + "' does not exist.")

    data = json.load(open(filename))

    # check validation
    if "dvar-bounds" not in data:
        raise KeyError("Error in JSON format: " \
                       "Design variable bounds are not present.")

    if "constraints" not in data:
        raise KeyError("Error in JSON format: " \
                       "Constraints are not present.")

    for idx, name in enumerate(data["solutions"][0].keys()):
        name = name.replace(" ", "")
        #self.__nameToColumnMap[name] = idx

        if name == 'dvar':
            for jdx, dvars in enumerate(sorted(data["solutions"][0][name].keys())):
                self.__dvarNameToColumnMap[dvars] = jdx
        elif name == 'obj':
            for jdx, objs in enumerate(sorted(data["solutions"][0][name].keys())):
                self.__objNameToColumnMap[objs] = jdx
        elif name == 'ID':
            pass
        else:
            raise RuntimeError("Not expected json format.")

    self.__nDvars = len(self.__dvarNameToColumnMap)
    self.__nObjs = len(self.__objNameToColumnMap)
    self.__nColumns = self.__nDvars + self.__nObjs + 1  # + ID

##

def __readJSONData(self, filename):
    """ Read in data

    Parameters
    ----------
    filename : a generation file

    Returns
    -------
    None

    Notes
    -----
    Storage of columns: dvars, objs, ID

    Examples
    --------
    None
    """
    data = json.load(open(filename))

    self.__dvarBounds = data["dvar-bounds"]
    self.__constraints = data["constraints"]

    solutions = data["solutions"]
    self.__nIndividuals = len(data["solutions"])

    table = np.zeros((self.__nIndividuals, self.__nColumns))

    for i, solution in enumerate(solutions):
        for j, key in enumerate(solution):
            if key == 'dvar':
                k = 0
                for dvar, value in sorted(solution[key].items()):
                    table[i, k] = float(value)
                    k += 1
            elif key == 'obj':
                k = self.__nDvars
                for obj, value in sorted(solution[key].items()):
                    table[i, k] = float(value)
                    k += 1
            elif key == 'ID':
                k = self.__nDvars + self.__nObjs
                table[i, k] = int(solution[key])

    return table
```
Yes, I fixed the reading in opal.parser.OptimizerParser. I was not aware that OptPilotJsonReader was still in use because I thought it was only used in the deprecated plotting scripts. I wanted to remove OptPilotJsonReader.