
Commit de9c45c2 authored by florez_j

Updated to cleared jupyter notebooks

parent ba49b168
%% Cell type:code id: tags:
``` python
import os
from nbutils import add_project_path_to_sys_path

# Add the project root to sys.path so the project packages can be imported
add_project_path_to_sys_path()

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

try:
    import src.hdf5_writer as hdf5_writer
    import src.hdf5_ops as hdf5_ops
    import visualization.hdf5_vis as h5vis
    import visualization.napp_plotlib as napp
    import utils.g5505_utils as utils
    #import pipelines.metadata_revision as metadata_revision
    print("Imports successful!")
except ImportError as e:
    print(f"Import error: {e}")
```
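%% Cell type:markdown id: tags:
For context, a helper like `add_project_path_to_sys_path` typically just prepends the repository root to `sys.path` so that the `src`, `visualization`, and `utils` packages resolve when the notebook runs from a subdirectory. The sketch below is an assumption about how `nbutils` might implement this, not the actual code from this repository.
%% Cell type:code id: tags:
``` python
import os
import sys

def add_project_path_to_sys_path(levels_up: int = 1) -> None:
    """Hypothetical sketch: prepend the project root (assumed to sit
    `levels_up` directories above the notebook's working directory) to sys.path."""
    project_root = os.path.abspath(os.path.join(os.getcwd(), *(['..'] * levels_up)))
    if project_root not in sys.path:
        sys.path.insert(0, project_root)
```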
%% Cell type:markdown id: tags:
Read the file specified by input_file_path as a dataframe.
Since we know this file was created from Thorsten's Matlab table format, we can use hdf5_ops.read_mtable_as_dataframe() to read it.
Then, we rename the 'name' column to 'filename', as this is the column name used to identify files in subsequent functions.
We also augment the dataframe with a few categorical columns to be used as grouping variables when creating the HDF5 file's group hierarchy.
%% Cell type:code id: tags:
``` python
# Define the input file path and output directory
input_file_path = '../input_files/BeamTimeMetaData.h5'
output_dir_path = '../output_files'
if not os.path.exists(output_dir_path):
    os.makedirs(output_dir_path)

# Read BeamTimeMetaData.h5, containing Thorsten's Matlab table
input_data_df = hdf5_ops.read_mtable_as_dataframe(input_file_path)

# Preprocess Thorsten's input_data dataframe so that it can be used to create a new .h5 file
# under certain grouping specifications.
input_data_df = input_data_df.rename(columns={'name': 'filename'})
input_data_df = utils.augment_with_filenumber(input_data_df)
input_data_df = utils.augment_with_filetype(input_data_df)
input_data_df = utils.split_sample_col_into_sample_and_data_quality_cols(input_data_df)
input_data_df['lastModifiedDatestr'] = input_data_df['lastModifiedDatestr'].astype('datetime64[s]')

input_data_df.columns
```
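%% Cell type:markdown id: tags:
As a quick sanity check (not part of the original workflow), the columns added by the augmentation helpers can be inspected before building the HDF5 hierarchy. The column names below are inferred from the helper names above and may differ in the actual utilities.
%% Cell type:code id: tags:
``` python
# Inspect the preprocessed dataframe: the grouping columns used later
# ('sample', 'filenumber', 'filetype') should now be present.
expected_cols = ['filename', 'sample', 'filenumber', 'filetype', 'lastModifiedDatestr']
missing = [col for col in expected_cols if col not in input_data_df.columns]
print('Missing expected columns:', missing or 'none')
input_data_df.head()
```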
%% Cell type:markdown id: tags:
We now create an HDF5 file whose group hierarchy is derived from input_data and the selected grouping variables. Then
we visualize the group hierarchy of the created file as a treemap.
%% Cell type:code id: tags:
``` python
# Define grouping functions to be passed into the create_hdf5_file function. These can also be set
# as strings referring to categorical columns in input_data_df.
test_grouping_funcs = True
if test_grouping_funcs:
    group_by_sample = lambda x: utils.group_by_df_column(x, 'sample')
    group_by_type = lambda x: utils.group_by_df_column(x, 'filetype')
    group_by_filenumber = lambda x: utils.group_by_df_column(x, 'filenumber')
else:
    group_by_sample = 'sample'
    group_by_type = 'filetype'
    group_by_filenumber = 'filenumber'

path_to_output_filename = os.path.normpath(os.path.join(output_dir_path, 'test.h5'))
grouping_by_vars = ['sample', 'filenumber']

path_to_output_filename = hdf5_writer.create_hdf5_file_from_dataframe(path_to_output_filename,
                                                                      input_data_df,
                                                                      grouping_by_vars
                                                                      )

annotation_dict = {'Campaign name': 'SLS-Campaign-2023',
                   'Producers': 'Thorsten, Luca, Zoe',
                   'Startdate': str(input_data_df['lastModifiedDatestr'].min()),
                   'Enddate': str(input_data_df['lastModifiedDatestr'].max())
                   }

dataOpsObj = hdf5_ops.HDF5DataOpsManager(path_to_output_filename)
dataOpsObj.load_file_obj()

# Annotate the root group with annotation_dict
dataOpsObj.append_metadata('/', annotation_dict)
dataOpsObj.unload_file_obj()

h5vis.display_group_hierarchy_on_a_treemap(path_to_output_filename)
```
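%% Cell type:markdown id: tags:
To double-check the result, the newly created file can be reopened read-only with plain `h5py`. This is a verification step added here, not part of the original notebook, and it assumes `append_metadata` stores the annotations as HDF5 attributes on the root group.
%% Cell type:code id: tags:
``` python
import h5py

# Open the generated file read-only and print the root-level attributes,
# which should include the keys from annotation_dict above.
with h5py.File(path_to_output_filename, 'r') as f:
    for key, value in f['/'].attrs.items():
        print(f'{key}: {value}')
    # List the first level of the group hierarchy built from the grouping variables.
    print('Top-level groups:', list(f.keys()))
```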