Code for machine learning algorithms applied to Jungfraujoch datasets.
Time series clustering is an unsupervised method for grouping data points based on their similarity. The goal is to maximize data similarity within clusters and minimize it across clusters.
1.) To clone the repository, copy the URL and run in the command line:

git clone url
2.) The dataset is located on Merlin: '/data/project/general/aerosolretriev/Jungfraujoch_data/Instrument Data/'.
For the analysis we use the file:

'/data/project/general/aerosolretriev/Jungfraujoch_data/Instrument_Data/merged_data/aerosol_data_JFJ_2020.csv'

The data in this file are already merged; it contains 8784 rows × 325 columns.
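Loading could look like this (a sketch; the column names `DateTime`, `scattering`, and `absorption` are invented stand-ins for the real 325 columns, and a tiny inline CSV stands in for the file on Merlin):

```python
import io
import pandas as pd

# Stand-in for aerosol_data_JFJ_2020.csv; the real file is read the same way.
csv = io.StringIO(
    "DateTime,scattering,absorption\n"
    "2020-01-01 00:00,1.2,0.3\n"
    "2020-01-01 01:00,1.5,0.4\n"
)

# Parse the timestamp column as a datetime index.
df = pd.read_csv(csv, index_col="DateTime", parse_dates=True)
print(df.shape)  # the full 2020 file has shape (8784, 325)
```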
The diameters related to the size distribution are in the file:

'/data/project/general/aerosolretriev/Jungfraujoch_data/Instrument_Data/merged_data/midpoint_diameters_size_distr_JFJ_2020.csv'
3.) First, run the Jupyter notebook "Preprocess_Data_Set.ipynb".
The data are read in, and NaN and Inf values are replaced by 0 (check!).
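The replacement step might look like this (a sketch assuming a pandas dataframe; the column names are invented):

```python
import numpy as np
import pandas as pd

# Toy dataframe with the kinds of gaps found in the merged data.
df = pd.DataFrame({
    "scattering": [1.2, np.nan, 3.4],
    "absorption": [np.inf, 0.5, -np.inf],
})

# Map Inf/-Inf to NaN first, then fill every NaN with 0.
df = df.replace([np.inf, -np.inf], np.nan).fillna(0)
print(df)
```

Note that filling with 0 silently turns missing measurements into valid-looking zeros, which is why the "(check!)" above is worth taking seriously.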
The start and end dates of the Saharan dust events are entered there, taken from Table A1 of the paper: https://acp.copernicus.org/articles/21/18029/2021/. I rounded the event hours up (using e.g. 14:00 instead of 13:42) (check!).
Then I used these times to mark the dust events and added them to the pandas dataframe as the columns "sde_event" and "sde_event_nr".
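The labeling step can be sketched as follows (the event windows below are made up for illustration; the real start/end times come from Table A1):

```python
import pandas as pd

# Hypothetical Saharan dust event windows (start, end), already rounded to the hour.
events = [
    ("2020-02-05 14:00", "2020-02-07 02:00"),
    ("2020-03-15 06:00", "2020-03-16 20:00"),
]

# Hourly index for the leap year 2020: 8784 timestamps, as in the dataset.
df = pd.DataFrame(index=pd.date_range("2020-01-01", "2020-12-31 23:00", freq="h"))

# Flag each hour inside an event window and record the event number.
df["sde_event"] = False
df["sde_event_nr"] = 0
for nr, (start, end) in enumerate(events, start=1):
    mask = (df.index >= start) & (df.index <= end)
    df.loc[mask, "sde_event"] = True
    df.loc[mask, "sde_event_nr"] = nr

print(df["sde_event"].sum())
```

Here "sde_event" is a boolean flag and "sde_event_nr" numbers the events (0 outside any event), matching the column names mentioned above.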
The final dataframe including the dust events is stored at:

'/data/project/general/aerosolretriev/Jungfraujoch_data/data/aerosol_data.h5'