Last edited by boiger_r Nov 24, 2022

Home

Code for machine learning algorithms for Jungfraujoch data sets.

1.) To clone the repository, copy its URL and run on the command line:

git clone <url>

2.) The dataset is located on Merlin: '/data/project/general/aerosolretriev/Jungfraujoch_data/Instrument Data/'.

For the analysis we use the file: '/data/project/general/aerosolretriev/Jungfraujoch_data/Instrument_Data/merged_data/aerosol_data_JFJ_2020.csv'

The data in this file are already merged; it contains 8784 rows × 325 columns.

The diameters related to the size distribution are in the file: '/data/project/general/aerosolretriev/Jungfraujoch_data/Instrument_Data/merged_data/midpoint_diameters_size_distr_JFJ_2020.csv'
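For orientation, 8784 = 366 × 24, which suggests one row per hour of the leap year 2020. A minimal loading sketch is given below; the `index_col`/`parse_dates` choice is an assumption, so check the actual file layout before relying on it:

```python
import pandas as pd

# Path on Merlin (see above)
DATA = ('/data/project/general/aerosolretriev/Jungfraujoch_data/'
        'Instrument_Data/merged_data/aerosol_data_JFJ_2020.csv')

# Assumption: the first column holds a parseable timestamp
# df = pd.read_csv(DATA, index_col=0, parse_dates=True)

# Expected shape: one row per hour of the leap year 2020
hours_2020 = pd.date_range('2020-01-01 00:00', '2020-12-31 23:00', freq='h')
assert len(hours_2020) == 8784
```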

3.) First, run the Jupyter notebook "Preprocess_Data_Set.ipynb".

The data are read, and NaN and Inf values are replaced by 0 (check!).

The start and end dates of the Saharan dust events are entered there, taken from Table A1 of the paper https://acp.copernicus.org/articles/21/18029/2021/. I rounded the event times up to the full hour (e.g. using 14:00 instead of 13:42) (check!).

I then used these times to mark the dust events and added them to the pandas dataframe as the columns "sde_event" and "sde_event_nr".

The final dataframe including the dust events is stored at: '/data/project/general/aerosolretriev/Jungfraujoch_data/data/aerosol_data.h5'
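The preprocessing steps above can be sketched as follows. The column name `conc` and the event times are purely illustrative stand-ins, not taken from the real data:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the merged data; 'conc' is a hypothetical column
df = pd.DataFrame({'conc': [1.0, np.nan, np.inf, 2.0]},
                  index=pd.date_range('2020-02-01', periods=4, freq='h'))

# Replace NaN and Inf by 0, as done in Preprocess_Data_Set.ipynb
df = df.replace([np.inf, -np.inf], np.nan).fillna(0.0)

# Flag one dust event from rounded start/end times (illustrative values)
events = [('2020-02-01 01:00', '2020-02-01 02:00')]
df['sde_event'] = False
df['sde_event_nr'] = 0
for nr, (start, end) in enumerate(events, start=1):
    mask = (df.index >= start) & (df.index <= end)
    df.loc[mask, 'sde_event'] = True
    df.loc[mask, 'sde_event_nr'] = nr

# Store as HDF5 (writing requires the 'tables' package):
# df.to_hdf('aerosol_data.h5', key='df')
```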

4.) The Jupyter notebook "Plot_data.ipynb" contains some basic plots. This notebook can be extended arbitrarily.
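A typical time-series plot of one column can be sketched as below; the data here are synthetic, and in "Plot_data.ipynb" the real dataframe columns are plotted instead:

```python
import matplotlib
matplotlib.use('Agg')  # render without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in series with an hourly index
idx = pd.date_range('2020-01-01', periods=48, freq='h')
series = pd.Series(np.random.default_rng(0).random(48), index=idx)

fig, ax = plt.subplots(figsize=(8, 3))
series.plot(ax=ax)
ax.set_xlabel('time')
ax.set_ylabel('concentration')
fig.savefig('timeseries.png', dpi=100)
```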

5.) The Jupyter notebooks "Function_collection_summary_univariate" and "Function_collection_summary_multivariate" use the two Python scripts evaluation.py (reads the data and implements all metrics for assessing the performance of the methods) and unsupervised_methods.py (implements all unsupervised methods). In these notebooks the different methods are tested and applied to different data sets (different combinations of features), and the results are plotted.
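The general pattern of fitting an unsupervised detector and scoring it against the labelled events can be sketched with scikit-learn; note that IsolationForest is used here only as a generic example of an unsupervised outlier detector and is not necessarily among the methods implemented in unsupervised_methods.py, and the data are synthetic:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# Synthetic features: normal background plus a block of clear outliers,
# a stand-in for one feature combination from the merged JFJ dataframe
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)),
               rng.normal(6.0, 1.0, (20, 3))])
y_true = np.array([0] * 200 + [1] * 20)  # 1 = dust event

# Fit an unsupervised outlier detector and turn its labels into 0/1
model = IsolationForest(contamination=0.1, random_state=0).fit(X)
y_pred = (model.predict(X) == -1).astype(int)  # -1 marks outliers

score = f1_score(y_true, y_pred)
```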

6.) In the Jupyter notebook "Martines_method.py" the latest data from Martine (MeteoSwiss) are read, and the performance is evaluated in terms of my metrics.
