MLLIB issues
https://gitlab.psi.ch/adelmann/mllib/-/issues
Updated: 2020-03-19T18:44:28+01:00

Remove logic to transform only certain columns from the existing preprocessors
https://gitlab.psi.ch/adelmann/mllib/-/issues/14
Updated: 2020-03-19T18:44:28+01:00
Author: bellotti_r

This functionality is provided in general form by `SelectivePreprocessor`.
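
To make the idea of a general column-selecting wrapper concrete, here is a minimal sketch. It is not the `SelectivePreprocessor` from this library: the names `SelectiveWrapper`, `log_transform` and `Table` are hypothetical, and a plain dict of lists stands in for a dataframe so that the example has no dependencies.

```python
import math
from typing import Callable, Dict, Iterable, List

# Stand-in for a dataframe (assumption): column name -> list of values.
Table = Dict[str, List[float]]


def log_transform(table: Table) -> Table:
    """Hypothetical column-agnostic preprocessor: natural log of every value."""
    return {name: [math.log(v) for v in values] for name, values in table.items()}


class SelectiveWrapper:
    """Apply `transform` to the chosen columns only and pass the rest through."""

    def __init__(self, transform: Callable[[Table], Table], columns: Iterable[str]):
        self.transform = transform
        self.columns = list(columns)

    def __call__(self, table: Table) -> Table:
        selected = {name: table[name] for name in self.columns}
        transformed = self.transform(selected)
        # Keep the original column order; use transformed values where available.
        return {name: transformed.get(name, values) for name, values in table.items()}


data = {"energy": [1.0, math.e], "label": [0.0, 1.0]}
result = SelectiveWrapper(log_transform, ["energy"])(data)
```

With a wrapper of this kind, the transform itself needs no column-selection logic of its own.
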
At least `LogarithmTransform` contains logic to do this. This is redundant and should be removed. Check also the other preprocessors.
Assignees: bellotti_r

Extend Surrogate example with X_val, y_val
https://gitlab.psi.ch/adelmann/mllib/-/issues/13
Updated: 2020-01-27T17:48:16+01:00
Author: bellotti_r

Write tests to check the model predictions on simple datasets
https://gitlab.psi.ch/adelmann/mllib/-/issues/12
Updated: 2020-01-27T13:17:51+01:00
Author: bellotti_r

This ensures that the training/prediction callbacks of the surrogates are working fine.
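
A test of this kind could look roughly like the sketch below. `LinearSurrogate` is a hypothetical stand-in with a `fit`/`predict` pair, not one of the surrogates in this library; the point is only the pattern of training on a dataset with a known answer and asserting on the predictions.

```python
from typing import List


class LinearSurrogate:
    """Hypothetical stand-in for a surrogate: 1-D least-squares line fit."""

    def fit(self, x: List[float], y: List[float]) -> None:
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        var_x = sum((xi - mean_x) ** 2 for xi in x)
        cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        self.slope = cov_xy / var_x
        self.intercept = mean_y - self.slope * mean_x

    def predict(self, x: List[float]) -> List[float]:
        return [self.slope * xi + self.intercept for xi in x]


def test_recovers_noise_free_line() -> None:
    # Simple dataset with a known ground truth: y = 2x + 1.
    x_train = [0.0, 1.0, 2.0, 3.0]
    y_train = [2.0 * xi + 1.0 for xi in x_train]
    model = LinearSurrogate()
    model.fit(x_train, y_train)
    predictions = model.predict([4.0, 5.0])
    assert all(abs(p - t) < 1e-9 for p, t in zip(predictions, [9.0, 11.0]))
```
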
Related: #11

Feature request: Example datasets
https://gitlab.psi.ch/adelmann/mllib/-/issues/11
Updated: 2020-01-27T13:39:00+01:00
Author: bellotti_r

Things to include:
- Datasets from sklearn
- [Datasets from tensorflow/keras](https://www.tensorflow.org/datasets); [list of datasets](https://www.tensorflow.org/datasets/catalog/overview)
- Datasets sampled "on-the-fly", for example the Gaussian mixture dataset from [this paper](https://arxiv.org/abs/1808.04730).
- Additional stuff*?

*Related: Google has released a [dataset search engine](https://datasetsearch.research.google.com/), so feel free to make suggestions!
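
As an illustration of the "on-the-fly" item in the list above, a sampler for a two-dimensional Gaussian mixture might be sketched as follows. This is not a reproduction of the dataset in the linked paper: the function name, the placement of the modes on a circle, and the default values for `n_modes`, `radius` and `std` are all assumptions made for the example.

```python
import math
import random
from typing import List, Tuple


def sample_gaussian_mixture(
    n_samples: int,
    n_modes: int = 8,
    radius: float = 1.0,
    std: float = 0.05,
    seed: int = 0,
) -> Tuple[List[Tuple[float, float]], List[int]]:
    """Draw 2-D points from isotropic Gaussians centred on a circle.

    Returns the points and, for each point, the index of the mode it came from.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    points: List[Tuple[float, float]] = []
    labels: List[int] = []
    for _ in range(n_samples):
        mode = rng.randrange(n_modes)
        angle = 2.0 * math.pi * mode / n_modes
        x = radius * math.cos(angle) + rng.gauss(0.0, std)
        y = radius * math.sin(angle) + rng.gauss(0.0, std)
        points.append((x, y))
        labels.append(mode)
    return points, labels


points, labels = sample_gaussian_mixture(1000)
```

Because nothing is stored on disk, a dataset like this can be regenerated at any size from the seed alone.
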
Might be very useful to validate surrogates, preprocessors etc.

Think more about if inheriting from DataSource makes sense, and specifying set_view() more closely
https://gitlab.psi.ch/adelmann/mllib/-/issues/10
Updated: 2020-01-27T13:18:45+01:00
Author: bellotti_r

Decide which license to use
https://gitlab.psi.ch/adelmann/mllib/-/issues/9
Updated: 2020-03-18T08:36:49+01:00
Author: bellotti_r

I'm in favor of using either [Apache 2.0](https://tldrlegal.com/license/apache-license-2.0-(apache-2.0)) or [GPLv3](https://tldrlegal.com/license/gnu-general-public-license-v3-(gpl-3)).
We should discuss this with Andreas; there might be PSI regulations to follow.
Assignees: bellotti_r, adelmann, li_s1, zacharias_m

Including archiver client as a dependency
https://gitlab.psi.ch/adelmann/mllib/-/issues/8
Updated: 2019-11-20T11:14:59+01:00
Author: bellotti_r

I'm trying to replace our copy-paste dependency (the archiver client) with an external package.
The `data_api_python` package is not available on PyPI, but it is possible to add a git repo as a dependency, which would be automatically downloaded, built and installed when one executes `pip install ./mllib`.
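
One way to declare such a git dependency is a PEP 508 direct reference in `install_requires`. The sketch below only builds the requirement string. The helper `git_requirement` is hypothetical, the distribution name `data_api` is an assumption, and the repository URL is derived from the issue link in this entry.

```python
from typing import List


def git_requirement(dist_name: str, repo_url: str) -> str:
    """Build a PEP 508 direct reference that pip resolves by cloning the repo."""
    return f"{dist_name} @ git+{repo_url}"


# Assumption: "data_api" is the name the package declares in its own metadata.
INSTALL_REQUIRES: List[str] = [
    git_requirement(
        "data_api",
        "https://github.com/paulscherrerinstitute/data_api_python.git",
    ),
]

# In setup.py the list would be passed on, for example:
#   setup(name="mllib", packages=find_packages(), install_requires=INSTALL_REQUIRES)
```

pip expects the name before the `@` to match the name the package declares for itself, so the assumed name would need to be checked against the upstream repo.
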
It turns out that there is a very subtle and weird problem with their repo name... I've opened an [issue](https://github.com/paulscherrerinstitute/data_api_python/issues/6), in case anybody is interested in following this.
Assignees: bellotti_r

Replace xlsx by OpenPyXL
https://gitlab.psi.ch/adelmann/mllib/-/issues/7
Updated: 2019-11-19T20:02:00+01:00
Author: bellotti_r

[Quote by the author of xlrd](https://github.com/python-excel/xlrd):
```
This library currently has no active maintainers.
You are advised to use OpenPyXL instead.
If you absolutely have to read .xls files, then xlrd will probably still work for you, but please do not submit issues complaining that this library will not read your corrupted or non-standard file.
Just because Excel or some other piece of software opens your file does not mean it is a valid xls file.
```
Assignees: li_s1, zacharias_m

new subpackage for functionalities like TrainTestSplit
https://gitlab.psi.ch/adelmann/mllib/-/issues/6
Updated: 2020-01-28T13:39:12+01:00
Author: zacharias_m

TrainTestValSplit does not really belong in Preprocessor as it does not return a single dataframe. Should we add a new subpackage for functionalities like this one?

Package is broken due to WindowBuild
https://gitlab.psi.ch/adelmann/mllib/-/issues/5
Updated: 2019-10-23T12:29:08+02:00
Author: bellotti_r

Module data_api is not found!
Assignees: li_s1, zacharias_m

Discussion: Return type of Surrogate.fit()
https://gitlab.psi.ch/adelmann/mllib/-/issues/4
Updated: 2019-10-22T10:24:13+02:00
Author: bellotti_r

Discussion is needed about this issue.
Suggestions so far:
- Return the `Surrogate.predict` function

Refactoring: Make name and version properties instead of using getters
https://gitlab.psi.ch/adelmann/mllib/-/issues/3
Updated: 2019-11-06T15:54:57+01:00
Author: bellotti_r

get_name --> name
get_version --> version
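
The rename could be sketched as follows. `ExampleComponent` is a hypothetical class used only for illustration; which classes in the library currently expose `get_name` and `get_version` is not stated here.

```python
class ExampleComponent:
    """Hypothetical class showing read-only properties in place of getters."""

    def __init__(self, name: str, version: str) -> None:
        self._name = name
        self._version = version

    @property
    def name(self) -> str:
        # Replaces get_name(); accessed as `obj.name`, without parentheses.
        return self._name

    @property
    def version(self) -> str:
        # Replaces get_version(); no setter is defined, so it stays read-only.
        return self._version


step = ExampleComponent("LogarithmTransform", "1.0")
label = f"{step.name} v{step.version}"
```
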
Perhaps also remove the `get` in the other method names
Assignees: bellotti_r

Put library in a repository?
https://gitlab.psi.ch/adelmann/mllib/-/issues/2
Updated: 2019-11-06T16:14:55+01:00
Author: bellotti_r

Possible options:
- PyPI
- PSI Anaconda repository

Rename the project
https://gitlab.psi.ch/adelmann/mllib/-/issues/1
Updated: 2020-02-04T11:21:56+01:00
Author: bellotti_r

mllib is not a good name.
Possible confusions:
- spark.mllib
- [mllib on PyPI](https://pypi.org/project/mllib/)

Suggestion by Sven Augustin: VML (Villigen ML)
Suggestion by Arnau Albà: MLSci