Second Project of the Machine Learning course in collaboration with the EPFL Laboratory for Computation and Visualization in Mathematics and Mechanics.
The repository is organized as follows:

- `code/` - Folder containing the training utilities
  - `metrics_helper.py` - Utilities for performance measures
  - `preprocessing_helper.py` - Helpers for preprocessing the data
  - `sampling_helper.py` - Implementation of the undersampling and oversampling strategies we experimented with
  - `Best Run Regression.ipynb` - Notebook to reproduce the regression results
  - `Best_Run_classification.ipynb` - Notebook to reproduce the classification results
- `report/` - Report folder
  - `report.pdf` - The report in PDF format
- `requirements.txt` - List of required packages
The original dataset was provided by the EPFL Laboratory for Computation and Visualization in Mathematics and Mechanics.
To clone this repository, please run:

```shell
git clone --recursive https://github.com/johnmavro/Machine-Learning-P2.git
```
The required packages are listed in `requirements.txt`. To install them, please run:

```shell
pip install -r requirements.txt
```
To reproduce the results we obtained for the regression task, please use an XGBoost Regressor with the following hyperparameters:

```python
parameters = {'subsample': 0.9,
              'n_estimators': 500,
              'max_depth': 20,
              'learning_rate': 0.01,
              'colsample_bytree': 0.8,
              'colsample_bylevel': 0.6}
```

To reproduce the results we obtained for the classification task, please use a Random Forest Classifier with the following hyperparameters:
```python
parameters = {'n_estimators': 400,
              'min_samples_split': 5,
              'min_samples_leaf': 1,
              'max_features': 'auto',
              'max_depth': 20,
              'bootstrap': False}
```

The report can be found in PDF format in the `report` folder.
- Federico Betti
- Ioannis Mavrothalassitis
- Luca Rossi