This repository holds the implementation, datasets and results for the thesis project "Improving Collaborative Filtering Techniques by the use of Co-Training in Recommender Systems".
The project was carried out as the thesis research of Fernando Benjamín Pérez Maurera, under the supervision of Professor Paolo Cremonesi and Engineer Maurizio Ferrari, at Politecnico di Milano.
The project is organized as follows:
- Datasets: folder where the datasets are located.
  - ml10m: folder containing the Movielens10M dataset.
    - ratings.csv: ratings file.
- Implementation: RecPy module containing the recommenders and helper classes.
- read-results: folder containing the `bash` scripts used to read the results output by Co-Training. There are scripts for each recommender combination: ItemKNN/FunkSVD, ItemKNN/SLIM, ItemKNN/SLIMBPR and ItemKNN/BPRMF.
- run-examples: folder containing the `bash` scripts used to run Co-Training. There are scripts for several recommender combinations: ItemKNN/FunkSVD, ItemKNN/SLIM, ItemKNN/SLIMBPR and ItemKNN/BPRMF.
- Results: folder generated when running the `run-knn.sh` and `results-knn.sh` scripts. The results for each test case are placed in this folder.
- scripts: folder containing the two main `Python` files: `holdout.py`, which makes a holdout@k of the dataset, runs Co-Training and evaluates the recommenders, and `read-results.py`, which reads the results of each output file and generates new plots.
- `README.md`: this file.
- `requirements.txt`: file for Conda or PIP listing the libraries and modules required to run the code.
- `results-knn.sh`: main `bash` script to read the results that each Co-Training process outputs for each test case inside `read-results`.
- `run-knn.sh`: main `bash` script to run the Co-Training process for each test case inside `run-examples`.
The requirements are:
- `Python 3.6+`.
- `C++` compiler.
  - On Linux, ensure that you have the packages `libc6-dev` and `build-essential`.
To install and run the project:
- [On Linux] Install Linux packages: `apt-get install -y libc6-dev build-essential`.
- Install `Miniconda` for `Python 3.6+`.
- Create the virtual environment: `conda create -n cotraining --file requirements.txt`
- Activate the virtual environment: `source activate cotraining`.
- [Installation and run separately] Install the project: `cd Configuration/ ; sh install.sh ; cd ..`
- [Installation and run separately] Run one of the examples:
  * `cd run-examples/ ; sh knn-funksvd.sh -p <p-most positive> -n <n-most negative> -u <size of U'>; cd ..`
  * `cd run-examples/ ; sh knn-slim.sh -p <p-most positive> -n <n-most negative> -u <size of U'>; cd ..`
  * `cd run-examples/ ; sh knn-bprmf.sh -p <p-most positive> -n <n-most negative> -u <size of U'>; cd ..`
  * `cd run-examples/MyMediaLite/bin/ ; sh knn-slimbpr.sh -p <p-most positive> -n <n-most negative> -u <size of U'>; cd ..`
- [Installation and run integrated] Run the `run-knn.sh` script: `sh run-knn.sh -p <p-most positive> -n <n-most negative> -u <size of U'>`
- [Only to generate new plots] Run the `results-knn.sh` script: `sh results-knn.sh -p <p-most positive> -n <n-most negative> -u <size of U'>`
The test cases included with the project are ItemKNN/FunkSVD, ItemKNN/SLIM, ItemKNN/SLIMBPR and ItemKNN/BPRMF. The dataset used is Movielens10M, split with a holdout at 20%. At evaluation time, a top-10 recommendation list is generated for each user. The items are divided into 10 bins based on their popularity, where bin_0 holds the least popular items and bin_9 the most popular. Running the test cases creates the Results folder, with a subfolder for each test case.
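As an illustration of the setup above, here is a minimal sketch of a 20% holdout and of the popularity binning. It is not the code in `holdout.py`: the function names `holdout_split` and `popularity_bins` are hypothetical, and the sketch assumes that the 20% is the test share, that popularity is the number of training ratings an item received, and that each bin holds roughly the same number of items.

```python
import random
from collections import Counter


def holdout_split(ratings, test_fraction=0.2, seed=42):
    """Randomly hold out a fraction of (user, item, rating) rows for testing.

    Assumption: the split is done over individual rating rows.
    """
    rows = list(ratings)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    # First n_test shuffled rows form the test set, the rest the training set.
    return rows[n_test:], rows[:n_test]


def popularity_bins(train_rows, n_bins=10):
    """Map each item to a bin index: 0 = least popular, n_bins - 1 = most popular.

    Assumption: popularity is the training rating count and bins are
    (nearly) equal in number of items.
    """
    counts = Counter(item for _, item, _ in train_rows)
    # Sort by (count, item id) so that ties are broken deterministically.
    ranked = sorted(counts, key=lambda item: (counts[item], item))
    return {item: rank * n_bins // len(ranked) for rank, item in enumerate(ranked)}
```

With these helpers, `train, test = holdout_split(rows)` followed by `popularity_bins(train)` gives the bin of every item seen in training; an item in `bin_0` would map to `0` and one in `bin_9` to `9`.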
At the moment, the project generates output files for: the evaluation metrics (RMSE, MAP, ROC-AUC, Precision, Recall, NDCG and MRR), the number of p-most and n-most items rated at each iteration, the agreement between the recommenders, and the popularity of the recommended items.
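For reference, three of the ranking metrics listed above can be sketched for a single user as follows. This is not the evaluation code of the RecPy module: the function names are hypothetical, and the sketch assumes that precision is divided by `k` and that a user with no relevant items scores 0 on recall.

```python
def precision_at_k(recommended, relevant, k=10):
    """Share of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Share of the relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(recommended, relevant, k=10):
    """1 / rank of the first relevant item in the top-k list, or 0 if none."""
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0
```

Here `recommended` is the ranked list of item ids for one user and `relevant` is the set of that user's held-out items. Averaging `reciprocal_rank` over all users gives MRR.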