A Mountain Project Rock Climbing Route Recommender
Galvanize Data Science Program - Winter 2017 - Capstone Project - David Blaszka
Mountain Project is a tremendous resource for finding information about rock climbing routes across the globe. While the site provides excellent information on individual routes, it does not currently offer a recommender that suggests routes based on the ones a user has liked in the past. I built RedPointer to provide users with that recommender system.
All of my data was scraped from the Mountain Project website using Requests and BeautifulSoup. For each route, I scraped the route metadata. Similarly, for each user who rated a route, I scraped their metadata and the rating they gave the route.
All of the data was stored in a MongoDB database.
The recommendation system is implemented as an ensemble of three models: Apache Spark's Alternating Least Squares (ALS) model, scikit-learn's gradient boosting model, and a cosine similarity matrix. I tried four different types of recommendation systems:
- Factorization Recommender
- Gradient Boosting
- Item Content Recommender
- Full Ensemble of all three
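As an illustration of the item content approach listed above, here is a minimal sketch of ranking routes by cosine similarity. It is not the project's code: the feature columns, route names, and the `most_similar` helper are hypothetical, and the real feature set used to build the similarity matrix is not described here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)


def most_similar(route_features, liked_route, top_n=2):
    """Rank every other route by its similarity to one the user liked."""
    target = route_features[liked_route]
    scores = [
        (name, cosine_similarity(target, vec))
        for name, vec in route_features.items()
        if name != liked_route
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical features: [is_trad, is_sport, scaled grade, scaled length]
route_features = {
    "route_a": [1, 0, 0.5, 0.8],
    "route_b": [1, 0, 0.6, 0.7],
    "route_c": [0, 1, 0.5, 0.2],
}
ranked = most_similar(route_features, "route_a")
```

With these toy vectors, `route_b` ranks above `route_c` for a user who liked `route_a`, because it shares the same route type and has a similar grade and length.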
Each model was evaluated using its RMSE score, calculated on a held-out test set.
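For reference, RMSE can be sketched in a few lines of plain Python. The ratings below are made-up toy values, and the unweighted average in `blend` is only an assumption for illustration; how the ensemble actually combines its three models is not specified here.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def blend(*model_predictions):
    """Unweighted mean of several models' predictions (an assumption)."""
    return [sum(vals) / len(vals) for vals in zip(*model_predictions)]


# Toy held-out ratings and per-model predictions (hypothetical values)
actual = [4.0, 2.0, 3.0]
als_pred = [3.5, 2.5, 3.0]
gb_pred = [4.5, 1.5, 3.5]
content_pred = [4.0, 3.0, 2.5]

ensemble_pred = blend(als_pred, gb_pred, content_pred)
for name, pred in [("als", als_pred), ("ensemble", ensemble_pred)]:
    print(name, round(rmse(pred, actual), 3))
```

A lower RMSE means the predicted ratings sit closer to the ratings users actually gave on the held-out set.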
MongoDB, pymongo, Anaconda, Spark, PySpark, ggplot
- Begin by running the file, scraper_main.py, in the terminal:

      python scraper_main.py

  This will save three tables (ratings, user info, and route info), including URLs and HTML, to a MongoDB database.
- Next, run the file, parse_clean_store_main.py, to parse the HTML, clean the contents, and store user/route info in a MongoDB database:

      python parse_clean_store_main.py

- Run the script, create_ratings_matrix.py, to create a utility matrix for the ALS model:

      python create_ratings_matrix.py

- Run the Jupyter notebook file, ensemble_model_setup.ipynb, to prep the data for the ensemble model.
- Run the file, gradient_boosting_model.py, to get a gradient boosted model, or use the model already saved as a pickle file.
- Run the file, als_model_test.py, to create both a validation ALS model and a final ALS model, or use the saved models.
- Run the file, ensemble_model.py, to obtain an RMSE score for the ensemble model.
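To make the utility matrix step above more concrete, here is a small sketch of turning (user, route, rating) triples into a user-by-route matrix. This is not the contents of create_ratings_matrix.py; the function name, the sample IDs, and the use of `None` for unrated routes are all assumptions made for illustration.

```python
def build_utility_matrix(ratings):
    """Build a user-by-route matrix from a list of (user, route, rating).

    Returns the matrix plus the index lookups for rows and columns.
    Unrated cells hold None so they are not confused with a rating of 0.
    """
    users = sorted({user for user, _, _ in ratings})
    routes = sorted({route for _, route, _ in ratings})
    user_index = {user: i for i, user in enumerate(users)}
    route_index = {route: j for j, route in enumerate(routes)}

    matrix = [[None] * len(routes) for _ in users]
    for user, route, rating in ratings:
        matrix[user_index[user]][route_index[route]] = rating
    return matrix, user_index, route_index


# Hypothetical ratings; real rows would come from the MongoDB ratings table
ratings = [
    ("alice", "route_a", 4),
    ("alice", "route_b", 2),
    ("bob", "route_b", 3),
]
matrix, user_index, route_index = build_utility_matrix(ratings)
```

Mapping user and route identifiers to integer indices is also convenient for Spark's ALS, which expects numeric user and item ID columns.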
- Add bouldering routes
- Extend coverage to routes outside of Washington
- Focus model on only top recommendations
- Find a faster way to run my model on the website
- MountainProject.com
- Koren, Yehuda, Robert Bell, and Chris Volinsky. "Matrix factorization techniques for recommender systems." Computer 42.8 (2009).
- Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive datasets. Cambridge University Press, 2014.
- Tabony, Jade. https://github.com/Jadetabony/wta_hikes
