The following instructions have been tested with Python 2.7 on Linux and macOS.
- You should have ElasticSearch installed and running: https://www.elastic.co/guide/en/elasticsearch/reference/current/targz.html
- Create the index in ElasticSearch by running `python create_es_index.py` from `EducationalWeb/`.
- Download `tfidf_outputs.zip` from here: https://drive.google.com/file/d/19ia7CqaHnW3KKxASbnfs2clqRIgdTFiw/view?usp=sharing
  Unzip the file and place the folder under `EducationalWeb/static`.
- Download `cs410.zip` from here: https://drive.google.com/file/d/1Xiw9oSavOOeJsy_SIiIxPf4aqsuyuuh6/view?usp=sharing
  Unzip the file and place the folder under `EducationalWeb/pdf.js/static/slides/`.
- Run `python scraper.py` from `CourseProject/crawling/` to scrape lecture slides from the website.
- Then run `python parsePDF` under `EducationalWeb/pdf.js/` to normalize the slide names and split each PDF into a folder of single slides.
- Run `python getRelatedFiles.py` in `EducationalWeb/pdf.js/static` to compute, for every slide, its related slides with ranking scores.
- From `EducationalWeb/pdf.js/build/generic/web`, run the following command: `gulp server`
- In another terminal window, run `python app.py` from `EducationalWeb/`.
- The site should be available at http://localhost:8096/
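
If the site does not come up, it can help to check which of the services from the steps above are actually listening. The following is a minimal sketch and not part of the project: the helper names `is_port_open` and `check_services` are hypothetical, and port 9200 is assumed because it is the Elasticsearch default (adjust it if your installation uses another port). Port 8096 is taken from the site URL above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:
        # Connection refused, timed out, or host unreachable.
        return False
    conn.close()
    return True


# Port 9200 is the Elasticsearch default and is an assumption here;
# port 8096 comes from the site URL in the steps above.
SERVICES = [
    ("ElasticSearch", "localhost", 9200),
    ("EducationalWeb site", "localhost", 8096),
]


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return dict(
        (name, is_port_open(host, port)) for name, host, port in services
    )


if __name__ == "__main__":
    for name, up in sorted(check_services().items()):
        print("%-20s %s" % (name, "up" if up else "not reachable"))
```

Saved under any file name and run with `python`, the script prints one line per service. It only tests that a port accepts TCP connections, so it cannot tell whether the index was created or the slides were processed correctly.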