We built a web interface with Flask that lets a user search the transcripts of lectures on Coursera. The server ranks the collection of lectures against the query and serves the updated webpage with the search results. The project also includes a content-based recommender system.
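To make the ranking step concrete, here is a minimal sketch of scoring lectures against a query. It is not the project's ranker (which presumably builds on metapy and is not shown here); it swaps in a plain smoothed TF-IDF score using only the standard library. The function names, the `lectures` dictionary shape, and the sample transcripts are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def rank_lectures(query, lectures, top_k=3):
    """Return up to top_k (title, score) pairs, best match first.

    `lectures` maps a lecture title to its transcript text (hypothetical shape).
    Lectures that share no term with the query are left out of the result.
    """
    docs = {title: Counter(tokenize(text)) for title, text in lectures.items()}
    n_docs = len(docs)
    scores = {}
    for title, counts in docs.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many transcripts contain the term.
            df = sum(1 for c in docs.values() if term in c)
            if df == 0:
                continue
            idf = math.log((n_docs + 1) / (df + 1)) + 1.0  # smoothed IDF
            score += counts[term] * idf
        if score > 0:
            scores[title] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Made-up sample data, for illustration only.
lectures = {
    "Vector Space Model": "the vector space model represents documents as vectors",
    "Probabilistic Retrieval": "the query likelihood model ranks documents by probability",
    "Web Search": "crawlers collect pages for a web search engine",
}
results = rank_lectures("vector space", lectures)
```

In a Flask app, a function like this would typically be called inside the route that handles the search form, with the returned titles passed to the template.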
The dependency metapy only works on certain versions of Python 3, such as 3.5.10. It is recommended to set up a virtual environment with a version that works with metapy.
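A quick way to confirm that the active interpreter is one you expect to work is a small version check like the sketch below. The `KNOWN_GOOD` set is an assumption: it lists only the 3.5 series because 3.5.10 is the one version named above, so extend it with any other versions you have verified yourself.

```python
import sys

# Assumption: only the 3.5 series is listed, since 3.5.10 is the one
# version named in this README. Add other versions you have verified.
KNOWN_GOOD = {(3, 5)}


def is_supported(version_info=sys.version_info, known_good=KNOWN_GOOD):
    """Return True if the (major, minor) pair is in the known-good set."""
    return (version_info[0], version_info[1]) in known_good


if not is_supported():
    # %-formatting is used so the script also runs on Python 3.5.
    print("Warning: Python %d.%d is not in the known-good list for metapy"
          % sys.version_info[:2])
```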
- Clone the repository
git clone https://github.com/IEnjoyEatingCookies/CourseProject.git
- Install dependencies
pip install metapy pytoml flask coursera-dl
If the project already has the dataset included, then this step can be skipped.
- Enter your credentials in GetTranscript.py. To get your CAUTH, you must use Chrome. Go to Chrome Settings > Cookies and, in the dropdown, click https://www.coursera.org/. Then find and click Copy value CAUTH.
- Download the raw dataset.
python GetTranscripts.py
- Build the dataset. Keep BuildDataset.py outside of the folder that contains the scraped data. Running the script creates allData.txt, which contains all of the video transcripts and text files from Coursera.
python BuildDataset.py
- Format the dataset.
python GetLessonTitle.py
- Run the Flask server.
flask run
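The three dataset commands above can also be chained from one small driver script. The sketch below is a convenience wrapper, not part of the repository: `build_commands` and `run_pipeline` are hypothetical names, and the script names and their order are copied from the steps above. It assumes your credentials have already been entered and that it is run from the repository root.

```python
import subprocess
import sys

# Script names and order are taken from the steps listed above.
PIPELINE = ["GetTranscripts.py", "BuildDataset.py", "GetLessonTitle.py"]


def build_commands(python=sys.executable):
    """Return one [interpreter, script] command per dataset step, in order."""
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    """Run each step in order; check=True stops at the first failing step."""
    for command in build_commands():
        subprocess.run(command, check=True)


# Usage (from the repository root, after entering your credentials):
#     run_pipeline()
```

Using `sys.executable` keeps every step on the same interpreter as the driver, which matters when the virtual environment is the only place metapy is installed.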