acscharf/CourseProject

English and Japanese Course Reflection Analysis and Prediction

About

After learners complete a video-based online business course, they are prompted to enter a "reflection" on how they can apply the knowledge from the course to their job or daily life. These reflections are shared with other learners, who can deepen their understanding by seeing how others applied what they learned.

This project aims to:

1.) analyze "useful" and "not useful" reflections, find the syntactic elements that characterize each, and train a model (train.py)

2.) gather a reflection from a user and predict whether it is "useful" or "not useful" based on the trained model (webapp.py)

Both 1.) and 2.) are done for English and Japanese.

Project Presentation

https://www.youtube.com/watch?v=JN-Gm5Pj-hs

Try It

A live version of the software is hosted below, complete with sample videos. Try a "useful" and "not useful" reflection and see if it matches your expectations. The sample reflections will give you an idea of what might be considered "useful" and "not useful."

English

http://alexscharf.com/

Japanese

http://alexscharf.com/ja

train.py (Training Application)

Overview

Reads a labeled CSV of reflection data, analyzes the reflections, trains a model, and saves that model to disk. The analysis looks at parts of speech (by percentage), common words, and average word counts for both "useful" and "not useful" reflections.

Implementation

The application has three key functions, explained below:

read_csv(filename, rows)

Opens the CSV named by "filename" and reads the first "rows" rows. The CSV should have two columns: the first holds a label of "1" if the reflection is "useful" or "0" if it is "not useful," and the second holds the reflection text. Strips whitespace and returns a pandas DataFrame.
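
As a rough illustration of the behavior described above, a reader could sketch read_csv as follows. This is not the project's actual code: the absence of a header row and the column names "label" and "text" are assumptions made for the example.

```python
import io

import pandas as pd


def read_csv(filename, rows):
    """Read the first `rows` rows of a two-column (label, text) CSV."""
    # Assumption: the file has no header row; the column names are hypothetical.
    df = pd.read_csv(filename, header=None, names=["label", "text"], nrows=rows)
    # Strip surrounding whitespace from the reflection text.
    df["text"] = df["text"].astype(str).str.strip()
    return df


# In-memory stand-in for data/english.csv so the sketch runs on its own.
sample = io.StringIO(
    '1,"  I will use cash-flow analysis in my budget review.  "\n'
    "0, Interesting course \n"
    "1,third row\n"
)
df = read_csv(sample, 2)
print(df)
```

Passing a path such as "data/english.csv" instead of the in-memory sample should work the same way, since pandas accepts both paths and file-like objects.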

analyze_reflections(reflections, nlp, language)

Analyzes reflections when provided with a pandas DataFrame, a spaCy NLP object, and a language string to display in the output. Iterates over each unigram for both useful and not useful reflections, counting parts of speech, common words, and average length while ignoring whitespace. Outputs the result using the print function.

Example output: https://github.com/acscharf/CourseProject/blob/main/example_output.txt
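
The counting described above might look something like the sketch below. It is a simplified stand-in, not the code in train.py: the function name summarize and the namedtuple tokens are hypothetical, and the tokens only mimic the text, pos_, and is_space attributes that spaCy tokens expose, so the example runs without a spaCy model installed.

```python
from collections import Counter, namedtuple


def summarize(docs):
    """Return POS percentages, the most common words, and average length.

    `docs` is an iterable of token sequences. Each token needs `.text`,
    `.pos_`, and `.is_space`, which spaCy tokens provide.
    """
    pos_counts, word_counts = Counter(), Counter()
    total_tokens, n_docs = 0, 0
    for doc in docs:
        n_docs += 1
        for token in doc:
            if token.is_space:  # ignore whitespace tokens
                continue
            total_tokens += 1
            pos_counts[token.pos_] += 1
            word_counts[token.text.lower()] += 1
    # Percentage of all counted tokens that carry each part-of-speech tag.
    pos_pct = {pos: 100.0 * n / total_tokens for pos, n in pos_counts.items()}
    avg_len = total_tokens / n_docs if n_docs else 0.0
    return pos_pct, word_counts.most_common(10), avg_len


# Stand-in tokens (hypothetical) in place of real spaCy output.
Tok = namedtuple("Tok", "text pos_ is_space")
demo = [
    [
        Tok("I", "PRON", False),
        Tok("will", "AUX", False),
        Tok(" ", "SPACE", True),
        Tok("budget", "VERB", False),
    ],
    [Tok("Interesting", "ADJ", False)],
]
pos_pct, common, avg_len = summarize(demo)
print(pos_pct, common, avg_len)
```

With real data, each reflection would be passed through the spaCy NLP object first, and summarize would be called once for the "useful" rows and once for the "not useful" rows.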

train_reflections(reflections, nlp, n_iter, n_texts)

Trains a model on the labeled reflection data, given a pandas DataFrame, a spaCy NLP object, the number of training iterations, and the number of items in the reflection data. Holds out 20% of the labeled data for evaluation and trains on the remaining 80%. Prints loss, recall, precision, and F-score for each training iteration. Currently built using the "simple_cnn" architecture provided by spaCy.
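
To make the 80/20 hold-out and the reported metrics concrete, here is a small standard-library sketch. It does not reproduce the spaCy training loop. The helper names are hypothetical, and shuffling before the split is an assumption, since the text above does not say how the held-out 20% is chosen.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle labeled rows and hold out the last 20% for evaluation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # assumption: data is shuffled first
    cut = int(len(rows) * 0.8)
    return rows[:cut], rows[cut:]


def precision_recall_f(gold, predicted):
    """Precision, recall, and F-score for the positive ("useful" = 1) class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


train, held_out = split_80_20(range(10))
p, r, f = precision_recall_f([1, 1, 0, 0], [1, 0, 1, 0])
print(len(train), len(held_out), p, r, f)
```

In the example call, one of two predicted positives is correct and one of two true positives is found, so precision, recall, and F-score all come out to 0.5.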

Usage

The application requires the spaCy and pandas libraries as well as the "en_core_web_sm" and "ja_core_news_sm" spaCy models.

Additionally, the software needs the english.csv and japanese.csv labeled reflection datasets in a "data" subfolder. I labeled these datasets "useful" or "not useful" myself, and they contain actual learner reflections. The datasets for this project can be found in the repository at the links below:

English

https://github.com/acscharf/CourseProject/blob/main/data/english.csv

Japanese

https://github.com/acscharf/CourseProject/blob/main/data/japanese.csv

After completion, the program saves a model to disk in the "english_model" and "japanese_model" subfolders.

Assuming the provided csv files are included, the program can be run as-is with no additional parameters.

webapp.py (User-facing Web Application)

Overview

Flask-based web application that loads the trained model and gathers user input to predict the usefulness of a reflection. The English version can be accessed at the main directory (/), while the Japanese version can be accessed via a subdirectory (/ja).

Implementation

The program is implemented with Flask, mixing Python and HTML. There are two pages, a submission page and a results page, each in both English and Japanese.

The submission page is pure HTML and JavaScript.

The results page takes the submission from the previous page as a parameter, loads a spaCy model created by the train.py application, and runs the user submission against the trained text classifier to guess whether the submission is "useful" or "not useful." This output is displayed to the user, along with some generic hints for a useful reflection inferred from analyzing reflections through the training application.
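
The final step of the results page, turning classifier scores into a verdict, could be sketched roughly as below. This is an illustration only: the function name, the "USEFUL" category label, and the 0.5 threshold are assumptions, not values taken from webapp.py.

```python
def describe_prediction(cats, positive_label="USEFUL", threshold=0.5):
    """Map text-classifier scores to a "useful" / "not useful" verdict.

    `cats` is a dict of category name -> score, the shape spaCy exposes
    as `doc.cats`. The label name and threshold here are hypothetical.
    """
    score = cats.get(positive_label, 0.0)
    verdict = "useful" if score >= threshold else "not useful"
    return verdict, score


# Hand-written scores standing in for a real model's output.
print(describe_prediction({"USEFUL": 0.91, "NOT_USEFUL": 0.09}))
print(describe_prediction({"USEFUL": 0.12, "NOT_USEFUL": 0.88}))
```

With a trained model on disk, the scores would come from spaCy itself: load the saved model directory with spacy.load, run the submitted text through it, and pass the resulting doc.cats to a helper like this one.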

Usage

The application requires the spaCy and Flask libraries as well as a trained model generated by train.py.

The app can be launched with the following commands:

export FLASK_APP=webapp.py
flask run

This will start a development server on http://127.0.0.1:5000/.

A working version can be found at http://alexscharf.com/

Other files

Proposal.pdf

Project proposal

Progress report.pdf

Mid-term progress report

example_output.txt

Example output of train.py analysis

waitress_server.py

Configuration file for production web server

Self-evaluation

Have you completed what you have planned?

I was able to complete all the planned outcomes mentioned in the original project proposal. In fact, I went beyond the proposal by generating a trained model and a web front-end for Japanese as well as English. I did not include this in the original proposal because I was not sure of my ability to correctly label the Japanese data, but I found a subset of the data (for an accounting course) that allowed me to do so, and hence exceeded the original project proposal.

Have you got the expected outcome?

The outcome for the reflection predictor is as expected. As originally proposed, I conducted user tests to see if the program meets users' expectations. Their feedback was as follows:

  • The application is very good at filtering out obviously bad reflections ("The course was interesting"). This is very useful, as these low-quality responses have the largest user impact.
  • The application can still be "tricked" by writing grammatically correct and keyword-packed sentences that ultimately have little meaning ("I love studying business and applying business for my presentations. It helps me succeed at work with my boss and also with my coworkers." gets a perfect score). This is not intended to be a grading mechanism, however, so tackling these cases is outside the scope of the project.

The outcome for the reflection analyzer was also insightful, but not as much as expected. As expected, better reflections tend to have more words (around 22 on average, compared with 9 for not useful ones). However, the parts of speech and common words were quite similar for "useful" and "not useful" reflections. This suggests that a heuristic analysis of reflections would need much deeper insight, and that training a model is a much more effective approach, which justifies the original project.
