CourseProject

The goal of this project is to make it easy to compare topic modeling methods, such as LDA and Top2Vec.
The comparison is implemented as a web app hosted here. A demo video is available here.

Install

The hosted instance of the web app has limited CPU and RAM.
For heavy testing, run the app locally:

1. Set up your environment: conda create -n myapp python=3.8, then conda activate myapp
2. Clone this repo and switch to the project directory
3. Install dependencies: pip install -r requirements.txt
4. Launch the app in your browser: streamlit run main.py
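
Before launching, it can help to confirm that the key packages are visible in the active environment. The helper below is only a sketch and is not part of this repo: `missing_packages` is a hypothetical name, and the module list is assumed from the packages this README mentions (requirements.txt remains the authoritative list).

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Module names assumed from this README, not read from requirements.txt.
    missing = missing_packages(["streamlit", "gensim", "top2vec"])
    print("Missing:", missing or "none")
```

If anything is reported missing, re-running the pip install step inside the activated conda environment is the likely fix.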

Usage

The app has two components:

- A sidebar for user input and control parameters
  - choose dataset / web-scraping parameters
  - set parameters such as number of topics
  - search topic models with a keyword
- The main pane for displaying results
  - each algorithm has a dedicated column, lined up side-by-side for ease of comparison
  - topics shown via wordclouds where word size corresponds to term weight
  - documents returned from keyword search are displayed in height-adjustable boxes
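
To illustrate what "word size corresponds to term weight" means, here is a minimal sketch that linearly scales term weights to font sizes. It is an assumption for illustration only: the function name `font_sizes`, the size range, and the linear scaling are hypothetical, and the app's actual wordcloud rendering may scale differently.

```python
def font_sizes(term_weights, min_size=10, max_size=60):
    """Map each term's weight to a font size by linear scaling."""
    lo = min(term_weights.values())
    hi = max(term_weights.values())
    span = hi - lo
    sizes = {}
    for term, weight in term_weights.items():
        # If all weights are equal, fall back to the midpoint size.
        frac = (weight - lo) / span if span else 0.5
        sizes[term] = round(min_size + frac * (max_size - min_size))
    return sizes


# Made-up weights for one topic, purely for demonstration.
topic = {"model": 3, "topic": 2, "corpus": 1}
print(font_sizes(topic))
```

The heaviest term gets the largest size and the lightest the smallest, so two topics from different algorithms can be compared at a glance.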

Currently supported algorithms are LDA and Top2Vec. A simplified overview and comparison of the two is available in this tech review note.

Reflection and Future Work

Although many features were planned for this app, the first version was deliberately kept simple, without dozens of parameters and customization options cluttering the interface.

Ideas for future releases:

Data

- expand available datasets for testing
- speed up web scraping through parallelization
- add options for lemmatization and word n-grams in vocabulary
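
As one possible direction for the parallelization idea above, page fetches could be spread over a thread pool, since scraping is typically I/O-bound. This is a sketch under stated assumptions, not the app's scraper: `fetch_page` is a hypothetical stand-in that returns placeholder text instead of making a real HTTP request, and `scrape_all` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_page(url):
    """Stand-in for a real HTTP request; returns placeholder text."""
    return f"contents of {url}"


def scrape_all(urls, fetch=fetch_page, max_workers=8):
    """Fetch all URLs concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(fetch, urls))


pages = scrape_all(["page-1", "page-2", "page-3"])
print(len(pages))
```

Swapping `fetch_page` for a real request function would keep the rest of the structure unchanged.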

Features

- phrase/multi-term search
- add more algorithms for comparison
- give users access to more parameters for fine-tuning models
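
For the phrase search idea, the simplest form is literal matching of consecutive tokens. The sketch below assumes whitespace tokenization and case-insensitive comparison, and ignores punctuation; `contains_phrase` and `phrase_search` are hypothetical names, and this is not how the current keyword search over topic models works.

```python
def contains_phrase(document, phrase):
    """True if the phrase's tokens appear consecutively in the document."""
    doc_tokens = document.lower().split()
    phrase_tokens = phrase.lower().split()
    n = len(phrase_tokens)
    if n == 0:
        return False
    # Slide a window of n tokens across the document.
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def phrase_search(documents, phrase):
    """Return the documents that contain the phrase."""
    return [doc for doc in documents if contains_phrase(doc, phrase)]


docs = ["Topic modeling with LDA", "modeling a topic"]
print(phrase_search(docs, "topic modeling"))
```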

Utility

- show and compare time taken to train topic models and perform search
- offer customizable result display:
  - number of documents to show
  - default height of document display box
  - number of wordclouds
  - number of words per wordcloud
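
Timing comparisons like the one proposed above could be collected with a small wrapper around each training or search call. This is a minimal sketch, assuming each algorithm is exposed as a callable that takes the corpus; `timed`, `compare_timings`, and the trainer names are hypothetical and do not come from this repo.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_timings(trainers, corpus):
    """Run each named callable on the corpus and record its elapsed time."""
    return {name: timed(train, corpus)[1] for name, train in trainers.items()}


# Placeholder callables stand in for real model training.
timings = compare_timings({"model_a": len, "model_b": sorted}, ["doc one", "doc two"])
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```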

Reference

LDA is implemented with gensim and Top2Vec with top2vec. Both Python packages can be installed with pip.
