httrees is a Python module for hierarchical topic modeling. It implements an algorithm that constructs a topic hierarchy tree through successive application of flat topic models. It also contains several text vectorizer implementations, including support for fine-tuning deep word embeddings.
This project was started in 2021 as part of CS410 at the University of Illinois Urbana-Champaign.
httrees requires:
- NumPy
- SciPy
- Pandas
- Gensim
scikit-learn is not strictly required. httrees is intended to be used alongside scikit-learn's flat clustering models, but any clustering model that follows the scikit-learn API is compatible.
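To illustrate what "successive application of flat topic models" can look like, here is a minimal, dependency-free sketch. It is not the httrees API: `TopicNode`, `MedianSplit`, and `build_tree` are hypothetical names made up for this example, and `MedianSplit` is a toy stand-in for a real flat clustering model. The only assumption about the model is that it exposes `fit_predict(X)` and returns one integer label per row, as scikit-learn clusterers do.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class TopicNode:
    """One node of the hierarchy: the documents it covers and its subtopics."""
    doc_ids: List[int]
    children: List["TopicNode"] = field(default_factory=list)


class MedianSplit:
    """Toy flat clusterer (hypothetical, for illustration only).

    Splits rows into two groups around the upper median of the first feature.
    It mimics the scikit-learn interface: fit_predict(X) -> one label per row.
    """

    def fit_predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        first = sorted(row[0] for row in X)
        threshold = first[len(first) // 2]
        return [int(row[0] >= threshold) for row in X]


def build_tree(
    X: Sequence[Sequence[float]],
    make_model: Callable[[], object],
    doc_ids: Optional[List[int]] = None,
    max_depth: int = 2,
    min_size: int = 2,
) -> TopicNode:
    """Recursively apply a fresh flat model to each cluster's documents."""
    if doc_ids is None:
        doc_ids = list(range(len(X)))
    node = TopicNode(doc_ids)

    # Stop when the depth budget is spent or the node is too small to split.
    if max_depth == 0 or len(doc_ids) < min_size:
        return node

    # Fit a new flat model on this node's documents only.
    labels = make_model().fit_predict([X[i] for i in doc_ids])

    # Group the original document indices by their assigned label.
    groups = {}
    for doc_id, label in zip(doc_ids, labels):
        groups.setdefault(int(label), []).append(doc_id)

    # A model that returns a single cluster did not split the node.
    if len(groups) < 2:
        return node

    for label in sorted(groups):
        child = build_tree(X, make_model, groups[label], max_depth - 1, min_size)
        node.children.append(child)
    return node


# Eight one-dimensional "document vectors" forming two well-separated groups.
X = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]]
tree = build_tree(X, MedianSplit, max_depth=2, min_size=2)
for child in tree.children:
    print(child.doc_ids, [leaf.doc_ids for leaf in child.children])
```

Because `build_tree` only calls `fit_predict`, a scikit-learn model could be swapped in through the factory argument, for example `lambda: KMeans(n_clusters=2)`, assuming scikit-learn is installed and `X` is a numeric array.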
httrees can be installed from git:
pip install git+https://github.com/bllguo/CourseProject
An example use case and a written overview of the implementation are available as an IPython notebook here. Both can also be found at this page.
An example for fine-tuning embeddings can be found in this notebook and this page.
A video demo is available at this link.