Tags: AmirStudy/BERTopic
v0.7 (MaartenGr#87)

Highlights:
* (semi-)supervised topic modeling
* Added Spacy, Gensim, USE (TFHub)
* Use a different backend for document embeddings and word embeddings
* Create your own backends with `bertopic.backend.BaseEmbedder`
* Calculate and visualize topics per class

Fixes:
* Fixed issues with Torch req
* Prevent saving term frequency matrix in CTFIDF class
* Fixed DTM not working when reducing topics (MaartenGr#96)
* Moved visualization dependencies to base BERTopic
  * `pip install bertopic[visualization]` becomes `pip install bertopic`
* Allow precomputed embeddings in bertopic.find_topics() (MaartenGr#79)

v0.5 (MaartenGr#46)
* Add Flair to allow for more (custom) token/document embeddings
* Option to use custom UMAP, HDBSCAN, and CountVectorizer
* Added low_memory parameter to reduce memory during computation
* Improved verbosity (shows progress bar)
* Improved testing
* Use the newest version of sentence-transformers as it speeds up encoding significantly
* Return the figure of visualize_topics()
* Expose all parameters with a single function: get_params()
* Option to disable the saving of embedding_model, which should reduce BERTopic size significantly
* Add FAQ page

Bugfix topic reduction (MaartenGr#15)
* Fix topic reduction not accounting for linked mappings
* Return nr_topics + 1 (= outlier group)

Added probability distributions (MaartenGr#9)
* Add topic probabilities (MaartenGr#8)
* Added visualization of topic probabilities
* Update documentation
* Remove logging for topic reduction

Fixed ngram + added stopwords (MaartenGr#6, MaartenGr#5)
* Fixed ngram and added stopwords
* Update pypi version