- Mallet Extension
The Mallet package contains only two topic models: LDA and Hierarchical LDA.
So I implemented some additional topic modeling methods on top of it.
Model:
- Hierarchical Dirichlet Process with Gibbs sampling (in the HDP folder).
- Inference part for hLDA (in the hLDA folder).
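To give a feel for what one Gibbs sampling step in an HDP topic model involves, here is a minimal Python sketch of the direct-assignment update described in the HDP paper listed under References. It is an illustration only and is not taken from HDP.java. All names (`hdp_topic_weights`, `sample_index`, `tau`, `alpha`, `beta`) are hypothetical, and a symmetric Dirichlet prior over words is assumed.

```python
import random


def hdp_topic_weights(n_dk, n_kw, n_k, tau, alpha, beta, vocab_size):
    """Unnormalized weights for assigning one word to a topic.

    n_dk: per-topic word counts in the current document (current word removed)
    n_kw: per-topic counts of the current word type (current word removed)
    n_k:  per-topic total word counts (current word removed)
    tau:  global topic weights; tau[-1] is the mass reserved for a new topic
    """
    weights = []
    for k in range(len(n_k)):
        # Smoothed probability of this word under existing topic k.
        word_likelihood = (n_kw[k] + beta) / (n_k[k] + vocab_size * beta)
        weights.append((n_dk[k] + alpha * tau[k]) * word_likelihood)
    # A brand-new topic has no counts, so every word has probability 1/V.
    weights.append(alpha * tau[-1] / vocab_size)
    return weights


def sample_index(weights, rng):
    """Draw an index with probability proportional to its weight."""
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1


if __name__ == "__main__":
    w = hdp_topic_weights([3, 0], [2, 0], [10, 5], [0.5, 0.3, 0.2],
                          alpha=1.0, beta=0.1, vocab_size=100)
    print(sample_index(w, random.Random(0)))
```

If the last index is drawn, the sampler would open a new topic; a full sampler also has to resample the global weights `tau`, which this sketch leaves out.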
Usage:
- This is an extension for Mallet, so you need to have Mallet's source code first.
- Put HDP.java, HDPInferencer.java and HierarchicalLDAInferencer.java in the src/cc/mallet/topics folder.
- If you are going to run HDP, make sure you include the knowceans package in your project.
- Running HDPTest.java or hLDATest.java will give you a demo on a small dataset in the data folder.
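The copy step above can be scripted. The following is a small sketch, assuming you pass in the paths of the extension's .java files and the root of your Mallet checkout; the helper name `install_extension` and the example paths are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def install_extension(java_files, mallet_root):
    """Copy extension sources into Mallet's src/cc/mallet/topics folder."""
    target = Path(mallet_root) / "src" / "cc" / "mallet" / "topics"
    if not target.is_dir():
        # Fail early if the path does not look like a Mallet source tree.
        raise FileNotFoundError(f"Mallet topics folder not found: {target}")
    copied = []
    for source in java_files:
        destination = target / Path(source).name
        shutil.copy2(source, destination)
        copied.append(destination)
    return copied
```

For example, `install_extension(["HDP.java", "HDPInferencer.java"], "/path/to/mallet")` would copy those two files, where both the file locations and the Mallet path are placeholders to adjust for your setup.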
References:
- Mallet: http://mallet.cs.umass.edu/
- knowceans: http://sourceforge.net/projects/knowceans/
- HDP paper: http://www.cs.berkeley.edu/~jordan/papers/hdp.pdf
- HDP paper & source code: "Implementing the HDP with minimum code complexity" by Gregor Heinrich
- Scikit-learn Extension
Scikit-learn doesn't have any topic models yet, so I adapted Matthew D. Hoffman's onlineldavb to follow the scikit-learn format.
Model:
- Online LDA with variational EM (in the LDA folder).
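As a rough illustration of the "online" part, the online LDA paper listed under References blends each minibatch estimate of the topic-word parameters into the running estimate using a decaying step size, rho_t = (tau0 + t)^(-kappa). The sketch below shows only that blending step in plain Python. The function names and default values are assumptions chosen for illustration and do not reflect the code in the LDA folder.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying weight given to minibatch number t (t = 0, 1, 2, ...).

    tau0 >= 0 down-weights early minibatches; the paper requires
    kappa in (0.5, 1] for convergence.
    """
    return (tau0 + t) ** (-kappa)


def blend_topics(lam, lam_hat, rho):
    """Return (1 - rho) * lam + rho * lam_hat, element by element.

    lam:     current topic-word parameters, one row per topic
    lam_hat: estimate computed from the current minibatch alone
    """
    return [
        [(1.0 - rho) * old + rho * new for old, new in zip(row_old, row_new)]
        for row_old, row_new in zip(lam, lam_hat)
    ]


if __name__ == "__main__":
    lam = [[1.0, 2.0], [0.5, 0.5]]
    lam_hat = [[3.0, 4.0], [1.5, 2.5]]
    print(blend_topics(lam, lam_hat, step_size(t=1)))
```

Later minibatches get smaller step sizes, so the estimate settles as more documents are seen. Computing `lam_hat` itself requires the per-document variational E-step, which is omitted here.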
Usage:
- Install the Python package.
- Running python lda_example.py will give you an example with the 20 Newsgroups dataset.
References:
- Scikit-learn: http://scikit-learn.org
- onlineLDA: http://www.cs.princeton.edu/~mdhoffma/code/onlineldavb.tar
- online LDA paper: http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf