This repository contains Python code that implements a rudimentary $n$-gram language model.
No additional dependencies are required to use `ngram.py`. However, to run the example, you will need to install `tqdm`.
See the example.
In brief, the main code is in `ngram.py`. The main class of concern is `NGramModel`, whose constructor takes one parameter, `n`, representing the order of the language model.
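To make the role of the order `n` concrete, here is a minimal sketch of an order-$n$ counter. It is not the code in `ngram.py`: `ToyNGramModel`, `ngrams()`, and the nested-dict prefix-tree (trie) layout are hypothetical choices made for illustration. It slides a window of length `n` over a token sequence to produce tuples of length $n$, stores them in a trie, and reports `size` as the number of distinct $n$-grams.

```python
from collections.abc import Sequence


def ngrams(tokens: Sequence[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous window of length n as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class ToyNGramModel:
    """Hypothetical illustration only; not the NGramModel in ngram.py."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        # Each trie node holds a count and a dict of child nodes keyed by token.
        self.root = {"count": 0, "children": {}}
        self._size = 0  # number of distinct n-grams stored

    def add_ngram(self, ngram: tuple[str, ...]) -> None:
        if len(ngram) != self.n:
            raise ValueError(f"expected a tuple of length {self.n}")
        node = self.root
        node["count"] += 1
        for i, token in enumerate(ngram):
            children = node["children"]
            if token not in children:
                children[token] = {"count": 0, "children": {}}
                if i == self.n - 1:
                    self._size += 1  # a new leaf means a previously unseen n-gram
            node = children[token]
            node["count"] += 1

    @property
    def size(self) -> int:
        return self._size


tokens = "the cat sat on the cat".split()
model = ToyNGramModel(2)
for gram in ngrams(tokens, 2):
    model.add_ngram(gram)
print(model.size)  # 4 distinct bigrams out of 5 added
```

Counting along every prefix means each internal node also records how often its prefix occurred, which is what later operations such as relative-frequency pruning or sampling can divide by.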
- The size of the `NGramModel` instance `model` (i.e., the number of $n$-grams stored) can be obtained using `model.size`.
- You can save the model by using `model.save()`, and load an existing model by using `NGramModel.load()`. See the example for more details.
- The `serialize()` and `deserialize()` methods are provided for convenience and are not required to be used.
- `model.add_ngram()` adds a new $n$-gram to the model. This takes a tuple of length $n$, representing the $n$-gram to be inserted.
- `model.prune()` prunes the model by removing nodes with relative frequencies less than a threshold. See the example for more details.
- To generate text, use `model.generate_text()`. Provide a starting $k$-gram (where $k \leq n$) and the number of words to generate. You can provide a seed to control generation output. The example has more details.
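The README does not define "relative frequency" precisely. The sketch below assumes it means a trie node's count divided by its parent's count, i.e. the maximum-likelihood estimate of $P(w_i \mid w_1, \dots, w_{i-1})$, and that pruning a node drops its whole subtree. The helpers `make_node()`, `insert()`, and `prune()` and the `threshold` parameter are hypothetical; this is not the repository's `prune()`.

```python
def make_node() -> dict:
    """Hypothetical trie node: total count plus children keyed by token."""
    return {"count": 0, "children": {}}


def insert(root: dict, ngram: tuple[str, ...]) -> None:
    """Increment counts along the path for one n-gram."""
    node = root
    node["count"] += 1
    for token in ngram:
        node = node["children"].setdefault(token, make_node())
        node["count"] += 1


def prune(node: dict, threshold: float) -> None:
    """Drop children whose count / parent count is below threshold (assumed definition)."""
    parent_count = node["count"]
    for token in list(node["children"]):  # copy keys so we can delete while iterating
        child = node["children"][token]
        if parent_count == 0 or child["count"] / parent_count < threshold:
            del node["children"][token]  # removes the entire subtree
        else:
            prune(child, threshold)


root = make_node()
for gram in [("a", "b"), ("a", "b"), ("a", "b"), ("a", "c"), ("x", "y")]:
    insert(root, gram)
prune(root, threshold=0.3)
print(sorted(root["children"]))  # ['a']  ("x" had 1/5 = 0.2)
print(sorted(root["children"]["a"]["children"]))  # ['b']  ("c" had 1/4 = 0.25)
```

In this sketch the counts of surviving nodes are left as they were, so their ratios still refer to the unpruned totals; renormalizing after pruning would be a different design choice.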
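Text generation from an $n$-gram model is commonly done by repeatedly sampling the next word in proportion to how often it followed the most recent context. The sketch below shows one way to do that with a seed for reproducibility. It is hypothetical, not `generate_text()` itself: the trie layout, the function names, and the backoff rule (drop the oldest context word when the current context has no continuations) are assumptions, since the README only says that a starting $k$-gram with $k \leq n$ is accepted.

```python
import random
from typing import Optional


def make_node() -> dict:
    """Hypothetical trie node: total count plus children keyed by token."""
    return {"count": 0, "children": {}}


def insert(root: dict, ngram: tuple[str, ...]) -> None:
    """Increment counts along the path for one n-gram."""
    node = root
    node["count"] += 1
    for token in ngram:
        node = node["children"].setdefault(token, make_node())
        node["count"] += 1


def find(root: dict, context: tuple[str, ...]) -> Optional[dict]:
    """Return the node reached by following context from the root, or None."""
    node = root
    for token in context:
        node = node["children"].get(token)
        if node is None:
            return None
    return node


def generate(root: dict, n: int, start: tuple[str, ...], num_words: int,
             seed: Optional[int] = None) -> list[str]:
    rng = random.Random(seed)  # private RNG: the same seed reproduces the same output
    words = list(start)
    for _ in range(num_words):
        # Condition on at most the last n - 1 words.
        context = tuple(words[-(n - 1):]) if n > 1 else ()
        node = find(root, context)
        # Assumed backoff: drop the oldest word until the context has continuations.
        while (node is None or not node["children"]) and context:
            context = context[1:]
            node = find(root, context)
        if node is None or not node["children"]:
            break  # empty model: nothing to sample from
        tokens = list(node["children"])
        weights = [node["children"][t]["count"] for t in tokens]
        words.append(rng.choices(tokens, weights=weights, k=1)[0])
    return words


root = make_node()
corpus = "the cat sat on the mat and the cat ran".split()
for i in range(len(corpus) - 1):
    insert(root, tuple(corpus[i:i + 2]))
print(" ".join(generate(root, n=2, start=("the",), num_words=5, seed=0)))
```

Because the trie is indexed by prefixes, backing off to a shorter context only approximates a true lower-order model; it is kept here for brevity.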
This repository is released into the public domain under The Unlicense.