Spacylize is a tool that distills the capabilities of large language models into compact, efficient spaCy models.
Prerequisites:
- Python 3.8+
Installation:
pip install -e .

This example demonstrates how to use spacylize to generate training data and train a spaCy model that identifies key attributes in e-commerce product descriptions.
See example: examples/ecommerce/promt.yaml
spacylize generate --llm-config-path examples/ecommerce/llm.yaml --prompt-config-path examples/ecommerce/promt.yaml --n-samples 2000 --output-path examples/ecommerce/train.txt --task ner
spacylize visualize --input-path examples/ecommerce/train.spacy --task ner --n-samples 5 --port 5002
spacylize validate --dataset examples/ecommerce/train.spacy --output-folder examples/ecommerce
spacylize split --input examples/ecommerce/train.spacy --train examples/ecommerce/train_split.spacy --dev examples/ecommerce/dev_split.spacy --dev-size 0.2 --seed 42
spacylize train --train-data examples/ecommerce/train_split.spacy --base-model en_core_web_sm --output-model examples/ecommerce/ecommerce_attribute_model --n-iter 100 --dropout 0.3
spacylize evaluate --model examples/ecommerce/ecommerce_attribute_model --data examples/ecommerce/dev_split.spacy
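
To make the `--dev-size 0.2 --seed 42` options of the split command more concrete, here is a minimal sketch of a seeded train/dev split in plain Python. It is an illustration only: the function name `split_examples` is hypothetical, and the assumption that spacylize shuffles before cutting is not confirmed by this README.

```python
import random


def split_examples(examples, dev_size=0.2, seed=42):
    """Shuffle with a fixed seed, then cut off a dev fraction.

    Hypothetical helper for illustration; not spacylize's actual code.
    """
    shuffled = list(examples)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_dev = int(round(len(shuffled) * dev_size))
    # The first n_dev items form the dev set, the rest the training set.
    return shuffled[n_dev:], shuffled[:n_dev]


if __name__ == "__main__":
    train, dev = split_examples(range(2000), dev_size=0.2, seed=42)
    print(len(train), len(dev))  # prints "1600 400"
```

With the 2000 samples generated above, a dev size of 0.2 would hold out 400 examples for evaluation, and reusing the same seed would reproduce the same split.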