This file will become your README and also the index of your documentation. If you are new to using nbdev, here are some useful pointers to get you started.
```sh
# make sure tayz_decoding package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to tayz_decoding
$ nbdev_prepare
```

Install latest from the GitHub repository:
```sh
$ pip install git+https://github.com/khankanz/tayz_decoding.git
```

This environment is configured for CUDA 12.1 + PyTorch with CUDA support, plus xgrammar, transformers and a CUDA-accelerated build of llama-cpp-python.
```sh
conda create -n crane python=3.10 -y
conda activate crane
```

This pulls the official NVIDIA libraries that match the driver on your machine:
```sh
conda install -c nvidia cuda-toolkit=12.1 -y
```

**Important note about CUDA compatibility**

- NVIDIA drivers are forward-compatible: a driver that supports CUDA 12.1 (or newer) can run applications built against CUDA 12.1, 12.2, 12.3, etc.
- Run `nvidia-smi`: the 'CUDA Version' column in the top-right shows the maximum CUDA runtime your driver supports. As long as that number is >= 12.1, this env will get full GPU acceleration.
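If you'd rather check this programmatically, here is a minimal sketch (assuming `nvidia-smi` is on your `PATH`) that parses the same banner:

```python
import re
import subprocess

# nvidia-smi prints a banner containing "CUDA Version: <max supported runtime>".
out = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
match = re.search(r"CUDA Version:\s*([\d.]+)", out)
if match is None:
    raise RuntimeError("Could not find 'CUDA Version' in nvidia-smi output")
major, minor = (int(x) for x in match.group(1).split(".")[:2])
print(f"Driver supports up to CUDA {major}.{minor}")
assert (major, minor) >= (12, 1), "Driver too old for the CUDA 12.1 stack below"
```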
Install the CUDA 12.1 PyTorch builds:

```sh
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
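Once that finishes, a quick sanity-check sketch confirms you got the cu121 build rather than a CPU-only wheel:

```python
import torch

# torch.version.cuda is the CUDA runtime the wheel was compiled against;
# for the cu121 index above it should print "12.1".
print("Compiled against CUDA:", torch.version.cuda)
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```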
xgrammar currently pulls in dependencies that can conflict with versions we need later. Install it with `--no-deps` first; we manually install the exact versions we want right after:

```sh
pip install xgrammar --no-deps
```

Always use `--dry-run` first! This lets you see exactly which versions/wheels will be installed or upgraded before anything happens, and it prevents accidental CUDA mismatches or huge re-downloads.
```sh
pip install pydantic transformers ninja --dry-run
```

If the dry runs look good, run the commands again for real by removing the flag.
This step compiles llama-cpp-python with GPU support (GGML -> CUDA):

```sh
# First: dry-run to verify it will compile and not try to pull wrong CUDA wheels
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --dry-run

# If everything looks correct → install for real
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --verbose
```

The `--verbose` flag is helpful the first time so you can see the cmake/ninja output and confirm it's actually detecting and using your CUDA toolkit.
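Once the build finishes, one way to confirm that layers really get offloaded is to load any local GGUF model with `n_gpu_layers` set. A minimal sketch (the model path below is a placeholder for whatever GGUF file you have on disk):

```python
from llama_cpp import Llama

# Placeholder path: substitute any GGUF model you have downloaded.
MODEL_PATH = "models/your-model.Q4_K_M.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,  # -1 offloads every layer to the GPU
    verbose=True,     # the load log should mention CUDA buffers, not CPU only
)
out = llm("Q: What is 2 + 2? A:", max_tokens=8)
print(out["choices"][0]["text"])
```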
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
python -c "import llama_cpp; print('llama-cpp-python built with CUDA:', llama_cpp.__cuda__)"
pip list | grep -E "(torch|xgrammar|transformers|llama-cpp-python)"Congratulations, you should now have a fully working crane env with GPU-accelerated PyTorch, HuggingFace transformers, xgrammar and llama-cpp-python. Don't forget to pip install this lib now;
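As a last smoke test, a short sketch (using the small `gpt2` checkpoint purely as an example) that runs a HuggingFace model on the GPU:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small checkpoint works here; gpt2 is just a fast download for the test.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

inputs = tok("Hello, world", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=True))
```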
Don't forget to pip install this lib now:

```sh
pip install git+https://github.com/khankanz/tayz_decoding.git
```

TBD