- [2025/12] Initial release of vLLM Kunlun
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin for running vLLM seamlessly on the Kunlun XPU. It is the recommended approach for supporting the Kunlun backend within the vLLM community, following the principles outlined in the [RFC]: Hardware pluggable. The plugin provides a hardware-pluggable interface that decouples Kunlun XPU support from the vLLM core.
With the vLLM Kunlun plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, Embedding, and Multi-modal LLMs, can run effortlessly on the Kunlun XPU.
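As a quick illustration, here is a minimal offline-inference sketch. It assumes vllm-kunlun is installed alongside vLLM and, like other vLLM hardware plugins, registers itself automatically through vLLM's plugin entry points; the model name is just an example taken from the support tables below.

```python
# Minimal offline-inference sketch. Assumes vllm-kunlun is installed and,
# like other vLLM hardware plugins, is auto-discovered via vLLM's plugin
# entry points, so no Kunlun-specific imports are required here.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # example model from the tables below
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```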
Prerequisites:
- Hardware: Kunlun3 P800
- OS: Ubuntu 22.04
- Software:
  - Python >= 3.10
  - PyTorch >= 2.5.1
  - vLLM (same version as vllm-kunlun)
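A quick way to sanity-check these requirements before installing the plugin is a small pre-flight script; this is a hypothetical helper mirroring the list above (it uses the common `packaging` library), not part of vllm-kunlun:

```python
# Hypothetical pre-flight check mirroring the requirements listed above.
import sys
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

assert sys.version_info >= (3, 10), "Python >= 3.10 is required"

import torch
# Strip any local build tag (e.g. "+xpu") before comparing versions.
assert Version(torch.__version__.split("+")[0]) >= Version("2.5.1"), \
    "PyTorch >= 2.5.1 is required"

try:
    # vLLM and vllm-kunlun are expected to ship matching versions.
    assert version("vllm") == version("vllm-kunlun"), \
        "vllm and vllm-kunlun versions must match"
except PackageNotFoundError as exc:
    raise SystemExit(f"Missing package: {exc}")
```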
Supported text-generation models:

| Model | Support | Quantization | LoRA | Piecewise Kunlun Graph | Note |
|---|---|---|---|---|---|
| Qwen2/2.5 | ✅ | ✅ | ✅ | | |
| Qwen3 | ✅ | ✅ | ✅ | | |
| Qwen3-Moe/Coder | ✅ | ✅ | ✅ | ✅ | |
| QwQ-32B | ✅ | ✅ | | | |
| Llama2/3/3.1 | ✅ | ✅ | | | |
| GLM-4.5/Air | ✅ | ✅ | ✅ | ✅ | |
| Qwen3-Next | coming soon | | | | |
| GPT-OSS | coming soon | | | | |
| DeepSeek-V3/3.2 | coming soon | | | | |
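For models with a check in the LoRA column, adapters can be attached through vLLM's standard LoRA API. Below is a sketch; the base model and adapter path are placeholders to replace with your own:

```python
# LoRA sketch using vLLM's standard API; the base model and adapter path
# are placeholders. Pick a model with a check in the LoRA column above.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_lora=True)

outputs = llm.generate(
    ["Summarize vLLM Kunlun in one sentence."],
    SamplingParams(max_tokens=64),
    # LoRARequest(name, unique integer id, local adapter path)
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora-adapter"),
)
print(outputs[0].outputs[0].text)
```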
Supported multimodal models:

| Model | Support | Quantization | LoRA | Piecewise Kunlun Graph | Note |
|---|---|---|---|---|---|
| Qianfan-VL | ✅ | ✅ | | | |
| Qwen2.5-VL | ✅ | ✅ | | | |
| InternVL2.5/3/3.5 | ✅ | ✅ | | | |
| InternS1 | ✅ | ✅ | | | |
| Qwen2.5-Omni | coming soon | | | | |
| Qwen3-VL | coming soon | | | | |
| GLM-4.5V | ✅ | ✅ | | | |
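Multimodal models are driven through vLLM's standard `multi_modal_data` input, sketched below. The model name, image file, and prompt template are placeholders; each model expects its own chat template, so consult the model card for the real prompt format:

```python
# Multimodal sketch using vLLM's multi_modal_data input. The model name,
# image path, and prompt template are placeholders; real prompts must follow
# the chat template of the model you choose.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")

outputs = llm.generate(
    {
        "prompt": "USER: <image>\nDescribe this image.\nASSISTANT:",
        "multi_modal_data": {"image": Image.open("example.jpg")},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```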
Benchmark environment: 16 concurrent requests, with input and output lengths of 2048 tokens each.
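As a rough way to reproduce this setting, the sketch below fires 16 concurrent requests at a local OpenAI-compatible vLLM server. The endpoint, model name, and prompt are assumptions to adjust for your deployment:

```python
# Rough load-generation sketch for the setting above: 16 concurrent requests
# against a local OpenAI-compatible vLLM server. Endpoint, model name, and
# prompt are assumptions; adjust them for your deployment.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def one_request() -> int:
    resp = await client.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",
        prompt="Tell me a long story.",
        max_tokens=2048,
    )
    return resp.usage.completion_tokens

async def main() -> None:
    start = time.perf_counter()
    token_counts = await asyncio.gather(*(one_request() for _ in range(16)))
    elapsed = time.perf_counter() - start
    print(f"{sum(token_counts) / elapsed:.1f} output tokens/s")

asyncio.run(main())
```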
Please use the following recommended versions to get started quickly:
| Version | Release type | Doc |
|---|---|---|
| v0.10.1.1 | Latest stable version | See QuickStart and Installation for details |
See CONTRIBUTING for a step-by-step guide to setting up the development environment, building, and testing.
We welcome and value any contributions and collaborations:
- Open an Issue if you find a bug or have a feature request
Apache License 2.0, as found in the LICENSE file.

