- [2025/12] Initial release of vLLM Kunlun
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin for running vLLM seamlessly on the Kunlun XPU. It is the recommended approach for supporting the Kunlun backend within the vLLM community, following the principles outlined in the [RFC]: Hardware pluggable. The plugin provides a hardware-pluggable interface that decouples Kunlun XPU support from the vLLM core.
With the vLLM Kunlun plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, Embedding, and Multi-modal LLMs, can run effortlessly on the Kunlun XPU.
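As a quick illustration, here is a minimal offline-inference sketch. It assumes vllm-kunlun is installed alongside vLLM and, like other vLLM hardware plugins, registers itself automatically through vLLM's plugin entry points; the model name is just an example taken from the support tables below.

```python
# Minimal offline-inference sketch. Assumes vllm-kunlun is installed and,
# like other vLLM hardware plugins, is auto-discovered via vLLM's plugin
# entry points, so no Kunlun-specific imports are required here.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # example model from the tables below
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```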
Prerequisites:
- Hardware: Kunlun3 P800
- OS: Ubuntu 22.04
- Software:
  - Python >= 3.10
  - PyTorch >= 2.5.1
  - vLLM (same version as vllm-kunlun)
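A quick way to sanity-check these requirements before installing the plugin is a small pre-flight script; this is a hypothetical helper mirroring the list above (it uses the common `packaging` library), not part of vllm-kunlun:

```python
# Hypothetical pre-flight check mirroring the requirements listed above.
import sys
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

assert sys.version_info >= (3, 10), "Python >= 3.10 is required"

import torch
# Strip any local build tag (e.g. "+xpu") before comparing versions.
assert Version(torch.__version__.split("+")[0]) >= Version("2.5.1"), \
    "PyTorch >= 2.5.1 is required"

try:
    # vLLM and vllm-kunlun are expected to ship matching versions.
    assert version("vllm") == version("vllm-kunlun"), \
        "vllm and vllm-kunlun versions must match"
except PackageNotFoundError as exc:
    raise SystemExit(f"Missing package: {exc}")
```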
Supported text-generation models:

| Model | Support | Quantization | LoRA | Piecewise Kunlun Graph | Note |
|---|---|---|---|---|---|
| Qwen2/2.5 | ✅ | ✅ | ✅ | | |
| Qwen3 | ✅ | ✅ | ✅ | | |
| Qwen3-Moe/Coder | ✅ | ✅ | ✅ | ✅ | |
| QwQ-32B | ✅ | ✅ | | | |
| Llama2/3/3.1 | ✅ | ✅ | | | |
| GLM-4.5/Air | ✅ | ✅ | ✅ | ✅ | |
| Qwen3-Next | coming soon | | | | |
| GPT-OSS | coming soon | | | | |
| DeepSeek-V3/3.2 | coming soon | | | | |
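For models with a check in the LoRA column, adapters can be attached through vLLM's standard LoRA API. Below is a sketch; the base model and adapter path are placeholders to replace with your own:

```python
# LoRA sketch using vLLM's standard API; the base model and adapter path
# are placeholders. Pick a model with a check in the LoRA column above.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_lora=True)

outputs = llm.generate(
    ["Summarize vLLM Kunlun in one sentence."],
    SamplingParams(max_tokens=64),
    # LoRARequest(name, unique integer id, local adapter path)
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora-adapter"),
)
print(outputs[0].outputs[0].text)
```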
Supported multimodal models:

| Model | Support | Quantization | LoRA | Piecewise Kunlun Graph | Note |
|---|---|---|---|---|---|
| Qianfan-VL | ✅ | ✅ | | | |
| Qwen2.5-VL | ✅ | ✅ | | | |
| InternVL2.5/3/3.5 | ✅ | ✅ | | | |
| InternS1 | ✅ | ✅ | | | |
| Qwen2.5-Omni | coming soon | | | | |
| Qwen3-VL | coming soon | | | | |
| GLM-4.5V | ✅ | ✅ | | | |
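Multimodal models are driven through vLLM's standard `multi_modal_data` input, sketched below. The model name, image file, and prompt template are placeholders; each model expects its own chat template, so consult the model card for the real prompt format:

```python
# Multimodal sketch using vLLM's multi_modal_data input. The model name,
# image path, and prompt template are placeholders; real prompts must follow
# the chat template of the model you choose.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")

outputs = llm.generate(
    {
        "prompt": "USER: <image>\nDescribe this image.\nASSISTANT:",
        "multi_modal_data": {"image": Image.open("example.jpg")},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```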
Benchmark environment: 16 concurrent requests, with input and output lengths of 2048 tokens each.
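As a rough way to reproduce this setting, the sketch below fires 16 concurrent requests at a local OpenAI-compatible vLLM server. The endpoint, model name, and prompt are assumptions to adjust for your deployment:

```python
# Rough load-generation sketch for the setting above: 16 concurrent requests
# against a local OpenAI-compatible vLLM server. Endpoint, model name, and
# prompt are assumptions; adjust them for your deployment.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def one_request() -> int:
    resp = await client.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",
        prompt="Tell me a long story.",
        max_tokens=2048,
    )
    return resp.usage.completion_tokens

async def main() -> None:
    start = time.perf_counter()
    token_counts = await asyncio.gather(*(one_request() for _ in range(16)))
    elapsed = time.perf_counter() - start
    print(f"{sum(token_counts) / elapsed:.1f} output tokens/s")

asyncio.run(main())
```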
Please use the following recommended versions to get started quickly:
| Version | Release type | Doc |
|---|---|---|
| v0.10.1.1 | Latest stable version | See QuickStart and Installation for details |
See CONTRIBUTING for a step-by-step guide to setting up the development environment, building, and testing.
We welcome and value any contributions and collaborations:
- Open an Issue if you find a bug or have a feature request
Apache License 2.0, as found in the LICENSE file.

