Supports LLM inference on OrangePi. Currently tested hardware: Orange Pi 20T 24GB.
Supported model architecture: Qwen2ForCausalLM.
We recommend downloading models directly from ModelScope with git (git lfs must be installed), for example DeepSeek-R1-Distill-Qwen-1.5B:

```bash
git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.git
```
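If git lfs is missing or was never initialized (`git lfs install`), `git clone` leaves small text pointer files where the weights should be. The following is a minimal Python sketch, with a hypothetical helper name `find_lfs_pointers`, that scans a cloned model directory for files that still start with the standard Git LFS pointer header:

```python
from pathlib import Path

# First line of every Git LFS pointer file (Git LFS spec v1).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that are still Git LFS pointer stubs."""
    root = Path(model_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        # Skip directories and git's own metadata under .git/.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        with path.open("rb") as f:
            # A real weight file will not begin with the pointer header.
            if f.read(len(LFS_HEADER)) == LFS_HEADER:
                pointers.append(path)
    return pointers
```

For example, `find_lfs_pointers("DeepSeek-R1-Distill-Qwen-1.5B")` should return an empty list after a complete download; if it does not, running `git lfs pull` inside the repository fetches the missing files.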
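Convert the downloaded weights with the matching script, `convert_qwen2_weight.py` for regular checkpoints or `convert_qwen2_awq_weight.py` for AWQ-quantized ones:

```bash
python3 /data/llm_simple/scripts/convert_qwen2_weight.py --input_model_path /ssd/models/Qwen2.5-3B-Instruct --output_dir /ssd/models/Qwen2.5-3B-Instruct_converted
python3 /data/llm_simple/scripts/convert_qwen2_awq_weight.py --input_model_path /ssd/models/Qwen2.5-14B-Instruct-AWQ --output_dir /ssd/models/Qwen2.5-14B-Instruct-AWQ_converted
```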
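Choosing between the two scripts can be automated from the checkpoint's Hugging Face `config.json`. The sketch below is a hypothetical helper, `build_convert_command`, not part of this repository. It assumes the standard `architectures` field and that AWQ checkpoints declare `"quant_method": "awq"` under `quantization_config`, and it reuses the script paths from the commands above:

```python
import json
from pathlib import Path

# Script paths as used in the commands above; adjust to your own checkout.
PLAIN_SCRIPT = "/data/llm_simple/scripts/convert_qwen2_weight.py"
AWQ_SCRIPT = "/data/llm_simple/scripts/convert_qwen2_awq_weight.py"


def build_convert_command(model_dir, output_dir):
    """Return the argv list that converts one Qwen2 checkpoint."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    # Only Qwen2ForCausalLM checkpoints are supported.
    archs = config.get("architectures") or []
    if "Qwen2ForCausalLM" not in archs:
        raise ValueError(f"unsupported architectures: {archs}")
    # AWQ checkpoints declare their quantization method in quantization_config.
    quant = config.get("quantization_config") or {}
    script = AWQ_SCRIPT if quant.get("quant_method") == "awq" else PLAIN_SCRIPT
    return ["python3", script,
            "--input_model_path", str(model_dir),
            "--output_dir", str(output_dir)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`.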
Replace the paths in the scripts with your own.
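Editing the paths by hand works fine. As an alternative, here is a minimal Python sketch (the helper `replace_path_prefix` is hypothetical) that replaces every occurrence of one directory prefix, such as the `/ssd/models` used in the conversion commands, with your own. It does plain string replacement, so review the script afterwards:

```python
from pathlib import Path


def replace_path_prefix(script_path, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix in a script; return how many were replaced."""
    path = Path(script_path)
    text = path.read_text()
    count = text.count(old_prefix)
    if count:
        # Keep a backup of the unmodified script as <name>.bak.
        path.with_name(path.name + ".bak").write_text(text)
        path.write_text(text.replace(old_prefix, new_prefix))
    return count
```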
```bash
bash scripts/example_text_completion_deepseek_r1_qwen2.5_1.5B_bf16_orangepi.sh
```
| Model size | TTFT (ms) | Decode (ms/token) |
|---|---|---|
| 1.5B | 461 | 142 |
| 3B | 776 | 284 |
| 7B | 3215 | 881 |
| 3B-AWQ | 3215 | 113 |
| 7B-AWQ | 2358 | 206 |
| 14B-AWQ | 8181 | 653 |
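The two columns can be turned into throughput and a rough end-to-end latency. The following Python sketch assumes the first token arrives after the TTFT and every later token costs a constant decode time. The table does not state prompt or output lengths, and decode time normally grows with context, so treat the result as an approximation:

```python
# (ttft_ms, decode_ms_per_token) copied from the table above.
BENCHMARKS = {
    "1.5B": (461, 142),
    "3B": (776, 284),
    "7B": (3215, 881),
    "3B-AWQ": (3215, 113),
    "7B-AWQ": (2358, 206),
    "14B-AWQ": (8181, 653),
}


def tokens_per_second(decode_ms):
    """Steady-state decode throughput implied by a per-token latency."""
    return 1000.0 / decode_ms


def estimate_latency_ms(ttft_ms, decode_ms, n_tokens):
    """Approximate time to produce n_tokens output tokens.

    Assumes a constant per-token decode cost after the first token.
    """
    if n_tokens < 1:
        raise ValueError("n_tokens must be >= 1")
    return ttft_ms + (n_tokens - 1) * decode_ms


for name, (ttft, decode) in BENCHMARKS.items():
    print(f"{name:>8}: {tokens_per_second(decode):5.2f} tok/s, "
          f"~{estimate_latency_ms(ttft, decode, 256) / 1000:.1f} s for 256 tokens")
```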
For LLaMA2, convert the FP16 weights with:

```bash
python3 scripts/convert_llama2_weight.py --input_dir <llama_path> --model_size 7B --output_dir <output dir>
```

For the AWQ 4-bit models, download the weights and convert them:

Weight download link: model.safetensors

```bash
python3 scripts/convert_llama_awq_4bit.py --input_safetensor <model.safetensors path> --output_dir <weight output path>
```

Weight download link: model.safetensors

```bash
python3 scripts/convert_llama_awq_4bit.py --input_safetensor <model.safetensors path> --output_dir <weight output path>
```
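AWQ 4-bit checkpoints store weights as 4-bit integers packed into 32-bit words, together with a per-group scale and zero point. The pure-Python sketch below shows a generic version of that idea: eight values per word, low nibble first, dequantized as `(q - zero) * scale`. It is illustrative only. Real AWQ formats may use a different nibble order and group layout, and the layout `convert_llama_awq_4bit.py` expects is not documented here, so do not use this to read actual checkpoints:

```python
def pack_int4(values):
    """Pack eight unsigned 4-bit ints (0..15) into one 32-bit word, low nibble first."""
    if len(values) != 8 or any(not 0 <= v <= 15 for v in values):
        raise ValueError("expected eight values in range 0..15")
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)
    return word


def unpack_int4(word):
    """Inverse of pack_int4: recover the eight 4-bit ints from a 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dequantize(qvalues, scale, zero):
    """Asymmetric dequantization for one group: w = (q - zero) * scale."""
    return [(q - zero) * scale for q in qvalues]


# Round trip for one word of a hypothetical group (scale=0.05, zero=8).
word = pack_int4([0, 1, 7, 8, 9, 15, 3, 12])
weights = dequantize(unpack_int4(word), scale=0.05, zero=8)
```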
Copy the converted weight folders, config files, and tokenizer files to the device, and update the corresponding paths in the bash scripts.
```bash
bash scripts/example_chat_llama2_7B_fp16_orangepi.sh
bash scripts/example_text_completion_llama2_7B_fp16_orangepi.sh
bash scripts/example_chat_llama2_7B_awq_4bit_orangepi.sh
bash scripts/example_text_completion_llama2_7B_awq_4bit_orangepi.sh
bash scripts/example_chat_llama2_13B_awq_4bit_orangepi.sh
bash scripts/example_text_completion_llama2_13B_awq_4bit_orangepi.sh
```
| Configuration | TTFT (ms) | Decode (ms/token) |
|---|---|---|
| llama2-7B-AWQ-4bit | 886 | 176.7 |
| llama2-7B-FP16 | 4498 | 568.4 |
| llama2-13B-AWQ-4bit | 1819 | 320.1 |
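The gain from AWQ 4-bit quantization can be read directly off this table. The short Python sketch below computes how much faster one configuration is than another, using only the measured values:

```python
# (ttft_ms, decode_ms_per_token) copied from the LLaMA2 table above.
LLAMA2 = {
    "llama2-7B-AWQ-4bit": (886, 176.7),
    "llama2-7B-FP16": (4498, 568.4),
    "llama2-13B-AWQ-4bit": (1819, 320.1),
}


def speedup(baseline, candidate):
    """Return (ttft_speedup, decode_speedup) of candidate over baseline."""
    (b_ttft, b_dec), (c_ttft, c_dec) = LLAMA2[baseline], LLAMA2[candidate]
    return b_ttft / c_ttft, b_dec / c_dec


ttft_x, decode_x = speedup("llama2-7B-FP16", "llama2-7B-AWQ-4bit")
print(f"7B AWQ-4bit vs FP16: TTFT {ttft_x:.2f}x, decode {decode_x:.2f}x faster")
```

This prints `TTFT 5.08x, decode 3.22x`. The same table also shows the 13B AWQ-4bit model (320.1 ms/token) decoding faster than the 7B FP16 model (568.4 ms/token).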