
The official vLLM Ascend docker images
Maintained by: https://gitee.com/openeuler/cloudnative
Where to get help: https://gitee.com/openeuler/cloudnative, https://gitee.com/openeuler/community
Current vLLM Ascend docker images are built on openEuler (https://repo.openeuler.org/). This repository is free to use and exempt from per-user rate limits.
vLLM Ascend (vllm-ascend) is a community-maintained hardware plugin for running vLLM seamlessly on the Ascend NPU.
It is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in https://github.com/vllm-project/vllm/issues/***, providing a hardware-pluggable interface that decouples the integration of the Ascend NPU from vLLM.
With the vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, embedding, and multi-modal LLMs, can run seamlessly on the Ascend NPU.
Read more about Ascend at https://www.hiascend.com/en/ and explore the vLLM Ascend technical documentation at https://vllm-ascend.readthedocs.io/en/latest/
The tag of each vLLM Ascend docker image consists of the vLLM Ascend version and the base image version. The details are as follows:
| Tags (Dockerfile) | Contents | Architectures |
|---|---|---|
| https://gitee.com/openeuler/openeuler-docker-images/blob/master/AI/vllm-ascend/0.7.3rc2-torch_npu2.5.1-cann8.0.0-python3.10/22.03-lts/Dockerfile | vLLM Ascend 0.7.3rc2 on openEuler 22.03-LTS | amd64, arm64 |
| https://gitee.com/openeuler/openeuler-docker-images/blob/master/AI/vllm-ascend/0.7.3-torch_npu2.5.1-cann8.1.rc1-python3.10/22.03-lts/Dockerfile | vLLM Ascend 0.7.3 on openEuler 22.03-LTS | amd64, arm64 |
| https://gitee.com/openeuler/openeuler-docker-images/blob/master/AI/vllm-ascend/0.8.4rc1-torch_npu2.5.1-cann8.0.0-python3.10/22.03-lts/Dockerfile | vLLM Ascend 0.8.4rc1 on openEuler 22.03-LTS | amd64, arm64 |
| https://gitee.com/openeuler/openeuler-docker-images/blob/master/AI/vllm-ascend/0.8.5rc1-torch_npu2.5.1-cann8.1.rc1-python3.10/22.03-lts/Dockerfile | vLLM Ascend 0.8.5rc1 on openEuler 22.03-LTS | amd64, arm64 |
| https://gitee.com/openeuler/openeuler-docker-images/blob/master/AI/vllm-ascend/0.9.0rc1-torch_npu2.5.1-cann8.1.rc1-python3.10/22.03-lts/Dockerfile | vLLM Ascend 0.9.0rc1 on openEuler 22.03-LTS | amd64, arm64 |
| https://gitee.com/openeuler/openeuler-docker-images/blob/master/AI/vllm-ascend/0.9.0rc2-torch_npu2.5.1-cann8.1.rc1-python3.10/22.03-lts/Dockerfile | vLLM Ascend 0.9.0rc2 on openEuler 22.03-LTS | amd64, arm64 |
| https://gitee.com/openeuler/openeuler-docker-images/blob/master/AI/vllm-ascend/0.9.1rc1-torch_npu2.5.1-cann8.1.rc1-python3.10/22.03-lts/Dockerfile | vLLM Ascend 0.9.1rc1 on openEuler 22.03-LTS | amd64, arm64 |
| https://gitee.com/openeuler/openeuler-docker-images/blob/master/AI/vllm-ascend/0.11.0rc0-torch_npu2.5.1-cann8.1.rc1-python3.10/22.03-lts/Dockerfile | vLLM Ascend 0.11.0rc0 on openEuler 22.03-LTS | amd64, arm64 |
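The Dockerfile paths in the table encode the component versions in the directory name. As a minimal sketch, assuming every path follows the `<vllm-ascend>-torch_npu<x>-cann<y>-python<z>/<openEuler version>/Dockerfile` layout seen in the rows above, the components can be pulled apart as follows. The function name `parse_dockerfile_path` is hypothetical and not part of any openEuler tooling.

```python
import re

# Pattern for the version directory seen in the table above,
# e.g. "0.7.3rc2-torch_npu2.5.1-cann8.0.0-python3.10".
VERSION_DIR = re.compile(
    r"^(?P<vllm_ascend>[^-]+)"
    r"-torch_npu(?P<torch_npu>[^-]+)"
    r"-cann(?P<cann>[^-]+)"
    r"-python(?P<python>[^-]+)$"
)


def parse_dockerfile_path(path: str) -> dict:
    """Split '.../<version-dir>/<os-version>/Dockerfile' into its components.

    Hypothetical helper: it only assumes the directory layout shown in the table.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[-1] != "Dockerfile":
        raise ValueError(f"not a Dockerfile path: {path!r}")
    version_dir, os_version = parts[-3], parts[-2]
    match = VERSION_DIR.match(version_dir)
    if match is None:
        raise ValueError(f"unexpected version directory: {version_dir!r}")
    info = match.groupdict()
    info["openeuler"] = os_version
    return info


if __name__ == "__main__":
    example = (
        "AI/vllm-ascend/"
        "0.7.3rc2-torch_npu2.5.1-cann8.0.0-python3.10/22.03-lts/Dockerfile"
    )
    print(parse_dockerfile_path(example))
```

This can help when checking which CANN or torch_npu version a given image was built against before pulling it.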
Supported hardware:

- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 Inference series (Atlas 800I A2)
```bash
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:v0.8.4rc1-openeuler
docker run --rm \
    --name vllm-ascend \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
```
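The command above exposes a single NPU. As a rough sketch, assuming one `--device /dev/davinciN` flag per NPU is enough to expose additional cards, the same command line can be generated for several device indices. The helper `build_docker_run` is hypothetical; the device nodes, mounts, and port are copied from the command above.

```python
import shlex

# Management device nodes and bind mounts taken from the docker run command above.
COMMON_DEVICES = ["/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"]
MOUNTS = [
    "/usr/local/dcmi",
    "/usr/local/bin/npu-smi",
    "/usr/local/Ascend/driver/lib64/",
    "/usr/local/Ascend/driver/version.info",
    "/etc/ascend_install.info",
    "/root/.cache",
]


def build_docker_run(image: str, npu_ids: list[int]) -> str:
    """Return a docker run command string exposing the given NPUs (0-7).

    Hypothetical helper for illustration; it only builds the string and
    does not run docker.
    """
    for npu_id in npu_ids:
        if not 0 <= npu_id <= 7:
            raise ValueError(f"NPU index out of range 0-7: {npu_id}")
    args = ["docker", "run", "--rm", "--name", "vllm-ascend"]
    for npu_id in npu_ids:
        args += ["--device", f"/dev/davinci{npu_id}"]
    for dev in COMMON_DEVICES:
        args += ["--device", dev]
    for path in MOUNTS:
        # Each host path is mounted at the same path inside the container.
        args += ["-v", f"{path}:{path}"]
    args += ["-p", "8000:8000", "-it", image, "bash"]
    return shlex.join(args)


if __name__ == "__main__":
    print(build_docker_run("quay.io/ascend/vllm-ascend:v0.8.4rc1-openeuler", [0, 1]))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Docker or Ascend drivers.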
You can use the ModelScope mirror to speed up model downloads:
```bash
export VLLM_USE_MODELSCOPE=true
```
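When running from a Python script or notebook rather than a shell, the same variable can be set inside the process. This is a minimal sketch, assuming the variable is set before vLLM is first imported so that it is already in place when vLLM reads its environment. The helper name `enable_modelscope` is hypothetical.

```python
import os


def enable_modelscope() -> str:
    """Set VLLM_USE_MODELSCOPE for the current process and return its value.

    Hypothetical helper; equivalent in effect to `export VLLM_USE_MODELSCOPE=true`
    for this process and any child processes it starts.
    """
    os.environ["VLLM_USE_MODELSCOPE"] = "true"
    return os.environ["VLLM_USE_MODELSCOPE"]


# Call this before the first `import vllm` in the script.
enable_modelscope()
```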
With vLLM installed, you can start generating text for a list of input prompts (i.e. offline batch inference).
Run the Python script below directly, or enter it in a python3 shell, to generate text:
```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
# The first run will take about 3-5 mins (10 MB/s) to download models
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
If you have any questions or want to use special features, please submit an issue or a pull request at https://gitee.com/openeuler/openeuler-docker-images.