The official vLLM Ascend Docker images
Maintained by: openEuler CloudNative SIG
Where to get help: openEuler CloudNative SIG, openEuler
Current vLLM Ascend Docker images are built on openEuler. This repository is free to use and exempt from per-user rate limits.
vLLM Ascend (vllm-ascend) is a community-maintained hardware plugin for running vLLM seamlessly on the Ascend NPU.
It is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in https://github.com/vllm-project/vllm/issues/***, providing a hardware-pluggable interface that decouples Ascend NPU integration from vLLM.
With the vLLM Ascend plugin, popular open-source models, including Transformer-based, Mixture-of-Experts, embedding, and multi-modal LLMs, can run seamlessly on the Ascend NPU.
Read more about Ascend at hiascend.com and explore the vLLM Ascend technical documentation at vllm-ascend.readthedocs.io
The tag of each vLLM Ascend Docker image consists of the vLLM Ascend version and the version of the base image. The details are as follows:
| Tag | Description | Architectures |
|---|---|---|
| 0.7.3rc2-torch_npu2.5.1-cann8.0.0-python3.10-oe2203lts | vLLM Ascend 0.7.3rc2 on openEuler 22.03-LTS | amd64, arm64 |
| 0.7.3-torch_npu2.5.1-cann8.1.rc1-python3.10-oe2203lts | vLLM Ascend 0.7.3 on openEuler 22.03-LTS | amd64, arm64 |
| 0.8.4rc1-torch_npu2.5.1-cann8.0.0-python3.10-oe2203lts | vLLM Ascend 0.8.4rc1 on openEuler 22.03-LTS | amd64, arm64 |
| 0.8.5rc1-torch_npu2.5.1-cann8.1.rc1-python3.10-oe2203lts | vLLM Ascend 0.8.5rc1 on openEuler 22.03-LTS | amd64, arm64 |
| 0.9.0rc1-torch_npu2.5.1-cann8.1.rc1-python3.10-oe2203lts | vLLM Ascend 0.9.0rc1 on openEuler 22.03-LTS | amd64, arm64 |
| 0.9.0rc2-torch_npu2.5.1-cann8.1.rc1-python3.10-oe2203lts | vLLM Ascend 0.9.0rc2 on openEuler 22.03-LTS | amd64, arm64 |
| 0.9.1rc1-torch_npu2.5.1-cann8.1.rc1-python3.10-oe2203lts | vLLM Ascend 0.9.1rc1 on openEuler 22.03-LTS | amd64, arm64 |
| 0.11.0rc0-torch_npu2.5.1-cann8.1.rc1-python3.10-oe2203lts | vLLM Ascend 0.11.0rc0 on openEuler 22.03-LTS | amd64, arm64 |
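As an illustration of this tag scheme, a tag can be split on `-` into its five components (the variable names below are illustrative, not an official convention):

```shell
# Decompose a vLLM Ascend image tag into its five components
tag="0.7.3-torch_npu2.5.1-cann8.1.rc1-python3.10-oe2203lts"
IFS=- read -r vllm_ascend_ver torch_npu_ver cann_ver python_ver base_os <<< "$tag"
echo "vLLM Ascend: $vllm_ascend_ver"  # 0.7.3
echo "torch_npu:   $torch_npu_ver"    # torch_npu2.5.1
echo "CANN:        $cann_ver"         # cann8.1.rc1
echo "Python:      $python_ver"       # python3.10
echo "Base image:  $base_os"          # oe2203lts
```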
Supported devices:
- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 Inference series (Atlas 800I A2)
```bash
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:v0.8.4rc1-openeuler
docker run --rm \
  --name vllm-ascend \
  --device $DEVICE \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -p 8000:8000 \
  -it $IMAGE bash
```
You can use the ModelScope mirror to speed up downloads:
```bash
export VLLM_USE_MODELSCOPE=true
```
With vLLM installed, you can start generating text for a list of input prompts (i.e., offline batch inference).
Run the Python script below directly, or paste it into a python3 shell, to generate text:
```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# The first run will take about 3-5 mins (10 MB/s) to download models
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
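Beyond offline batching, vLLM also ships an OpenAI-compatible API server. A minimal sketch, run inside the container started above and reusing the published port 8000 and the same example model (this is an illustration, not part of this image's documented workflow):

```shell
# Start the OpenAI-compatible API server on port 8000
vllm serve Qwen/Qwen2.5-0.5B-Instruct --port 8000

# From another shell, send a completion request
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "prompt": "Hello, my name is", "max_tokens": 32}'
```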
If you have any questions or need additional features, please submit an issue or a pull request on openeuler-docker-images.
The openeuler/vllm-ascend Docker images above cover a range of scenarios.
You can pull an image with the command below, replacing <tag> with a specific tag version. For all available tags, see the tag list page.
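A minimal pull sketch, assuming the images are published under the openeuler/vllm-ascend repository (the concrete tag below is taken from the table above):

```shell
docker pull openeuler/vllm-ascend:<tag>
# For example:
docker pull openeuler/vllm-ascend:0.7.3-torch_npu2.5.1-cann8.1.rc1-python3.10-oe2203lts
```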





