
This container provides a runtime environment for the https://github.com/mlc-ai/mlc-llm project. It is built on Apache TVM Unity and integrates accelerated kernels including CUDA, cuDNN, CUTLASS, FasterTransformer, and FlashAttention-2. It is intended for quantizing and benchmarking large language models on supported L4T systems.
Step 1: Prepare the original model
First download the model to be quantized in Hugging Face Transformers format, then symlink it into /data/models/mlc/dist/models so that MLC can find it:
```bash
./run.sh --env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN> $(./autotag mlc) /bin/bash -c '\
  ln -s $(huggingface-downloader meta-llama/Llama-2-7b-chat-hf) /data/models/mlc/dist/models/Llama-2-7b-chat-hf'
```
[!NOTE]
To quantize a Llava model, change "model_type": "llava" to "model_type": "llama" in the original model's config.json (you can edit it locally once the model has been downloaded under /data/models/huggingface).
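The config.json change described in the note can be scripted. The following is a minimal sketch, not part of the container's tooling: `patch_model_type` is a hypothetical helper, and the path you pass to it depends on where the model was downloaded under /data/models/huggingface.

```python
import json
from pathlib import Path


def patch_model_type(config_path, old="llava", new="llama"):
    """Rewrite "model_type" in a Hugging Face config.json.

    Returns True if the file was changed, False if "model_type" was not `old`.
    Hypothetical helper for illustration only.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if config.get("model_type") != old:
        return False  # nothing to do; leave the file untouched
    config["model_type"] = new
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Call it with the path to the downloaded model's config.json. It only touches the file when the current value is "llava", so running it twice is harmless.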
Step 2: Run quantization
Quantize the model with W4A16 (4-bit weights, 16-bit activations):
```bash
./run.sh $(./autotag mlc) \
  python3 -m mlc_llm.build \
    --model Llama-2-7b-chat-hf \
    --quantization q4f16_ft \
    --artifact-path /data/models/mlc/dist \
    --max-seq-len 4096 \
    --target cuda \
    --use-cuda-graph \
    --use-flash-attn-mqa
```
The quantized model and its runtime files are saved to /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft.
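The output location follows from the build arguments: the directory name is the `--model` value and the `--quantization` value joined by a hyphen, placed under `--artifact-path`. The sketch below only reproduces the paths that appear in this document. `quantized_model_dir` is a hypothetical helper, and the naming rule is inferred from the example here rather than taken from the MLC source.

```python
from pathlib import PurePosixPath


def quantized_model_dir(model, quantization, artifact_path="/data/models/mlc/dist"):
    """Return the directory where the build step is expected to place its output.

    Assumption: the layout is <artifact_path>/<model>-<quantization>, as in this README.
    """
    return PurePosixPath(artifact_path) / f"{model}-{quantization}"


model_dir = quantized_model_dir("Llama-2-7b-chat-hf", "q4f16_ft")
params_dir = model_dir / "params"  # the path passed to benchmark.py via --model

print(model_dir)
print(params_dir)
```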
Evaluate the quantized model's performance with the benchmark.py script:
```bash
./run.sh $(./autotag mlc) \
  python3 /opt/mlc-llm/benchmark.py \
    --model /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params \
    --prompt /data/prompts/completion_16.json \
    --max-new-tokens 128
```
Parameters
• --prompt: the input prompt file, which sets the context length (the /data/prompts directory provides prompt sequences of up to 4096 tokens)
• --max-new-tokens: the number of output tokens the model generates for each prompt
Example output
```
AVERAGE OVER 10 RUNS:
/data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params:  prefill_time 0.027 sec, prefill_rate 582.8 tokens/sec, decode_time 2.986 sec, decode_rate 42.9 tokens/sec
```
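Interpreting the results
prefill_time and prefill_rate measure how quickly the model processes the input prompt before generation begins; decode_time and decode_rate measure how quickly it generates the output tokens.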
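As a rough cross-check of the decode figures, the sketch below parses the summary line from the example output and compares the reported decode_rate with the number of generated tokens divided by decode_time (128 / 2.986 ≈ 42.9). `parse_benchmark_summary` is a hypothetical helper, and the assumption that the rate is computed this way is an inference from the numbers, not something taken from benchmark.py.

```python
import re


def parse_benchmark_summary(line):
    """Extract the four numeric metrics from a benchmark summary line into a dict."""
    pattern = r"(prefill_time|prefill_rate|decode_time|decode_rate)\s+([0-9.]+)"
    return {name: float(value) for name, value in re.findall(pattern, line)}


# Summary line copied from the example output in this README.
summary = (
    "AVERAGE OVER 10 RUNS: /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params: "
    "prefill_time 0.027 sec, prefill_rate 582.8 tokens/sec, "
    "decode_time 2.986 sec, decode_rate 42.9 tokens/sec"
)
metrics = parse_benchmark_summary(summary)

# Assumption: decode_rate is roughly max_new_tokens / decode_time.
max_new_tokens = 128
estimated_decode_rate = max_new_tokens / metrics["decode_time"]

print(f"reported:  {metrics['decode_rate']} tokens/sec")
print(f"estimated: {estimated_decode_rate:.1f} tokens/sec")
```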

| mlc:0.1.0 | |
|---|---|
| Requires | L4T ['>=36'] |
| Dependencies | https://github.com/dusty-nv/jetson-containers/tree/master/packages/build/build-essential https://github.com/dusty-nv/jetson-containers/tree/master/packages/cuda/cuda https://github.com/dusty-nv/jetson-containers/tree/master/packages/cuda/cudnn https://github.com/dusty-nv/jetson-containers/tree/master/packages/build/python https://github.com/dusty-nv/jetson-containers/tree/master/packages/numpy https://github.com/dusty-nv/jetson-containers/tree/master/packages/build/cmake/cmake_pip https://github.com/dusty-nv/jetson-containers/tree/master/packages/onnx https://github.com/dusty-nv/jetson-containers/tree/master/packages/pytorch https://github.com/dusty-nv/jetson-containers/tree/master/packages/pytorch/torchvision https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/huggingface_hub https://github.com/dusty-nv/jetson-containers/tree/master/packages/build/rust https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/transformers |
| Dockerfile | https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/mlc/Dockerfile |
| Notes | mlc-ai/mlc-llm at commit SHA 607dc5a: https://github.com/mlc-ai/mlc-llm/tree/607dc5a |

| mlc:0.1.0-builder | |
|---|---|
| Requires | L4T ['>=36'] |
| Dependencies | Same as above |
| Dockerfile | https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/mlc/Dockerfile |
| Notes | mlc-ai/mlc-llm at commit SHA 607dc5a: https://github.com/mlc-ai/mlc-llm/tree/607dc5a |

| mlc:0.1.1 | |
|---|---|
| Requires | L4T ['>=36'] |
| Dependencies | Same as above |
| Dockerfile | https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/mlc/Dockerfile |
| Images | https://hub.docker.com/r/dustynv/mlc/tags (2024-04-18, 7.4GB) |
| Notes | mlc-ai/mlc-llm at commit SHA 3403a4e: https://github.com/mlc-ai/mlc-llm/tree/3403a4e |

| mlc:0.1.1-builder | |
|---|---|
| Requires | L4T ['>=36'] |
| Dependencies | Same as above |
| Dockerfile | https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/mlc/Dockerfile |
| Notes | mlc-ai/mlc-llm at commit SHA 3403a4e: https://github.com/mlc-ai/mlc-llm/tree/3403a4e |

| Repository/Tag | Date | Arch | Size |
|---|---|---|---|
| https://hub.docker.com/r/dustynv/mlc/tags | 2024-04-18 | arm64 | 7.4GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2024-02-16 | arm64 | 10.8GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2024-02-16 | arm64 | 9.6GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2024-02-16 | arm64 | 9.5GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2024-02-16 | arm64 | 10.6GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2024-02-22 | arm64 | 9.6GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2024-02-27 | arm64 | 9.6GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2024-02-20 | arm64 | 9.6GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2023-10-30 | arm64 | 9.0GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2023-12-16 | arm64 | 9.4GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2023-12-16 | arm64 | 10.6GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2023-12-16 | arm64 | 9.4GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2023-11-05 | arm64 | 8.9GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2024-01-27 | arm64 | 9.4GB |
| https://hub.docker.com/r/dustynv/mlc/tags | 2024-03-09 | arm64 | 9.6GB |
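
Container images are compatible with other minor versions of JetPack/L4T:
• L4T R32.7 containers can run on other versions of L4T R32.7 (JetPack 4.6+)
• L4T R35.x containers can run on other versions of L4T R35.x (JetPack 5.1+)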
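The two compatibility rules above can be written as a small check. This is a sketch under stated assumptions, not how autotag decides: `parse_l4t` and `is_compatible` are hypothetical helpers, the R32 rule is generalized to "same major.minor", and any release line the list does not mention (such as R36) is treated conservatively as requiring an exact version match.

```python
import re


def parse_l4t(version):
    """Parse strings like 'R35.4.1', 'r36.2.0' or '32.7' into a (major, minor, patch) tuple."""
    match = re.fullmatch(r"[rR]?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized L4T version: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def is_compatible(container_l4t, host_l4t):
    """Apply the documented rules: R32.7 <-> R32.7.x and R35.x <-> R35.x."""
    container = parse_l4t(container_l4t)
    host = parse_l4t(host_l4t)
    if container[0] != host[0]:
        return False
    if container[0] == 32:
        return container[1] == host[1]  # same minor release, e.g. R32.7
    if container[0] == 35:
        return True  # any R35.x container on any R35.x host
    return container == host  # assumption: exact match for releases not listed


print(is_compatible("R32.7.1", "R32.7.4"))
print(is_compatible("r35.2.1", "R35.4.1"))
print(is_compatible("R32.7.1", "R35.4.1"))
```

The first two calls fall under the documented rules and the third mixes release lines, so it is rejected.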
To start the container, use jetson-containers run (https://github.com/dusty-nv/jetson-containers/tree/master/docs/run.md) together with autotag (https://github.com/dusty-nv/jetson-containers/tree/master/docs/run.md#autotag), or put together a `docker run` command manually:
```bash
# automatically pull or build a compatible container image
jetson-containers run $(autotag mlc)

# or explicitly specify one of the container images above
jetson-containers run dustynv/mlc:0.1.1-r36.2.0

# or use 'docker run' (you must specify the image, mounts, and other options)
sudo docker run --runtime nvidia -it --rm --network=host dustynv/mlc:0.1.1-r36.2.0
```
• jetson-containers run (https://github.com/dusty-nv/jetson-containers/tree/master/docs/run.md) forwards its arguments to `docker run` and adds some defaults (such as `--runtime nvidia`, mounting a `/data` cache, and detecting devices).
• autotag (https://github.com/dusty-nv/jetson-containers/tree/master/docs/run.md#autotag) finds a container image compatible with your version of JetPack/L4T, whether it is available locally, pulled from a registry, or built.
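To mount a host directory into the container, use the -v or --volume flag:
```bash
jetson-containers run -v /host/path:/container/path $(autotag mlc)
```
To run a command directly when the container starts, instead of an interactive shell:
```bash
jetson-containers run $(autotag mlc) my_app --abc xyz
```
You can pass any option that `docker run` supports; the full command is printed before it is executed.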
If you use autotag as shown above, it prompts you automatically when the container needs to be built. To build it manually, first complete the system setup (https://github.com/dusty-nv/jetson-containers/tree/master/docs/setup.md), then run:
```bash
jetson-containers build mlc
```
The build incorporates the dependencies listed above and tests the container as it goes. See https://github.com/dusty-nv/jetson-containers/tree/master/jetson_containers/build.py for the build options.