
The Hadoop Sandbox Docker images are the containerized implementation of the https://github.com/hadoop-sandbox project and provide a containerized deployment of the core Hadoop cluster components. The image set splits the key components of the Hadoop ecosystem (HDFS, YARN, MapReduce, and others) into separate images to simplify setting up, configuring, and managing a Hadoop environment. It is suited to Hadoop development, testing, learning, and small-scale demos.
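The Hadoop Sandbox images include the following core components, each with its own function:

[…] `hdfs`, `yarn`, `mapred`, etc.). […] `hadoop`), which keeps the runtime environment isolated and secure.

Assuming the images are hosted on Docker Hub (in practice, replace this with the project's official registry address), the pull commands are: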
```bash
# Pull the base image
docker pull hadoop-sandbox/hadoop-base:latest

# Pull the client image
docker pull hadoop-sandbox/hadoop-client:latest

# Pull the HDFS component images
docker pull hadoop-sandbox/hadoop-hdfs-namenode:latest
docker pull hadoop-sandbox/hadoop-hdfs-datanode:latest

# Pull the YARN component images
docker pull hadoop-sandbox/hadoop-yarn-resourcemanager:latest
docker pull hadoop-sandbox/hadoop-yarn-nodemanager:latest

# Pull the MapReduce JobHistoryServer image
docker pull hadoop-sandbox/hadoop-mapred-jobhistoryserver:latest
```
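The same pulls can be scripted as a loop, which makes it easier to switch registries or tags in one place. The sketch below is only an illustration: `REGISTRY_PREFIX`, `TAG`, `image_ref`, and `pull_all` are hypothetical names introduced here, and the image names are copied from the commands above rather than verified against the project's registry.

```bash
# Hypothetical settings (not defined by the images themselves):
# REGISTRY_PREFIX is empty for Docker Hub, or a registry/mirror host.
REGISTRY_PREFIX="${REGISTRY_PREFIX:-}"
TAG="${TAG:-latest}"

# Component image names, taken from the pull commands in this document.
IMAGES="hadoop-base hadoop-client hadoop-hdfs-namenode hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-mapred-jobhistoryserver"

# Print the full image reference for one component name.
image_ref() {
  printf '%shadoop-sandbox/%s:%s\n' "${REGISTRY_PREFIX:+$REGISTRY_PREFIX/}" "$1" "${TAG:-latest}"
}

# Pull every image; keep going on failure and report it via the exit status.
pull_all() {
  failed=0
  for name in $IMAGES; do
    docker pull "$(image_ref "$name")" || failed=1
  done
  return "$failed"
}
```

Calling `pull_all` pulls all seven images; running it as `REGISTRY_PREFIX=registry.example.com pull_all` (a placeholder host) would prefix every reference with that registry.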
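The following example runs `hadoop-client` (the client node) on its own, so that you can connect to the cluster over SSH:

```bash
# -p 2222:22 maps the container's SSH port to port 2222 on the host.
# --network hadoop-net attaches the container to a user-defined network,
# which is recommended for isolating the cluster (the network must already exist).
docker run -d \
  --name hadoop-client \
  -p 2222:22 \
  --network hadoop-net \
  hadoop-sandbox/hadoop-client:latest
```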
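`docker run --network hadoop-net` fails if the network has not been created yet. A minimal sketch for creating it idempotently beforehand follows; `ensure_network` is a hypothetical helper name, and the bridge driver is assumed to match the Compose example in this document.

```bash
# Create the user-defined bridge network only if it does not exist yet.
# The network name defaults to hadoop-net, as used in this document.
ensure_network() {
  net="${1:-hadoop-net}"
  if docker network inspect "$net" >/dev/null 2>&1; then
    echo "network $net already exists"
  else
    docker network create --driver bridge "$net" >/dev/null &&
      echo "network $net created"
  fi
}
```

Run `ensure_network hadoop-net` once before starting the standalone client container.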
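Once the container is running, connect to the client node over SSH:

```bash
# Default user: hadoop, password: hadoop (check the image's actual configuration)
ssh -p 2222 hadoop@localhost
```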
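A Hadoop cluster needs several components working together, so deploying it with docker-compose is recommended. The following is a minimal cluster configuration example (`docker-compose.yml`):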
```yaml
version: '3.8'

networks:
  hadoop-net:
    driver: bridge

volumes:
  hdfs-namenode-data:  # Persists NameNode metadata
  hdfs-datanode-data:  # Persists DataNode blocks

services:
  # HDFS NameNode
  namenode:
    image: hadoop-sandbox/hadoop-hdfs-namenode:latest
    container_name: hadoop-namenode
    networks:
      - hadoop-net
    volumes:
      - hdfs-namenode-data:/hadoop/dfs/name
    environment:
      - HDFS_NAMENODE_HOST=namenode  # Hostname inside the network (same as the service name)
      - HDFS_REPLICATION_FACTOR=1    # Replication factor of 1 for test environments
    ports:
      - "9870:9870"  # HDFS WebUI port
    restart: unless-stopped

  # HDFS DataNode
  datanode:
    image: hadoop-sandbox/hadoop-hdfs-datanode:latest
    container_name: hadoop-datanode
    networks:
      - hadoop-net
    volumes:
      - hdfs-datanode-data:/hadoop/dfs/data
    environment:
      - HDFS_NAMENODE_URI=hdfs://namenode:9000  # Address of the NameNode
    depends_on:
      - namenode
    restart: unless-stopped

  # YARN ResourceManager
  resourcemanager:
    image: hadoop-sandbox/hadoop-yarn-resourcemanager:latest
    container_name: hadoop-resourcemanager
    networks:
      - hadoop-net
    ports:
      - "8088:8088"  # YARN WebUI port
    environment:
      - YARN_RESOURCEMANAGER_HOST=resourcemanager
    depends_on:
      - namenode
    restart: unless-stopped

  # YARN NodeManager
  nodemanager:
    image: hadoop-sandbox/hadoop-yarn-nodemanager:latest
    container_name: hadoop-nodemanager
    networks:
      - hadoop-net
    environment:
      - YARN_RESOURCEMANAGER_HOST=resourcemanager
      - YARN_NODEMANAGER_HOST=nodemanager
    depends_on:
      - resourcemanager
    restart: unless-stopped

  # MapReduce JobHistoryServer
  jobhistoryserver:
    image: hadoop-sandbox/hadoop-mapred-jobhistoryserver:latest
    container_name: hadoop-jobhistoryserver
    networks:
      - hadoop-net
    ports:
      - "19888:19888"  # JobHistory WebUI port
    environment:
      - MAPRED_HISTORY_SERVER_HOST=jobhistoryserver
    depends_on:
      - resourcemanager
    restart: unless-stopped

  # Hadoop client (SSH access)
  client:
    image: hadoop-sandbox/hadoop-client:latest
    container_name: hadoop-client
    networks:
      - hadoop-net
    ports:
      - "2222:22"  # SSH port mapping
    depends_on:
      - namenode
      - resourcemanager
    restart: unless-stopped
```
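Deployment steps:

1. Create a `docker-compose.yml` file and copy the configuration above into it.
2. Start the cluster:

   ```bash
   docker-compose up -d  # Start all services in the background
   ```

3. Verify the deployment: open the HDFS WebUI at http://localhost:9870 and the YARN WebUI at http://localhost:8088, and list the HDFS root directory with `hdfs dfs -ls /`.

Each image can be customized through environment variables. Commonly used parameters are listed below (the exact set depends on the image version):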
| Environment variable | Component | Description | Example default |
|---|---|---|---|
| `HADOOP_HOME` | All components | Hadoop installation path | `/opt/hadoop` |
| `HDFS_NAMENODE_HOST` | namenode | NameNode hostname | `namenode` |
| `HDFS_NAMENODE_URI` | datanode/client | NameNode access address | `hdfs://namenode:9000` |
| `HDFS_REPLICATION_FACTOR` | namenode | Default HDFS replication factor | `3` (`1` is recommended for test environments) |
| `YARN_RESOURCEMANAGER_HOST` | resourcemanager/nodemanager | ResourceManager hostname | `resourcemanager` |
| `MAPRED_HISTORY_SERVER_HOST` | jobhistoryserver | JobHistoryServer hostname | `jobhistoryserver` |
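
- Use `volumes` to persist NameNode metadata and DataNode data, so that deleting a container does not lose data.
- Limit container resources (for example with `docker run --memory` or the `deploy.resources` setting in docker-compose).
- Use a custom network (such as `hadoop-net`) to keep communication between cluster components isolated.
- The images ship with default credentials (`hadoop`/`hadoop`); in production, change the user password and restrict SSH access.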
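
The verification step of the deployment can be scripted, since the web UIs usually take a little while to come up after `docker-compose up -d`. The sketch below is one way to poll them. It assumes `curl` is installed on the host and that ports 9870 and 8088 are published as in the Compose example; `wait_for_url` is a hypothetical helper name introduced here.

```bash
# Poll a URL every 2 seconds until it answers with a non-error HTTP status,
# or give up once the timeout (in seconds, default 120) has elapsed.
wait_for_url() {
  url="$1"
  timeout="${2:-120}"
  elapsed=0
  until curl -fs -o /dev/null "$url"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "ready: $url"
}
```

For example, `wait_for_url http://localhost:9870 && wait_for_url http://localhost:8088` returns once both the HDFS and YARN web UIs respond, and exits non-zero if either stays unreachable past the timeout.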