
The Hadoop Sandbox Docker images are the containerized implementation of the https://github.com/hadoop-sandbox project and provide a containerized deployment of the core components of a Hadoop cluster. The image set splits the key parts of the Hadoop ecosystem (HDFS, YARN, MapReduce, and others) into separate images to simplify setting up, configuring, and managing a Hadoop environment. It is suited to Hadoop development, testing, learning, and small demos.
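The Hadoop Sandbox images comprise the following core components, each with its own role:

- … (`hdfs`, `yarn`, `mapred`, etc.).
- … (`hadoop`), which keeps the runtime environment isolated and secure.

Assuming the images are hosted on Docker Hub (in practice, replace the names with the project's official registry address), the pull commands are:

```bash
# Pull the base image
docker pull hadoop-sandbox/hadoop-base:latest

# Pull the client image
docker pull hadoop-sandbox/hadoop-client:latest

# Pull the HDFS component images
docker pull hadoop-sandbox/hadoop-hdfs-namenode:latest
docker pull hadoop-sandbox/hadoop-hdfs-datanode:latest

# Pull the YARN component images
docker pull hadoop-sandbox/hadoop-yarn-resourcemanager:latest
docker pull hadoop-sandbox/hadoop-yarn-nodemanager:latest

# Pull the MapReduce job history server image
docker pull hadoop-sandbox/hadoop-mapred-jobhistoryserver:latest
```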
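Because the seven pull commands differ only in the image name, they can be driven from a single list. The following is a minimal sketch, not part of the project itself: the `NAMESPACE`, `TAG`, and `DRY_RUN` variables and the `image_ref` and `pull_all` helpers are hypothetical names introduced here for illustration, and the default namespace is the same placeholder used in the commands above.

```bash
# Sketch: pull every Hadoop Sandbox image from one list.
# NAMESPACE and TAG are assumptions; replace them with the project's
# official registry address and a concrete tag.
NAMESPACE="${NAMESPACE:-hadoop-sandbox}"
TAG="${TAG:-latest}"

# Image names as they appear in the individual pull commands.
IMAGES="hadoop-base hadoop-client hadoop-hdfs-namenode hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-mapred-jobhistoryserver"

# Print the full reference for one image name.
image_ref() {
  printf '%s/%s:%s\n' "$NAMESPACE" "$1" "$TAG"
}

# Pull all images in order and stop at the first failure.
# With DRY_RUN=1 the commands are only printed, not executed.
pull_all() {
  for name in $IMAGES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "docker pull $(image_ref "$name")"
    else
      docker pull "$(image_ref "$name")" || return 1
    fi
  done
}
```

Setting `DRY_RUN=1` before calling `pull_all` prints the commands for review; calling `pull_all` without it performs the pulls.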
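The following example runs hadoop-client (the client node) on its own so that the cluster can be reached over SSH:

```bash
# Create the custom network first (recommended, to isolate the cluster);
# skip this step if hadoop-net already exists
docker network create hadoop-net

# Start the client and publish its SSH port on host port 2222
docker run -d \
  --name hadoop-client \
  -p 2222:22 \
  --network hadoop-net \
  hadoop-sandbox/hadoop-client:latest
```

Once the container is running, connect to the client node over SSH:

```bash
# Default user: hadoop, password: hadoop (check the image's actual configuration)
ssh -p 2222 hadoop@localhost
```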
A Hadoop cluster needs several components working together, so deploying with docker-compose is recommended. The following is a minimal cluster configuration (docker-compose.yml):

```yaml
version: '3.8'

networks:
  hadoop-net:
    driver: bridge

volumes:
  hdfs-namenode-data:  # persists Namenode metadata
  hdfs-datanode-data:  # persists Datanode data blocks

services:
  # HDFS Namenode
  namenode:
    image: hadoop-sandbox/hadoop-hdfs-namenode:latest
    container_name: hadoop-namenode
    networks:
      - hadoop-net
    volumes:
      - hdfs-namenode-data:/hadoop/dfs/name
    environment:
      - HDFS_NAMENODE_HOST=namenode  # hostname inside the network (same as the service name)
      - HDFS_REPLICATION_FACTOR=1    # replication factor of 1 for test environments
    ports:
      - "9870:9870"  # HDFS web UI port
    restart: unless-stopped

  # HDFS Datanode
  datanode:
    image: hadoop-sandbox/hadoop-hdfs-datanode:latest
    container_name: hadoop-datanode
    networks:
      - hadoop-net
    volumes:
      - hdfs-datanode-data:/hadoop/dfs/data
    environment:
      - HDFS_NAMENODE_URI=hdfs://namenode:9000  # address of the Namenode
    depends_on:
      - namenode
    restart: unless-stopped

  # YARN ResourceManager
  resourcemanager:
    image: hadoop-sandbox/hadoop-yarn-resourcemanager:latest
    container_name: hadoop-resourcemanager
    networks:
      - hadoop-net
    ports:
      - "8088:8088"  # YARN web UI port
    environment:
      - YARN_RESOURCEMANAGER_HOST=resourcemanager
    depends_on:
      - namenode
    restart: unless-stopped

  # YARN NodeManager
  nodemanager:
    image: hadoop-sandbox/hadoop-yarn-nodemanager:latest
    container_name: hadoop-nodemanager
    networks:
      - hadoop-net
    environment:
      - YARN_RESOURCEMANAGER_HOST=resourcemanager
      - YARN_NODEMANAGER_HOST=nodemanager
    depends_on:
      - resourcemanager
    restart: unless-stopped

  # MapReduce JobHistoryServer
  jobhistoryserver:
    image: hadoop-sandbox/hadoop-mapred-jobhistoryserver:latest
    container_name: hadoop-jobhistoryserver
    networks:
      - hadoop-net
    ports:
      - "19888:19888"  # JobHistory web UI port
    environment:
      - MAPRED_HISTORY_SERVER_HOST=jobhistoryserver
    depends_on:
      - resourcemanager
    restart: unless-stopped

  # Hadoop client (SSH access)
  client:
    image: hadoop-sandbox/hadoop-client:latest
    container_name: hadoop-client
    networks:
      - hadoop-net
    ports:
      - "2222:22"  # SSH port mapping
    depends_on:
      - namenode
      - resourcemanager
    restart: unless-stopped
```
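Deployment steps:

1. Create a `docker-compose.yml` file and copy the configuration above into it.
2. Start the cluster:

```bash
docker-compose up -d  # start all services in the background
```

3. Verify the deployment:
   - HDFS web UI: http://localhost:9870
   - YARN web UI: http://localhost:8088
   - HDFS command check, for example from the client node: `hdfs dfs -ls /`

Each image can be customized through environment variables. Common parameters are listed below (the exact set depends on the image version):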
| Environment variable | Component | Description | Example default |
|---|---|---|---|
| `HADOOP_HOME` | All components | Hadoop installation path | `/opt/hadoop` |
| `HDFS_NAMENODE_HOST` | namenode | Namenode hostname | `namenode` |
| `HDFS_NAMENODE_URI` | datanode/client | Namenode access address | `hdfs://namenode:9000` |
| `HDFS_REPLICATION_FACTOR` | namenode | Default HDFS replication factor | `3` (1 is recommended for test environments) |
| `YARN_RESOURCEMANAGER_HOST` | resourcemanager/nodemanager | ResourceManager hostname | `resourcemanager` |
| `MAPRED_HISTORY_SERVER_HOST` | jobhistoryserver | JobHistoryServer hostname | `jobhistoryserver` |
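When a component is started with `docker run` rather than docker-compose, the same variables can be supplied from a file through Docker's `--env-file` option. The sketch below is illustrative only: `write_env_file` and `run_namenode` are hypothetical helper names, the variable names are taken from the table and may differ between image versions, and the `hadoop-net` network is assumed to exist already.

```bash
# Write the overrides to the file given as $1, one KEY=VALUE per line.
# Variable names come from the table; check them against the image version.
write_env_file() {
  cat > "$1" <<'EOF'
HDFS_NAMENODE_HOST=namenode
HDFS_REPLICATION_FACTOR=1
EOF
}

# Start a Namenode that reads its settings from the env file given as $1.
# Assumes the hadoop-net network has already been created.
run_namenode() {
  docker run -d \
    --name hadoop-namenode \
    --network hadoop-net \
    --env-file "$1" \
    -p 9870:9870 \
    hadoop-sandbox/hadoop-hdfs-namenode:latest
}
```

A typical sequence would be `write_env_file namenode.env` followed by `run_namenode namenode.env`. Keeping the values in a file makes it easier to review them and to reuse them across restarts.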
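- Use `volumes` to persist Namenode metadata and Datanode data so that deleting a container does not lose data.
- … `docker run --memory` or the `deploy.resources` setting in docker-compose).
- … `hadoop-net`) to keep communication between cluster components isolated.
- … `hadoop/hadoop`); in production, change the user password and restrict SSH access.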
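After the cluster has been started with docker-compose, the web UIs published by the Compose file can be probed from the host as a quick smoke check. This is a rough sketch under stated assumptions: `check_url` and `check_all` are hypothetical helper names, `curl` is assumed to be installed on the host, and the ports are the ones mapped in the Compose configuration (9870, 8088, 19888).

```bash
# Web UI endpoints, taken from the port mappings in the Compose file.
UI_URLS="http://localhost:9870 http://localhost:8088 http://localhost:19888"

# Return 0 if the URL responds within 5 seconds with an HTTP status below 400.
check_url() {
  curl -fsS -o /dev/null --max-time 5 "$1"
}

# Probe every endpoint and print one status line per URL.
# Returns non-zero if any endpoint could not be reached.
check_all() {
  failed=0
  for url in $UI_URLS; do
    if check_url "$url" 2>/dev/null; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}
```

Running `check_all` shortly after startup may report failures while the services are still initializing, so it is worth retrying after a few seconds before concluding that something is misconfigured.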





