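arm64v8/spark is the official Apache Spark Docker image built for the ARM64 architecture: it is the `arm64v8` build of https://hub.docker.com/_/spark. Apache Spark is a unified engine for big data analytics. It supports large-scale data processing, data science, and machine learning workloads, and provides APIs in several languages (Scala, Java, Python, R) along with distributed computing capabilities. The image is intended for quickly deploying and running Spark applications on ARM64 hosts.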
Interactive shells (spark-shell, pyspark, sparkR) are included for real-time data analysis and debugging.

| Tags | Dockerfile link |
|---|---|
| 4.0.0-scala2.13-java21-python3-ubuntu, 4.0.0-java21-python3, 4.0.0-java21, python3, latest | https://github.com/apache/spark-docker/blob/4bd1dbce94797b5b387b784db6b378069a8b6328/4.0.0/scala2.13-java21-python3-ubuntu/Dockerfile |
| 4.0.0-scala2.13-java21-r-ubuntu, 4.0.0-java21-r | https://github.com/apache/spark-docker/blob/4bd1dbce94797b5b387b784db6b378069a8b6328/4.0.0/scala2.13-java21-r-ubuntu/Dockerfile |
| 4.0.0-scala2.13-java21-ubuntu, 4.0.0-java21-scala | https://github.com/apache/spark-docker/blob/4bd1dbce94797b5b387b784db6b378069a8b6328/4.0.0/scala2.13-java21-ubuntu/Dockerfile |
| 4.0.0-scala2.13-java21-python3-r-ubuntu | https://github.com/apache/spark-docker/blob/4bd1dbce94797b5b387b784db6b378069a8b6328/4.0.0/scala2.13-java21-python3-r-ubuntu/Dockerfile |
| 4.0.0-scala2.13-java17-python3-ubuntu, 4.0.0-python3, 4.0.0, python3-java17 | https://github.com/apache/spark-docker/blob/4bd1dbce94797b5b387b784db6b378069a8b6328/4.0.0/scala2.13-java17-python3-ubuntu/Dockerfile |
| 4.0.0-scala2.13-java17-r-ubuntu, 4.0.0-r, r | https://github.com/apache/spark-docker/blob/4bd1dbce94797b5b387b784db6b378069a8b6328/4.0.0/scala2.13-java17-r-ubuntu/Dockerfile |
| 4.0.0-scala2.13-java17-ubuntu, 4.0.0-scala, scala | https://github.com/apache/spark-docker/blob/4bd1dbce94797b5b387b784db6b378069a8b6328/4.0.0/scala2.13-java17-ubuntu/Dockerfile |
| 4.0.0-scala2.13-java17-python3-r-ubuntu | https://github.com/apache/spark-docker/blob/4bd1dbce94797b5b387b784db6b378069a8b6328/4.0.0/scala2.13-java17-python3-r-ubuntu/Dockerfile |
| 3.5.7-scala2.12-java17-python3-ubuntu, 3.5.7-java17-python3, 3.5.7-java17 | https://github.com/apache/spark-docker/blob/2ebf694ad45fee6f4beeeb4204bcdb01d73c988f/3.5.7/scala2.12-java17-python3-ubuntu/Dockerfile |
| 3.5.7-scala2.12-java17-r-ubuntu, 3.5.7-java17-r | https://github.com/apache/spark-docker/blob/2ebf694ad45fee6f4beeeb4204bcdb01d73c988f/3.5.7/scala2.12-java17-r-ubuntu/Dockerfile |
| 3.5.7-scala2.12-java17-ubuntu, 3.5.7-java17-scala | https://github.com/apache/spark-docker/blob/2ebf694ad45fee6f4beeeb4204bcdb01d73c988f/3.5.7/scala2.12-java17-ubuntu/Dockerfile |
| 3.5.7-scala2.12-java17-python3-r-ubuntu | https://github.com/apache/spark-docker/blob/2ebf694ad45fee6f4beeeb4204bcdb01d73c988f/3.5.7/scala2.12-java17-python3-r-ubuntu/Dockerfile |
| 3.5.7-scala2.12-java11-python3-ubuntu, 3.5.7-python3, 3.5.7 | https://github.com/apache/spark-docker/blob/2ebf694ad45fee6f4beeeb4204bcdb01d73c988f/3.5.7/scala2.12-java11-python3-ubuntu/Dockerfile |
| 3.5.7-scala2.12-java11-r-ubuntu, 3.5.7-r | https://github.com/apache/spark-docker/blob/2ebf694ad45fee6f4beeeb4204bcdb01d73c988f/3.5.7/scala2.12-java11-r-ubuntu/Dockerfile |
| 3.5.7-scala2.12-java11-ubuntu, 3.5.7-scala | https://github.com/apache/spark-docker/blob/2ebf694ad45fee6f4beeeb4204bcdb01d73c988f/3.5.7/scala2.12-java11-ubuntu/Dockerfile |
| 3.5.7-scala2.12-java11-python3-r-ubuntu | https://github.com/apache/spark-docker/blob/2ebf694ad45fee6f4beeeb4204bcdb01d73c988f/3.5.7/scala2.12-java11-python3-r-ubuntu/Dockerfile |
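
The fully qualified tags in the table appear to follow one naming pattern: Spark version, Scala version, Java version, optional language bindings, then base OS. The sketch below splits such a tag into its parts. The pattern is inferred from the table rows rather than taken from an official grammar, and `SparkTag` and `parse_full_tag` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional


class SparkTag(NamedTuple):
    spark: str    # Spark release, e.g. "4.0.0"
    scala: str    # Scala version, e.g. "2.13"
    java: int     # Java major version, e.g. 17
    python: bool  # True if the tag name includes "python3"
    r: bool       # True if the tag name includes "r"
    os: str       # base OS, e.g. "ubuntu"


# Pattern inferred from the fully qualified tags in the table (an assumption,
# not an official specification of the tag format).
_FULL_TAG = re.compile(
    r"^(?P<spark>\d+\.\d+\.\d+)"
    r"-scala(?P<scala>\d+\.\d+)"
    r"-java(?P<java>\d+)"
    r"(?P<python>-python3)?"
    r"(?P<r>-r)?"
    r"-(?P<os>[a-z]+)$"
)


def parse_full_tag(tag: str) -> Optional[SparkTag]:
    """Parse a fully qualified tag; return None for short aliases such as 'latest'."""
    m = _FULL_TAG.match(tag)
    if m is None:
        return None
    return SparkTag(
        spark=m["spark"],
        scala=m["scala"],
        java=int(m["java"]),
        python=m["python"] is not None,
        r=m["r"] is not None,
        os=m["os"],
    )


print(parse_full_tag("4.0.0-scala2.13-java17-python3-r-ubuntu"))
print(parse_full_tag("latest"))
```

Short aliases such as `latest`, `python3`, or `4.0.0-java21-r` do not carry every component, so the sketch returns `None` for them; the table remains the reference for which full tag an alias points to.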
Scala Shell
The quickest way to start interacting with Spark is the Scala shell:

    docker run -it arm64v8/spark /opt/spark/bin/spark-shell

Example command (returns 1,000,000,000):

    scala> spark.range(1000 * 1000 * 1000).count()

Python Shell (PySpark)
The Python shell requires the python3 tag:

    docker run -it arm64v8/spark:python3 /opt/spark/bin/pyspark

Example command:

    >>> spark.range(1000 * 1000 * 1000).count()

R Shell (SparkR)
The R shell requires the r tag:

    docker run -it arm64v8/spark:r /opt/spark/bin/sparkR

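
The three shells differ only in the image tag and in the launcher under /opt/spark/bin. As a minimal sketch, the mapping can be written down once and used to build the matching `docker run` command. `SHELLS` and `shell_command` are hypothetical helper names; the tags and paths are the ones used in the commands above.

```python
import shlex

# Shell name -> (image tag, launcher under /opt/spark/bin), as used above.
# The Scala shell runs from the default (untagged) image reference.
SHELLS = {
    "scala": (None, "spark-shell"),
    "python": ("python3", "pyspark"),
    "r": ("r", "sparkR"),
}


def shell_command(language: str, image: str = "arm64v8/spark") -> list[str]:
    """Return the docker argv that starts the interactive shell for `language`."""
    tag, launcher = SHELLS[language]
    ref = image if tag is None else f"{image}:{tag}"
    return ["docker", "run", "-it", ref, f"/opt/spark/bin/{launcher}"]


for lang in SHELLS:
    print(shlex.join(shell_command(lang)))
```

The returned list can be passed directly to `subprocess.run` when the shell should be started from a script.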
Single-node Spark cluster (docker run)
Start the Spark Master:

    docker run -d \
      --name spark-master \
      -p 7077:7077 \
      -p 8080:8080 \
      arm64v8/spark \
      /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master

Start a Spark Worker and connect it to the Master:

    docker run -d \
      --name spark-worker \
      --link spark-master:master \
      arm64v8/spark \
      /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077

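
When these two commands are scripted, it can be handy to confirm that the Master is actually serving before anything is pointed at it. The sketch below polls the Web UI on port 8080, which the Master command above publishes, until it answers with HTTP 200. `wait_for_http` is a hypothetical helper, and the timeout values are arbitrary choices.

```python
import time
import urllib.request

# Local polling should not go through any proxy configured in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200; return False once `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not up yet: refused, reset, timed out, or an HTTP error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_http("http://localhost:8080/")` could be called between the Master and Worker commands. A plain TCP connect to the published port is a weaker signal, since a port-forwarding layer may accept the connection before the Master itself is listening.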
Docker Compose configuration (single-node cluster)
Create docker-compose.yml:

    version: '3'
    services:
      master:
        image: arm64v8/spark
        container_name: spark-master
        ports:
          - "7077:7077"  # Master port
          - "8080:8080"  # Web UI port
        command: /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
      worker:
        image: arm64v8/spark
        container_name: spark-worker
        depends_on:
          - master
        environment:
          - SPARK_MASTER=spark://master:7077
        # The Master URL is written out here: Compose substitutes variable
        # references in this file from the host environment, not from the
        # container's `environment:` entries.
        command: /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077

Start the cluster:

    docker-compose up -d

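
Once the cluster is up, a quick way to check that the Worker registered is to read the Master's status as JSON. The sketch below assumes the standalone Master serves a JSON view at `/json/` on the Web UI port, with a top-level `status` field and a `workers` list whose entries carry `state` and `cores`. These field names are assumptions to verify against your Spark version, and `fetch_master_state` and `summarize` are hypothetical helpers.

```python
import json
import urllib.request


def fetch_master_state(base_url: str = "http://localhost:8080") -> dict:
    """Download the Master's JSON status (assumed to live at <base_url>/json/)."""
    with urllib.request.urlopen(f"{base_url}/json/", timeout=5) as resp:
        return json.load(resp)


def summarize(state: dict) -> dict:
    """Reduce the Master status to a few fields; the key names are assumptions."""
    workers = state.get("workers", [])
    alive = [w for w in workers if w.get("state") == "ALIVE"]
    return {
        "master_status": state.get("status"),
        "alive_workers": len(alive),
        "total_cores": sum(w.get("cores", 0) for w in alive),
    }
```

With the Compose file above running, `summarize(fetch_master_state())` would be expected to report one alive worker if registration succeeded. Keeping the parsing separate from the download makes it easy to adjust the field names without touching the network code.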
Spark can also be deployed on Kubernetes; see the official guide for details.
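The image supports customizing Spark configuration through environment variables. Common options are described at https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable. Key environment variables include:

- SPARK_HOME: Spark installation path (default /opt/spark)
- SPARK_MASTER: Master address (e.g. spark://master:7077)
- SPARK_WORKER_CORES: number of CPU cores available to the Worker
- SPARK_WORKER_MEMORY: memory available to the Worker (e.g. 4g)

Apache Spark and its Docker image are licensed under the Apache License 2.0. The image may contain other software (such as base system tools); users are responsible for confirming that their use complies with the licenses of that software.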
More license information is available at https://github.com/docker-library/repo-info/tree/master/repos/spark.
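
Returning to the environment variables listed earlier, the following sketch shows one way to turn Worker settings into `-e` flags for `docker run`, with light validation. `worker_env_flags` is a hypothetical helper, and the memory pattern is a simplification rather than Spark's full size grammar. Whether the image's entrypoint acts on each variable should be confirmed against the OVERVIEW.md link above.

```python
import re

# Accepts sizes such as "512m" or "4g"; a simplified pattern, not Spark's full grammar.
_MEMORY = re.compile(r"^\d+[kmgt]$")


def worker_env_flags(master: str, cores: int, memory: str) -> list[str]:
    """Build `docker run -e NAME=value` flags for the Worker variables listed earlier."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not _MEMORY.match(memory):
        raise ValueError(f"unrecognized memory size: {memory!r}")
    env = {
        "SPARK_MASTER": master,
        "SPARK_WORKER_CORES": str(cores),
        "SPARK_WORKER_MEMORY": memory,
    }
    flags: list[str] = []
    for name, value in env.items():
        flags += ["-e", f"{name}={value}"]
    return flags


print(worker_env_flags("spark://master:7077", 2, "4g"))
```

The resulting list can be spliced into a `docker run` argv ahead of the image name.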