openeuler/spark
The official Spark docker image.
Maintained by: openEuler CloudNative SIG.
Where to get help: openEuler CloudNative SIG, openEuler.
Current Spark docker images are built on openEuler. This repository is free to use and exempt from per-user rate limits.
Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
Learn more on the Spark website.
Each Spark docker image tag consists of the Spark version followed by the version of the openEuler base image. The available tags are as follows:
| Tag | Description | Architectures |
|---|---|---|
| 4.0.1-oe2403sp2 | spark 4.0.1 on openEuler 24.03-LTS-SP2 | amd64, arm64 |
| 3.3.1-22.03-lts | spark 3.3.1 on openEuler 22.03-LTS | amd64, arm64 |
| 3.3.2-22.03-lts | spark 3.3.2 on openEuler 22.03-LTS | amd64, arm64 |
| 3.4.0-22.03-lts | spark 3.4.0 on openEuler 22.03-LTS | amd64, arm64 |
| 3.5.1-24.03-lts | spark 3.5.1 on openEuler 24.03-LTS | amd64, arm64 |
| 3.5.3-oe2003sp4 | spark 3.5.3 on openEuler 20.03-LTS-SP4 | amd64, arm64 |
| 3.5.3-oe2203sp1 | spark 3.5.3 on openEuler 22.03-LTS-SP1 | amd64, arm64 |
| 3.5.3-oe2203sp3 | spark 3.5.3 on openEuler 22.03-LTS-SP3 | amd64, arm64 |
| 3.5.3-oe2203sp4 | spark 3.5.3 on openEuler 22.03-LTS-SP4 | amd64, arm64 |
| 3.5.3-oe2403lts | spark 3.5.3 on openEuler 24.03-LTS | amd64, arm64 |
In the commands below, replace {Tag} with the tag from the table above that matches your requirements.
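The tag layout can be illustrated with a small Python sketch that splits a tag into its Spark version and its base-image part. The helper name `split_tag` is hypothetical and not part of the image or of Spark; it only assumes, based on the tags listed in the table, that the Spark version is everything before the first hyphen.

```python
def split_tag(tag: str) -> tuple[str, str]:
    """Split an image tag into (spark_version, base_image_part).

    Assumption: the Spark version is the text before the first hyphen,
    as in every tag listed in the table (e.g. "3.3.1-22.03-lts").
    """
    spark_version, sep, base_image = tag.partition("-")
    if not sep or not spark_version or not base_image:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return spark_version, base_image


if __name__ == "__main__":
    for tag in ("4.0.1-oe2403sp2", "3.3.1-22.03-lts"):
        print(tag, "->", split_tag(tag))
```

Note that the base-image part is written in two styles across the table (for example `22.03-lts` and `oe2403sp2`), so the sketch returns it unchanged rather than trying to normalize it.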
Online Documentation
You can find the latest Spark documentation, including a programming guide, on the project web page. This README file only contains basic setup instructions.
Pull the openeuler/spark image from docker
```bash
docker pull openeuler/spark:{Tag}
```
Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:
```bash
docker run -it --name spark openeuler/spark:{Tag} /opt/spark/bin/spark-shell
```
Try the following command, which should return 1,000,000,000:
```scala
scala> spark.range(1000 * 1000 * 1000).count()
```
Interactive Python Shell
The easiest way to start using PySpark is through the Python shell:
```bash
docker run -it --name spark openeuler/spark:{Tag} /opt/spark/bin/pyspark
```
Then run the following command, which should also return 1,000,000,000:
```python
>>> spark.range(1000 * 1000 * 1000).count()
```
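The two `docker run` invocations above differ only in the shell binary they start. As a minimal sketch, the Python helper below assembles either command for a given tag. The function name `docker_run_command` and the `SHELLS` mapping are hypothetical names introduced for illustration; the image name, flags, and binary paths are the ones used in the commands above.

```python
import shlex

# Shell binaries inside the image, as used in the commands above.
SHELLS = {
    "scala": "/opt/spark/bin/spark-shell",
    "python": "/opt/spark/bin/pyspark",
}


def docker_run_command(tag: str, shell: str = "scala", name: str = "spark") -> list[str]:
    """Build the argument list for an interactive Spark shell container."""
    if shell not in SHELLS:
        raise ValueError(f"unknown shell {shell!r}; expected one of {sorted(SHELLS)}")
    return [
        "docker", "run", "-it",
        "--name", name,
        f"openeuler/spark:{tag}",
        SHELLS[shell],
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so no Docker daemon is needed.
    print(shlex.join(docker_run_command("3.5.1-24.03-lts", "python")))
```

To actually start the container, the returned list could be passed to `subprocess.run` on a machine where Docker is installed. Because both commands use the same `--name spark`, starting the second shell while a container named `spark` still exists will fail unless a different `name` is supplied or the old container is removed first.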
Running Spark on Kubernetes
[***]
Configuration and environment variables
See more in [***]
If you have any questions or want to use some special features, please submit an issue or a pull request on openeuler-docker-images.


