Posted to dev@spark.apache.org by Yikun Jiang <yi...@gmail.com> on 2022/11/10 10:27:44 UTC

Publish Apache Spark official image under the new rules?

Hi, all

Last month, the vote on "Support Docker Official Image for Spark
<https://issues.apache.org/jira/browse/SPARK-40513>" passed.

# Progress of SPIP:

## Completed:
- Created a new GitHub repo: https://github.com/apache/spark-docker
- Added the "Spark Docker
<https://issues.apache.org/jira/browse/SPARK-40969?jql=project%20%3D%20SPARK%20AND%20component%20%3D%20%22Spark%20Docker%22>"
component label in JIRA
- Uploaded the 3.3.0/3.3.1 Dockerfiles: spark-docker#2
<https://github.com/apache/spark-docker/pull/2>, spark-docker#20
<https://github.com/apache/spark-docker/pull/20>
- Applied some fixes to the Dockerfiles to meet the DOI quality requirements:
  * spark-docker#11 <https://github.com/apache/spark-docker/pull/11> Use
"spark" as the username in the official image (instead of the magic number
185)
  * spark-docker#14 <https://github.com/apache/spark-docker/pull/14> Clean
up the OS package list cache to reduce the image size
  * spark-docker#17 <https://github.com/apache/spark-docker/pull/17> Remove
the dynamic pip/setuptools upgrade to ensure the image's reproducibility
- Added a Dockerfile template to help generate all kinds of Dockerfiles for
a specific version: spark-docker#12
<https://github.com/apache/spark-docker/pull/12> (see the build sketch
after this list)
- Added workflows to build/test the Dockerfiles to ensure their quality:
  * K8s integration test: spark-docker#9
<https://github.com/apache/spark-docker/pull/9>
  * Standalone test: spark-docker#21
<https://github.com/apache/spark-docker/pull/21> (great job by @dcoliversun)
- spark-website#424 <https://github.com/apache/spark-website/pull/424> Use
the Docker image in the SQL/Scala/Java examples
- INFRA-23882 <https://issues.apache.org/jira/browse/INFRA-23882> Add
Docker Hub secrets to the spark-docker repo so images can be published to
Docker Hub
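
As a rough sketch of how a generated Dockerfile can be built and
smoke-tested locally (the version/flavor directory layout below is my
assumption about the repo structure, not a confirmed path):

# Clone the repo and build one image flavor locally (directory path is assumed)
git clone https://github.com/apache/spark-docker.git
cd spark-docker
docker build \
  -t apache/spark:3.3.1-scala2.12-java11-python3-ubuntu \
  3.3.1/scala2.12-java11-python3-ubuntu
# Smoke-test: the built image should report its Spark version
docker run -ti apache/spark:3.3.1-scala2.12-java11-python3-ubuntu \
  /opt/spark/bin/spark-submit --version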

## Not merged yet:
- spark-docker#23 <https://github.com/apache/spark-docker/pull/23> One-click
publishing of the "apache/spark" image,
  replacing the current Spark Docker Images publish step
<https://github.com/wangyum/spark-website/blob/1c6b2ee13a1e22748ed416c5cc260c33795a76c8/release-process.md#create-and-upload-spark-docker-images>.
  It will also run the K8s integration test / standalone test first, then
publish.
- docker-library/official-images#13089
<https://github.com/docker-library/official-images/pull/13089> Add Apache
Spark Docker Official Image;
  waiting for review from the Docker side (see the manifest sketch below).
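
For context, a DOI submission adds a stanza for the image to the
official-images library manifest. A minimal illustrative sketch (the tags,
commit, and directory below are placeholders, not the actual content of
the PR):

# library/spark stanza (illustrative; values are placeholders)
GitRepo: https://github.com/apache/spark-docker.git

Tags: 3.3.1-scala2.12-java11-python3-ubuntu, python3, latest
GitCommit: <commit-sha-pinning-the-release>
Directory: 3.3.1/scala2.12-java11-python3-ubuntu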

After the above work, I think we have almost reached DOI quality (there
might still be some small fixes after the Docker-side review), but we are
limited by the Docker-side review bandwidth. The good news is that the PR
is at the top of the review queue according to the review history.


# Next step?

Should we publish the apache/spark image (3.3.0/3.3.1) according to the
new rules now?

After publishing, apache/spark will gain several new tags for v3.3.0 and
v3.3.1, such as:

- apache/spark:python3
- apache/spark:scala
- apache/spark:r
- apache/spark (all-in-one)
* You can see the complete tag info here
<https://github.com/apache/spark-docker/pull/23/files#diff-2b39d33506bc7a34cef4b9ebf4cf8b1e3a5532f2131ceb37011b94261cec5f8c>.
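
Once published, pulling and running them would look like this (a sketch;
the authoritative tag list is the one in spark-docker#23, so treat these
tags as examples):

# Pull and run the published images (tags shown are examples)
docker pull apache/spark:python3
docker run -ti apache/spark:python3 /opt/spark/bin/pyspark
docker run -ti apache/spark:r /opt/spark/bin/sparkR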

WDYT?

Regards,
Yikun

Re: Publish Apache Spark official image under the new rules?

Posted by Yikun Jiang <yi...@gmail.com>.
BTW, you might want to try the new images. I have published them to my
personal GHCR / Docker Hub; you can try:

- Try spark-shell / pyspark / sparkR:
docker run -ti ghcr.io/yikun/spark-docker/spark /opt/spark/bin/spark-shell
docker run -ti ghcr.io/yikun/spark-docker/spark /opt/spark/bin/pyspark
docker run -ti ghcr.io/yikun/spark-docker/spark:r /opt/spark/bin/sparkR

- Try standalone mode like this
<https://github.com/Yikun/spark-docker/blob/52152c1b6d70acc2e7c5e32bffe0265b55df7b6f/.github/workflows/main.yml#L113>
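
  If you want to try standalone mode without the workflow, here is a
minimal sketch (the network, container names, and master URL are my
assumptions, and it assumes the examples jar ships in the image):

# Start a master and a worker on a shared Docker network (names are assumed)
docker network create spark-net
docker run -d --name spark-master --network spark-net \
  ghcr.io/yikun/spark-docker/spark \
  /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
docker run -d --name spark-worker --network spark-net \
  ghcr.io/yikun/spark-docker/spark \
  /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
  spark://spark-master:7077
# Submit the SparkPi example against the standalone master
docker run -ti --network spark-net ghcr.io/yikun/spark-docker/spark \
  /opt/spark/bin/spark-submit --master spark://spark-master:7077 \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark/examples/jars/spark-examples_2.12-3.3.1.jar 100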

- Try them on K8s with a local minikube like this
<https://github.com/Yikun/spark-docker/blob/master/.github/workflows/main.yml#L161-L216>
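
  A sketch of a submission against minikube (the master URL lookup, image
tag, and examples jar version are assumptions for illustration):

# Submit SparkPi to the minikube cluster using the image
/opt/spark/bin/spark-submit \
  --master k8s://$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}') \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=ghcr.io/yikun/spark-docker/spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.1.jar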

- All available image tags are listed here
<https://github.com/Yikun/spark-docker/pkgs/container/spark-docker%2Fspark/versions?filters%5Bversion_type%5D=tagged>
(GHCR) or here
<https://hub.docker.com/repository/registry-1.docker.io/yikunkero/spark/tags?page=1&ordering=last_updated>
(Docker Hub).

Regards,
Yikun

