Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2019/02/15 20:49:00 UTC

[jira] [Resolved] (SPARK-26773) Consider alternative base images for Kubernetes

     [ https://issues.apache.org/jira/browse/SPARK-26773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-26773.
------------------------------------
    Resolution: Duplicate

I'm consolidating all Docker image-related bugs under SPARK-24655. All discussion of the requirements people have around Docker images should go there.

> Consider alternative base images for Kubernetes
> -----------------------------------------------
>
>                 Key: SPARK-26773
>                 URL: https://issues.apache.org/jira/browse/SPARK-26773
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, PySpark
>    Affects Versions: 2.4.0
>            Reporter: Ondrej Kokes
>            Priority: Minor
>
> I understand the desire to keep the base image minimal (and not just for Kubernetes), and thus the choice of Alpine, but that distro has its limitations, the main one being that it uses musl as its libc implementation.
> The main reason we don't use Alpine for our non-Spark workloads is that we use Python and *we cannot use pre-built distributions of packages (so-called wheels)*, because they are usually built for glibc-based distros (work is being done on musl-based builds, but we're not there yet [0]).
> So instead of popular packages like numpy or pandas installing in seconds, many packages have to be built from source on each installation (which requires a compiler etc.). We could theoretically build all these packages into the base image, but that would require multi-step builds, so that we don't include gcc/clang in the final image, and we would have to rebuild the Docker image with each dependency change.
> There have already been similar issues submitted [1].
> *I'm not sure what the best course of action is.* Perhaps there should be an alternative image based on, e.g., Debian. Or perhaps there is a good reason for a glibc-based distro to be the default Docker base image, with an option to "downgrade" to Alpine. (R, with its popular Rcpp-based extensions, might suffer from a similar problem, but I'm mostly guessing. [2])
>
> [0] https://www.python.org/dev/peps/pep-0513/
> [1] https://github.com/apache-spark-on-k8s/spark/issues/326
> [2] https://github.com/rocker-org/rocker/issues/231#issuecomment-297150217


