Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2019/04/18 16:19:00 UTC

[jira] [Commented] (SPARK-24655) [K8S] Custom Docker Image Expectations and Documentation

    [ https://issues.apache.org/jira/browse/SPARK-24655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821270#comment-16821270 ] 

Thomas Graves commented on SPARK-24655:
---------------------------------------

From the linked issues it seems the goals would be:
 * Support more than the Alpine image base - i.e., a glibc-based version.
 * Allow adding support for things like GPUs - although this may just mean making the base image configurable.
 * Allow overriding the start commands, for things like using Jupyter Docker images.
 * Add in Python pip requirements - and I assume the same would be nice for R. Is there something generic we can do to make this easy? (See the sketch after this list.)
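
For the pip case, a minimal sketch of what extending the stock image could look like - the spark-py image name, registry, and requirements file are placeholders, and it assumes the base image ships pip:

{code}
# Hypothetical sketch: layer pip requirements onto a stock PySpark image.
cat > Dockerfile <<'EOF'
FROM <registry>/spark-py:<tag>
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
EOF
docker build -t <registry>/my-pyspark:<tag> .
{code}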

Correct me if I'm wrong, but anything Spark-related you should be able to set via Spark confs, like env variables through {{spark.kubernetes.driverEnv.[EnvironmentVariableName]}} and {{spark.executorEnv.[EnvironmentVariableName]}}. Otherwise you could just use the Dockerfile built here as a base and build on it.
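
For example, a sketch of setting env variables at submit time (master URL, image, class, and variable names are placeholders):

{code}
# Sketch: set env variables on the driver and executors via Spark confs.
bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:<port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<registry>/spark:<tag> \
  --conf spark.kubernetes.driverEnv.MY_ENV_VAR=some-value \
  --conf spark.executorEnv.MY_ENV_VAR=some-value \
  --class <main-class> \
  local:///path/to/app.jar
{code}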

I think we would just want to make the common cases easy, and allow users to override anything we may have hardcoded so they can reuse it as a base.

[~mcheah] From the original description, why do we want to avoid rebuilding the image when the Spark version changes? It seems fine to allow users to override it to point to their own Spark version (which they could then use to do this), but I would think you would normally build a new Docker image for a new version of Spark - dependencies may have changed, the Docker template may have changed, etc. If they really wanted this, it seems they would just specify their own Docker image as a base and add the Spark pieces on top - is that what you are getting at? We could make the base image an argument to the docker-image-tool.sh script, something like the sketch below.
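
A hypothetical sketch of that - it assumes the main Dockerfile were changed to declare {{ARG base_image}} and use {{FROM ${base_image}}}, and that docker-image-tool.sh forwarded Docker build args via a -b style option; neither is the case today:

{code}
# Hypothetical: select a different base distribution at build time.
./bin/docker-image-tool.sh \
  -r <registry> -t <tag> \
  -b base_image=centos:7 \
  build
{code}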

> [K8S] Custom Docker Image Expectations and Documentation
> --------------------------------------------------------
>
>                 Key: SPARK-24655
>                 URL: https://issues.apache.org/jira/browse/SPARK-24655
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.3.1
>            Reporter: Matt Cheah
>            Priority: Major
>
> A common use case we want to support with Kubernetes is the usage of custom Docker images. Some examples include:
>  * A user builds an application using Gradle or Maven, with Spark as a compile-time dependency. The application's jars (both the custom-written jars and the dependencies) need to be packaged in a Docker image that can be run via spark-submit (see the sketch after this list).
>  * A user builds a PySpark or R application and wants to include custom dependencies.
>  * A user wants to switch the base image from Alpine to CentOS while using either built-in or custom jars.
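> A minimal sketch of the first case, assuming the stock image produced by docker-image-tool.sh as the base (image names and paths are placeholders):
> {code}
> # Hypothetical: layer application jars onto a stock Spark image.
> cat > Dockerfile <<'EOF'
> FROM <registry>/spark:<tag>
> COPY target/my-app.jar /opt/spark/jars/
> COPY target/libs/ /opt/spark/jars/
> EOF
> docker build -t <registry>/my-spark-app:<tag> .
> {code}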
> We currently do not document how these custom Docker images are supposed to be built, nor do we guarantee the stability of these Docker images across spark-submit versions. To illustrate how this can break down, suppose we decide to change the names of the environment variables that carry the driver/executor extra JVM options specified by {{spark.[driver|executor].extraJavaOptions}}. If we change the environment variable that spark-submit provides, then the user must update their custom Dockerfile and build new images.
> Rather than jumping to an implementation immediately, though, it's worth taking a step back and considering these matters from the perspective of the end user. Toward that end, this ticket will serve as a forum where we can answer at least the following questions, and any others pertaining to the matter:
>  # What would be the steps a user would need to take to build a custom Docker image, given their desire to customize the dependencies and the content (OS or otherwise) of said images?
>  # How can we ensure the user does not need to rebuild the image if only the spark-submit version changes?
> The end deliverable for this ticket is a design document, and then we'll create sub-issues for the technical implementation and documentation of the contract.


