Posted to issues@spark.apache.org by "Ondrej Kokes (JIRA)" <ji...@apache.org> on 2019/02/16 14:12:00 UTC

[jira] [Commented] (SPARK-24655) [K8S] Custom Docker Image Expectations and Documentation

    [ https://issues.apache.org/jira/browse/SPARK-24655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770115#comment-16770115 ] 

Ondrej Kokes commented on SPARK-24655:
--------------------------------------

I would expect there to be a Dockerfile that would accept my requirements.txt/Pipfile (in the case of PySpark) and install everything in it using a standard command, so that I wouldn't have to do anything other than docker build. And as I noted in the duplicate issue, the base distro would need to be glibc-based.
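
Something along these lines is roughly what I have in mind - just a sketch, to be clear: the base image name/tag and the paths are made up, and it assumes the base image ships a Python with pip available:

    # Hypothetical glibc-based PySpark base image (name and tag are made up)
    FROM spark-py:v2.4.0

    # Install the application's Python dependencies at image build time
    COPY requirements.txt /tmp/requirements.txt
    RUN pip install --no-cache-dir -r /tmp/requirements.txt

    # Add the application code itself (destination path is illustrative)
    COPY app/ /opt/spark/work-dir/app/

That would be the whole build step - docker build and nothing else.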

The only deviation from this workflow would be if I needed to add anything extra into the image, say custom certificates, injected environment variables, or custom repositories (though some of this could be handled by Kubernetes itself). But at least I'd have a starting point - a Dockerfile.
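
For those extras, a few additional lines on top of the same sketch would do (again, the certificate path, file names, and index URL are purely illustrative, and update-ca-certificates assumes a Debian-style base):

    # Trust an internal CA (certificate path and name made up)
    COPY certs/internal-ca.crt /usr/local/share/ca-certificates/internal-ca.crt
    RUN update-ca-certificates

    # Point pip at an internal package index (URL made up)
    ENV PIP_INDEX_URL=https://pypi.internal.example.com/simple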

Slightly off topic: you mention build-time dependencies, but there could be cases where we'd need to install dependencies at runtime - e.g. in a Zeppelin/Jupyter scenario. Not sure if that affects this in any way, but it's a workflow that should be supported as well.
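
For illustration, the runtime route in a notebook today mostly boils down to shipping an archive by hand - again just a sketch, with the zip path and module name made up:

    from pyspark.sql import SparkSession

    # In a Zeppelin/Jupyter cell: get (or reuse) the session the notebook provides
    spark = SparkSession.builder.getOrCreate()

    # Ship a packaged dependency to the driver and executors at runtime
    spark.sparkContext.addPyFile("/tmp/my_dependency.zip")  # path is made up

    import my_dependency  # hypothetical module packaged in the zip above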

> [K8S] Custom Docker Image Expectations and Documentation
> --------------------------------------------------------
>
>                 Key: SPARK-24655
>                 URL: https://issues.apache.org/jira/browse/SPARK-24655
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.3.1
>            Reporter: Matt Cheah
>            Priority: Major
>
> A common use case we want to support with Kubernetes is the usage of custom Docker images. Some examples include:
>  * A user builds an application using Gradle or Maven, using Spark as a compile-time dependency. The application's jars (both the custom-written jars and the dependencies) need to be packaged in a docker image that can be run via spark-submit.
>  * A user builds a PySpark or R application and desires to include custom dependencies
>  * A user wants to switch the base image from Alpine to CentOS while using either built-in or custom jars
> We currently do not document how these custom Docker images are supposed to be built, nor do we guarantee stability of these Docker images with various spark-submit versions. To illustrate how this can break down, suppose for example we decide to change the names of environment variables that denote the driver/executor extra JVM options specified by {{spark.[driver|executor].extraJavaOptions}}. If we change the environment variable spark-submit provides then the user must update their custom Dockerfile and build new images.
> Rather than jumping to an implementation immediately though, it's worth taking a step back and considering these matters from the perspective of the end user. Towards that end, this ticket will serve as a forum where we can answer at least the following questions, and any others pertaining to the matter:
>  # What would be the steps a user would need to take to build a custom Docker image, given their desire to customize the dependencies and the content (OS or otherwise) of said images?
>  # How can we ensure the user does not need to rebuild the image if only the spark-submit version changes?
> The end deliverable for this ticket is a design document, and then we'll create sub-issues for the technical implementation and documentation of the contract.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org