Posted to issues@spark.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2018/08/28 13:13:00 UTC

[jira] [Created] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes

Rob Vesse created SPARK-25262:
---------------------------------

             Summary: Make Spark local dir volumes configurable with Spark on Kubernetes
                 Key: SPARK-25262
                 URL: https://issues.apache.org/jira/browse/SPARK-25262
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 2.3.1, 2.3.0
            Reporter: Rob Vesse


As discussed during review of the design document for SPARK-24434, pod templates will provide more in-depth customisation for Spark on Kubernetes, but some things still cannot be modified because Spark code generates pod specs in very specific ways.

The particular issue identified relates to the handling of {{spark.local.dirs}}, which is done by {{LocalDirsFeatureStep.scala}}.  For each directory specified, or for a single default directory if none is specified, it creates a Kubernetes {{emptyDir}} volume.  As noted in the Kubernetes documentation, this will be backed by the node storage (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).  In some compute environments this may be extremely undesirable.  For example, with diskless compute resources the node storage will likely be a slow, remotely mounted disk, often with limited capacity.  For such environments it would likely be better to set {{medium: Memory}} on the volume, per the K8S documentation, so that a {{tmpfs}} volume is used instead.
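
To make the difference concrete, here is a minimal sketch in Python (not Spark's actual Scala code) that models the generated volumes as plain dictionaries shaped like Kubernetes pod spec fragments. The function name {{local_dir_volumes}}, the {{use_tmpfs}} flag, the {{spark-local-dir-N}} naming scheme and the default path are all assumptions made for illustration; only the {{emptyDir}} and {{medium: Memory}} fields come from the Kubernetes documentation.

```python
from typing import Dict, List, Optional


def local_dir_volumes(local_dirs: Optional[List[str]] = None,
                      use_tmpfs: bool = False) -> List[Dict]:
    """Model one emptyDir volume per configured local directory."""
    # Hypothetical default used when no directories are configured.
    dirs = local_dirs or ["/var/data/spark-local"]
    volumes = []
    for index, _path in enumerate(dirs, start=1):
        # An empty emptyDir spec is backed by node storage;
        # medium "Memory" asks Kubernetes for a tmpfs mount instead.
        empty_dir = {"medium": "Memory"} if use_tmpfs else {}
        volumes.append({"name": f"spark-local-dir-{index}",
                        "emptyDir": empty_dir})
    return volumes
```

With {{use_tmpfs=False}} every volume carries an empty {{emptyDir}} spec, which corresponds to the node-storage behaviour described above; with {{use_tmpfs=True}} each one requests {{medium: Memory}}.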

Another closely related issue is that users might want to use a different volume type to back the local directories, and there is currently no way to do that.

Pod templates will not really solve either of these issues because Spark will always attempt to generate a new volume for each local directory and will always define it as an {{emptyDir}}.

Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:

* Provide a new config setting to enable using {{tmpfs}}-backed {{emptyDir}} volumes
* Modify the logic to check whether a volume with the expected name is already defined and, if so, skip generating a volume definition for it
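
The two proposed changes could be combined roughly as follows. This is a sketch in Python rather than Spark's Scala, with pod volumes modelled as plain dictionaries; the helper name {{merge_local_dir_volumes}}, the {{use_tmpfs}} flag and the example volume names are hypothetical and do not reflect any existing Spark API.

```python
from typing import Dict, List


def merge_local_dir_volumes(existing: List[Dict],
                            wanted_names: List[str],
                            use_tmpfs: bool = False) -> List[Dict]:
    """Keep user-defined volumes; generate emptyDir only for missing names."""
    defined = {volume["name"] for volume in existing}
    result = list(existing)
    for name in wanted_names:
        if name in defined:
            # A volume with this name already exists (for example from a
            # pod template), so no definition is generated for it.
            continue
        empty_dir = {"medium": "Memory"} if use_tmpfs else {}
        result.append({"name": name, "emptyDir": empty_dir})
    return result
```

For instance, if the pod already defines a {{hostPath}} volume under the first expected name, that definition is left untouched and only the remaining local directories receive a generated {{emptyDir}}.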


