Posted to issues@spark.apache.org by "Stephen Hopper (Jira)" <ji...@apache.org> on 2020/07/03 13:29:00 UTC

[jira] [Reopened] (SPARK-31666) Cannot map hostPath volumes to container

     [ https://issues.apache.org/jira/browse/SPARK-31666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Hopper reopened SPARK-31666:
------------------------------------

Hi [~dongjoon],

This is still a bug. I should clarify the issue a bit more.

In Spark 2.4, the `LocalDirsFeatureStep` iterates through the list of paths in `spark.local.dir` and, for each one, creates a Kubernetes `emptyDir` volume named `spark-local-dir-${index}`.

In Spark 3.0, the `LocalDirsFeatureStep` first checks for any volume mounts whose names begin with the prefix `spark-local-dir-`. Only if none exist does it iterate through the list of paths in `spark.local.dir` and create a Kubernetes `emptyDir` volume named `spark-local-dir-${index}` for each one.
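To make the difference concrete, the Spark 3.0 behaviour amounts to a check along these lines (a simplified sketch, not the actual `LocalDirsFeatureStep` code; the class name, method names, and whether the index starts at 0 or 1 are illustrative):

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LocalDirsSketch {
    static final String PREFIX = "spark-local-dir-";

    // Names of the emptyDir volumes Spark would create, given the names of
    // user-supplied volume mounts and the configured spark.local.dir paths.
    static List<String> volumesToCreate(List<String> userVolumeNames, List<String> localDirs) {
        // Spark 3.0 only: skip creation entirely if the user already supplied
        // volumes with the reserved prefix.
        boolean userProvided = userVolumeNames.stream().anyMatch(n -> n.startsWith(PREFIX));
        if (userProvided) {
            return Collections.emptyList();
        }
        // Spark 2.4 behaviour (no check): always create one emptyDir per local dir.
        return IntStream.range(0, localDirs.size())
                .mapToObj(i -> PREFIX + (i + 1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // No user-supplied volumes: one emptyDir per path in spark.local.dir.
        System.out.println(volumesToCreate(List.of(), List.of("/tmp1")));
        // User already mounted a spark-local-dir-* volume: 3.0 creates none.
        System.out.println(volumesToCreate(List.of("spark-local-dir-2"), List.of("/tmp1")));
    }
}
```

In 2.4 the first branch does not exist, so Spark always creates its own `emptyDir` volume and collides with any user-supplied mount at the same path.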

 

The issue is that I need my Spark job to use paths on my host machine that live on a mount point outside the directory Kubernetes uses to allocate space for `emptyDir` volumes. Therefore, I mount these paths as type `hostPath` and ask Spark to use them as local directory space.
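Concretely, the intent is to point `spark.local.dir` at a directory backed by a `hostPath` volume whose name carries the `spark-local-dir-` prefix, so that Spark (post-SPARK-28042) recognises it as already handled (paths and the volume name here are illustrative):

```shell
# Mount a host directory into the executor and tell Spark to use it as
# scratch space; the spark-local-dir- prefix signals "user-managed".
--conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path=/data/spark-scratch \
--conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path=/data/spark-scratch \
--conf spark.local.dir=/data/spark-scratch
```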

 

The "must be unique" error about the path already being mounted occurs because I have already mounted that path myself; Spark should not add a second volume mapping for something I've already done. Hence, this is a bug, and it can be resolved by simply backporting SPARK-28042. That issue is somewhat entangled with changes from SPARK-25262, but the tmpfs support added in SPARK-25262 is not required for this fix. While it would be easiest to backport both, I respect your desire to backport only bug fixes and avoid backporting features, so I will open a PR that includes just SPARK-28042. How does this sound to you?

 

On a separate note, I realize that Spark 2.4 has been out for over 18 months and that the policy states a minor release is only supported for 18 months. However, the gap between the release of Spark 2.4 and the official release of Spark 3.0 itself exceeded 18 months, and the fix for the issue I'm experiencing was merged into Spark 3.0 a full year before 3.0 was released, yet was never made available to folks still using Spark 2.4 and hitting this problem. I therefore feel the policy should be revised to "minor releases will be supported for 18 months, or for 6 months from the release of their successor, whichever is later". Giving folks 6 months to migrate from one Spark release to the next seems fair, especially considering how mature Spark now is as a project. What are your thoughts on this?

> Cannot map hostPath volumes to container
> ----------------------------------------
>
>                 Key: SPARK-31666
>                 URL: https://issues.apache.org/jira/browse/SPARK-31666
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Spark Core
>    Affects Versions: 2.4.5
>            Reporter: Stephen Hopper
>            Priority: Major
>
> I'm trying to mount additional hostPath directories as seen in a couple of places:
> [https://aws.amazon.com/blogs/containers/optimizing-spark-performance-on-kubernetes/]
> [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#using-volume-for-scratch-space]
> [https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes]
>  
> However, whenever I try to submit my job, I run into this error:
> {code:java}
> Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1 │
>  io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc/api/v1/namespaces/my-spark-ns/pods. Message: Pod "spark-pi-1588970477877-exec-1" is invalid: spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be unique. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.containers[0].volumeMounts[1].mountPath, message=Invalid value: "/tmp1": must be unique, reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, name=spark-pi-1588970477877-exec-1, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod "spark-pi-1588970477877-exec-1" is invalid: spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be unique, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).{code}
>  
> This is my spark-submit command (note: I've tried my own build of Spark for Kubernetes as well as a few other images I've seen floating around, such as seedjeffwan/spark:v2.4.5, and they all have this same issue):
> {code:java}
> bin/spark-submit \
>  --master k8s://https://my-k8s-server:443 \
>  --deploy-mode cluster \
>  --name spark-pi \
>  --class org.apache.spark.examples.SparkPi \
>  --conf spark.executor.instances=2 \
>  --conf spark.kubernetes.container.image=my-spark-image:my-tag \
>  --conf spark.kubernetes.driver.pod.name=sparkpi-test-driver \
>  --conf spark.kubernetes.namespace=my-spark-ns \
>  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/tmp1 \
>  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/tmp1 \
>  --conf spark.local.dir="/tmp1" \
>  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar 20000{code}
> Any ideas on what's causing this?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
