Posted to issues@spark.apache.org by "lione Herbet (Jira)" <ji...@apache.org> on 2023/03/17 11:20:00 UTC

[jira] [Created] (SPARK-42837) spark-submit - issue when resolving dependencies hosted on a private repository in kubernetes cluster mode

lione Herbet created SPARK-42837:
------------------------------------

             Summary: spark-submit - issue when resolving dependencies hosted on a private repository in kubernetes cluster mode
                 Key: SPARK-42837
                 URL: https://issues.apache.org/jira/browse/SPARK-42837
             Project: Spark
          Issue Type: Bug
          Components: Spark Submit
    Affects Versions: 3.3.2
            Reporter: lione Herbet


When using the [spark operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator], if dependencies are hosted on a private repository that requires authentication (like S3 or OCI), the spark operator pod submitting the job needs to hold all the secrets required to access those dependencies; otherwise the spark-submit fails.
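
As an illustration: for jars hosted on S3, the submit-time path resolution goes through the Hadoop s3a filesystem, so the operator pod must already carry the tenant's credentials, typically via the standard Hadoop S3A configuration keys (the values below are placeholders):
{code:java}
// Standard Hadoop S3A credential settings; today the *operator* pod needs
// these just so spark-submit can resolve the dependency paths:
sparkConf.set("spark.hadoop.fs.s3a.access.key", "<tenant-access-key>")
sparkConf.set("spark.hadoop.fs.s3a.secret.key", "<tenant-secret-key>")
sparkConf.set("spark.hadoop.fs.s3a.endpoint", "<private-object-store-endpoint>")
{code}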

On a multi-tenant Kubernetes cluster where the spark operator and the spark jobs run in separate namespaces, this means duplicating every tenant's secrets into the operator's namespace just to make submission work.

It seems that spark-submit needs to access the dependencies (and therefore the credentials) only to resolve glob paths via resolveGlobPaths ([https://github.com/apache/spark/blob/v3.3.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L364-L367]). It also seems to me (but this needs to be confirmed by someone more familiar with Spark internals than I am) that the same resolution is effectively redone when the driver downloads the jars.
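
For reference, the cited lines apply resolveGlobPaths (from DependencyUtils, which goes through the Hadoop FileSystem API and therefore needs working credentials for each remote scheme) to every dependency list:
{code:java}
// SparkSubmit.scala, v3.3.2, lines 364-367 (approximately):
args.jars = Option(args.jars).map(resolveGlobPaths(_, hadoopConf)).orNull
args.files = Option(args.files).map(resolveGlobPaths(_, hadoopConf)).orNull
args.pyFiles = Option(args.pyFiles).map(resolveGlobPaths(_, hadoopConf)).orNull
args.archives = Option(args.archives).map(resolveGlobPaths(_, hadoopConf)).orNull
{code}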

Would it be possible to skip this resolveGlobPaths step when running on a Kubernetes cluster in cluster mode?

For example, add a condition like this around lines 364-367 (negated, so glob resolution still runs everywhere except Kubernetes cluster mode):
{code:java}
// Skip submit-time glob resolution on Kubernetes; the driver will fetch
// the dependencies itself with the secrets available in its own namespace.
if (!isKubernetesCluster) {
...
} {code}
For compatibility with the old behavior, if needed, we could also gate it on a Spark parameter (the name is just a proposal), like this:
{code:java}
// Resolve glob paths unless we are on Kubernetes AND the proposed flag is
// explicitly set to false; the default of true keeps today's behavior.
if (!isKubernetesCluster || sparkConf.getBoolean("spark.kubernetes.resolveGlobPathsInSubmit", true)) {
...
}{code}
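
Put together, and applied to the cited lines (parameter name still only the proposal above, not an existing setting), the change would look roughly like this:
{code:java}
// Sketch only: resolve glob paths at submit time unless we are on
// Kubernetes in cluster mode and the proposed flag disables it.
val shouldResolveGlobPaths = !isKubernetesCluster ||
  sparkConf.getBoolean("spark.kubernetes.resolveGlobPathsInSubmit", true)
if (shouldResolveGlobPaths) {
  args.jars = Option(args.jars).map(resolveGlobPaths(_, hadoopConf)).orNull
  args.files = Option(args.files).map(resolveGlobPaths(_, hadoopConf)).orNull
  args.pyFiles = Option(args.pyFiles).map(resolveGlobPaths(_, hadoopConf)).orNull
  args.archives = Option(args.archives).map(resolveGlobPaths(_, hadoopConf)).orNull
}
{code}
A tenant whose repository credentials exist only in the job namespace could then opt out at submission time with --conf spark.kubernetes.resolveGlobPathsInSubmit=false.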
I tested both solutions locally and they seem to resolve the issue.

Do you think I need to consider other elements?

I may submit a patch depending on your feedback.


