Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2023/03/20 01:15:00 UTC

[jira] [Updated] (SPARK-42837) spark-submit - issue when resolving dependencies hosted on a private repository in kubernetes cluster mode

     [ https://issues.apache.org/jira/browse/SPARK-42837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42837:
---------------------------------
    Component/s: Kubernetes

> spark-submit - issue when resolving dependencies hosted on a private repository in kubernetes cluster mode
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-42837
>                 URL: https://issues.apache.org/jira/browse/SPARK-42837
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Spark Submit
>    Affects Versions: 3.3.2
>            Reporter: lione Herbet
>            Priority: Minor
>
> When using the [spark operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator], if dependencies are hosted on a private repository that requires authentication (such as S3 or OCI), the spark operator submitting the job needs to hold all the secrets required to access those dependencies. If it does not, spark-submit fails.
> On a multi-tenant Kubernetes cluster where the spark operator and the spark jobs run in separate namespaces, this means duplicating all the secrets into the operator's namespace, or it won't work.
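> To make the scenario concrete, here is a minimal sketch (bucket name and key lookups are made up for illustration) of a submission whose dependency lives on a private S3 bucket; spark-submit must already be able to read that location at submit time, so the credentials have to exist where the operator runs, not only in the job's namespace:
> {code:java}
> import org.apache.spark.SparkConf
>
> // Illustrative only: a dependency on a private bucket. spark-submit itself
> // reads this location while preparing the submission, before the driver pod
> // (which lives in the job's namespace) even exists.
> val conf = new SparkConf()
>   .set("spark.jars", "s3a://private-bucket/libs/job-deps.jar")
>   .set("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
>   .set("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
> {code}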
> It seems that spark-submit needs to access the dependencies (with credentials) only to resolve glob paths ([https://github.com/apache/spark/blob/v3.3.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L364-L367]). It also seems to me (but this needs to be confirmed by someone more skilled than me in Spark internals) that the same resolution work is done again when the driver downloads the jars.
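> For reference, the four lines in question look roughly like this in SparkSubmit.prepareSubmitEnvironment (paraphrased from v3.3.2, so the exact shape may differ slightly):
> {code:java}
> // Each dependency list is expanded through the Hadoop FileSystem API,
> // which is what forces spark-submit to authenticate against the remote store.
> args.jars = Option(args.jars).map(resolveGlobPaths(_, hadoopConf)).orNull
> args.files = Option(args.files).map(resolveGlobPaths(_, hadoopConf)).orNull
> args.pyFiles = Option(args.pyFiles).map(resolveGlobPaths(_, hadoopConf)).orNull
> args.archives = Option(args.archives).map(resolveGlobPaths(_, hadoopConf)).orNull
> {code}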
> Would it be possible to skip this resolveGlobPaths step when running on a Kubernetes cluster in cluster mode?
> For example, a condition like this could be added around lines 364-367:
> {code:java}
> // Skip glob resolution when submitting in Kubernetes cluster mode,
> // and keep the existing behavior everywhere else.
> if (!isKubernetesCluster) {
>   // ... original lines 364-367 (the resolveGlobPaths calls) ...
> }
> {code}
> For compatibility with the old behavior, if needed, we could also gate this on a Spark parameter, like this:
> {code:java}
> // Resolve globs unless we are in Kubernetes cluster mode and the user has
> // explicitly opted out; the default (true) preserves the old behavior.
> if (!isKubernetesCluster ||
>     sparkConf.getBoolean("spark.kubernetes.resolveGlobPathsInSubmit", true)) {
>   // ... original lines 364-367 ...
> }
> {code}
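> If this were added as a real parameter, it would presumably be declared with the ConfigBuilder pattern used elsewhere in the Kubernetes module; a hypothetical sketch (the key name, doc text, and version below are mine, not an existing entry):
> {code:java}
> import org.apache.spark.internal.config.ConfigBuilder
>
> // Hypothetical entry for the Kubernetes module's Config.scala;
> // the key name and version are illustrative, not an existing config.
> val KUBERNETES_RESOLVE_GLOB_PATHS_IN_SUBMIT =
>   ConfigBuilder("spark.kubernetes.resolveGlobPathsInSubmit")
>     .doc("Whether spark-submit resolves glob paths in spark.jars, spark.files, " +
>       "spark.submit.pyFiles and spark.archives before starting the driver.")
>     .version("3.5.0")
>     .booleanConf
>     .createWithDefault(true)
> {code}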
> I tested both solutions locally and they seem to resolve the case.
> Do you think I need to consider other elements?
> I may submit a patch depending on your feedback.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org