Posted to issues@spark.apache.org by "Keunhyun Oh (Jira)" <ji...@apache.org> on 2021/04/21 02:08:00 UTC

[jira] [Comment Edited] (SPARK-35084) [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not added to sparkContext

    [ https://issues.apache.org/jira/browse/SPARK-35084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326200#comment-17326200 ] 

Keunhyun Oh edited comment on SPARK-35084 at 4/21/21, 2:07 AM:
---------------------------------------------------------------

*Spark 2.4.5*

[https://github.com/apache/spark/blob/v2.4.5/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
{code:java}
if (!isMesosCluster && !isStandAloneCluster) {
  // Resolve maven dependencies if there are any and add classpath to jars. Add them to py-files
  // too for packages that include Python code
  val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies(
    args.packagesExclusions, args.packages, args.repositories, args.ivyRepoPath,
    args.ivySettingsPath)

  if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
    args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
    if (args.isPython || isInternal(args.primaryResource)) {
      args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
    }
  }

  // install any R packages that may have been passed through --jars or --packages.
  // Spark Packages may contain R source code inside the jar.
  if (args.isR && !StringUtils.isBlank(args.jars)) {
    RPackageUtils.checkAndBuildRPackage(args.jars, printStream, args.verbose)
  }
}
{code}
 

*Spark 3.0.2*

[https://github.com/apache/spark/blob/v3.0.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
{code:java}
if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
  // In K8s client mode, when in the driver, add resolved jars early as we might need
  // them at the submit time for artifact downloading.
  // For example we might use the dependencies for downloading
  // files from a Hadoop Compatible fs eg. S3. In this case the user might pass:
  // --packages com.amazonaws:aws-java-sdk:1.7.4:org.apache.hadoop:hadoop-aws:2.7.6
  if (isKubernetesClusterModeDriver) {
    val loader = getSubmitClassLoader(sparkConf)
    for (jar <- resolvedMavenCoordinates.split(",")) {
      addJarToClasspath(jar, loader)
    }
  } else if (isKubernetesCluster) {
    // We need this in K8s cluster mode so that we can upload local deps
    // via the k8s application, like in cluster mode driver
    childClasspath ++= resolvedMavenCoordinates.split(",")
  } else {
    args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
    if (args.isPython || isInternal(args.primaryResource)) {
      args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
    }
  }
}
{code}
 

When using a k8s master, Spark 2 merges the jars resolved from Maven into args.jars.

However, in Spark 3 the Maven dependencies are not merged into args.jars on Kubernetes: as the code above shows, they are only added to the submit classloader or appended to childClasspath.
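
One way to observe this is a minimal diagnostic sketch (a hypothetical app of mine, not code from Spark; it only uses the standard SparkContext.listJars() API) that prints the jars executors will actually fetch:
{code:java}
// Diagnostic sketch: print the jars registered on the SparkContext,
// i.e. the set that executors download at startup. On Spark 3 in k8s
// cluster mode, jars resolved from spark.jars.packages are expected
// to be missing from this list, per the code path quoted above.
import org.apache.spark.sql.SparkSession

object ListJarsCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("list-jars-check").getOrCreate()
    spark.sparkContext.listJars().foreach(println)
    spark.stop()
  }
}
{code}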

 

I assume this is why spark-submit in k8s cluster mode does not support spark.jars.packages the way I expected.

As a result, the jars resolved from packages are never added to the Spark context.

 

How can Maven packages be used in k8s cluster mode?
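
One possible workaround sketch (an assumption, not a confirmed fix: it presumes the driver has already resolved the package jars into the default Ivy cache at ~/.ivy2/jars, which moves if spark.jars.ivy is set) is to register those local jars on the SparkContext by hand:
{code:java}
// Hedged workaround sketch: in k8s cluster mode the driver has already
// downloaded the package jars locally (default Ivy cache assumed below),
// but they were never merged into spark.jars. addJar() registers each one
// with the driver's file server so executors can fetch it.
import java.io.File
import org.apache.spark.sql.SparkSession

object AddPackageJars {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("add-package-jars").getOrCreate()
    // Assumption: default cache location; adjust if spark.jars.ivy is set.
    val ivyJars = new File(sys.props("user.home"), ".ivy2/jars")
    if (ivyJars.isDirectory) {
      ivyJars.listFiles().filter(_.getName.endsWith(".jar"))
        .foreach(jar => spark.sparkContext.addJar(jar.getAbsolutePath))
    }
    // ... application code ...
    spark.stop()
  }
}
{code}
With this, each registered jar is served by the driver's file server, so executors can download it at startup.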



> [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not added to sparkContext
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35084
>                 URL: https://issues.apache.org/jira/browse/SPARK-35084
>             Project: Spark
>          Issue Type: Question
>          Components: Kubernetes
>    Affects Versions: 3.0.0, 3.0.2, 3.1.1
>            Reporter: Keunhyun Oh
>            Priority: Major
>
> I'm trying to migrate from Spark 2 to Spark 3 on k8s.
>  
> In my environment, on Spark 3.x, jars listed in spark.jars and spark.jars.packages are not added to the sparkContext.
> After the driver process is launched, the jars are not propagated to the executors, so NoClassDefFoundError is raised on the executors.
>  
> In spark.properties, spark.jars contains only the main application jar. This differs from Spark 2.
>  
> How can this situation be solved? Were any Spark options changed in Spark 3 relative to Spark 2?


