Posted to issues@spark.apache.org by "Keunhyun Oh (Jira)" <ji...@apache.org> on 2021/04/21 02:08:00 UTC
[jira] [Comment Edited] (SPARK-35084) [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not added to sparkContext
[ https://issues.apache.org/jira/browse/SPARK-35084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326200#comment-17326200 ]
Keunhyun Oh edited comment on SPARK-35084 at 4/21/21, 2:07 AM:
---------------------------------------------------------------
*Spark 2.4.5*
[https://github.com/apache/spark/blob/v2.4.5/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
{code:java}
if (!isMesosCluster && !isStandAloneCluster) {
  // Resolve maven dependencies if there are any and add classpath to jars. Add them to py-files
  // too for packages that include Python code
  val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies(
    args.packagesExclusions, args.packages, args.repositories, args.ivyRepoPath,
    args.ivySettingsPath)
  if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
    args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
    if (args.isPython || isInternal(args.primaryResource)) {
      args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
    }
  }

  // install any R packages that may have been passed through --jars or --packages.
  // Spark Packages may contain R source code inside the jar.
  if (args.isR && !StringUtils.isBlank(args.jars)) {
    RPackageUtils.checkAndBuildRPackage(args.jars, printStream, args.verbose)
  }
}
{code}
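For context, mergeFileLists just joins the non-blank comma-separated lists, so in Spark 2 the Maven-resolved jar paths are appended to args.jars and from there end up in spark.jars (a simplified sketch of the helper, not the exact Spark source):
{code:java}
// Simplified sketch of SparkSubmit's mergeFileLists helper: join all
// non-blank comma-separated lists into one, so the Maven-resolved jar
// paths get appended to whatever --jars / spark.jars already contained.
def mergeFileLists(lists: String*): String = {
  val merged = lists.filter(s => s != null && s.trim.nonEmpty)
    .flatMap(_.split(",").map(_.trim).filter(_.nonEmpty))
  if (merged.nonEmpty) merged.mkString(",") else null
}

// e.g. mergeFileLists("/opt/app.jar", "/root/.ivy2/jars/hadoop-aws-2.7.6.jar")
//   => "/opt/app.jar,/root/.ivy2/jars/hadoop-aws-2.7.6.jar"
{code}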
*Spark 3.0.2*
[https://github.com/apache/spark/blob/v3.0.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
{code:java}
if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
  // In K8s client mode, when in the driver, add resolved jars early as we might need
  // them at the submit time for artifact downloading.
  // For example we might use the dependencies for downloading
  // files from a Hadoop Compatible fs eg. S3. In this case the user might pass:
  // --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6
  if (isKubernetesClusterModeDriver) {
    val loader = getSubmitClassLoader(sparkConf)
    for (jar <- resolvedMavenCoordinates.split(",")) {
      addJarToClasspath(jar, loader)
    }
  } else if (isKubernetesCluster) {
    // We need this in K8s cluster mode so that we can upload local deps
    // via the k8s application, like in cluster mode driver
    childClasspath ++= resolvedMavenCoordinates.split(",")
  } else {
    args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
    if (args.isPython || isInternal(args.primaryResource)) {
      args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
    }
  }
}
{code}
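If I read the code correctly, further down the same file runMain consumes childClasspath only to build the local classloader for the user's main class (condensed from the linked source, not a verbatim quote):
{code:java}
// Condensed from SparkSubmit.runMain in the same file: childClasspath
// entries are only added to the classloader that invokes the user's
// main class on the driver side.
val loader = getSubmitClassLoader(sparkConf)
for (jar <- childClasspath) {
  addJarToClasspath(jar, loader)
}
{code}
Nothing in that path merges the entries back into spark.jars, so the executors are never told to fetch them.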
When using a k8s master, Spark 2 merges the jars resolved from Maven into args.jars. In Spark 3, however, the Maven dependencies are never merged into args.jars.
I assume this is why spark-submit in k8s cluster mode does not support spark.jars.packages the way I expected: the jars from the packages are never added to the Spark context.
How can Maven packages be used in k8s cluster mode?
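One workaround I can imagine (only a sketch, and the Ivy cache path /root/.ivy2/jars is an assumption that depends on the image and the spark.jars.ivy setting) is to register the resolved jars from inside the application itself, since SparkContext.addJar makes a driver-local jar fetchable by the executors:
{code:java}
// Hypothetical application-side workaround, not an official fix:
// register every dependency jar found in the driver's Ivy cache so
// executors can download it from the driver.
import java.io.File
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val ivyJars = new File("/root/.ivy2/jars")  // assumed cache location
if (ivyJars.isDirectory) {
  ivyJars.listFiles().filter(_.getName.endsWith(".jar"))
    .foreach(f => spark.sparkContext.addJar(f.getAbsolutePath))
}
{code}
Still, I would like to know the intended way to do this.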
> [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not added to sparkContext
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-35084
> URL: https://issues.apache.org/jira/browse/SPARK-35084
> Project: Spark
> Issue Type: Question
> Components: Kubernetes
> Affects Versions: 3.0.0, 3.0.2, 3.1.1
> Reporter: Keunhyun Oh
> Priority: Major
>
> I'm trying to migrate from Spark 2 to Spark 3 on k8s.
>
> In my environment, on Spark 3.x, jars listed in spark.jars and spark.jars.packages are not added to the sparkContext.
> After the driver process is launched, the jars are not propagated to the executors, so a NoClassDefFoundError is raised in the executors.
>
> In spark.properties, spark.jars contains only the main application jar, which is different from Spark 2.
>
> How can this be solved? Did any Spark options change between Spark 2 and Spark 3?