You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/04 18:17:44 UTC

[GitHub] [spark] shardulm94 opened a new pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

shardulm94 opened a new pull request #31741:
URL: https://github.com/apache/spark/pull/31741


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   Exclude non-jar dependencies of the ivy/maven packages we want to resolve as our current dependency resolution code assumes artifacts to be jars. https://github.com/apache/spark/blob/17601e014c6ccb48958d35ffb04bedeac8cfc66a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1215 and https://github.com/apache/spark/blob/17601e014c6ccb48958d35ffb04bedeac8cfc66a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L318
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   Some maven artifacts define non-jar dependencies. One such example is `hive-exec`'s dependency on the `pom` of `apache-curator` https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.8/hive-exec-2.3.8.pom
   
   Today trying to depend on such an artifact using `--packages` will print an error but continue without including the non-jar dependency. Doing the same using `spark.sql("ADD JAR ivy://org.apache.hive:hive-exec:2.3.8?exclude=org.pentaho:pentaho-aggdesigner-algorithm")` will cause a failure. Detailed stacktraces can be found in SPARK-34624.
   
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Added unit test. Retried the same example in `spark-shell` which produced the stacktrace in the JIRA.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791180586


   Thank you for confirming, @shardulm94 and @xkrogen .
   I get a fresh PR and test it without it first. It fails to me too. Definitely, it was my mistake while testing.
   ```
   $ build/sbt "core/testOnly *.SparkSubmitUtilsSuite -- -z SPARK-34624"
   ...
   [info] *** 1 TEST FAILED ***
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791543133


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40398/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] shardulm94 commented on a change in pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #31741:
URL: https://github.com/apache/spark/pull/31741#discussion_r587901164



##########
File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala
##########
@@ -277,4 +279,35 @@ class SparkSubmitUtilsSuite extends SparkFunSuite with BeforeAndAfterAll {
         .exists(r.findFirstIn(_).isDefined), "resolution files should be cleaned")
     }
   }
+
+  test("SPARK-34624: should ignore non-jar dependencies") {
+    val main = MavenCoordinate("my.great.lib", "mylib", "0.1")
+    val dep = "my.great.dep:mydep:0.5"

Review comment:
       The version number actually does not matter at all, it can be anything.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791010200


   **[Test build #135774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135774/testReport)** for PR 31741 at commit [`0f253a5`](https://github.com/apache/spark/commit/0f253a5c5507b798da7d41b02ed09ec0d8425afd).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791180586


   Thank you for confirming, @shardulm94 and @xkrogen .
   I get a fresh PR and test it without it first. It fails to me too. Definitely, it was my mistake while testing.
   ```
   $ build/sbt "core/testOnly *.SparkSubmitUtilsSuite -- -z SPARK-34624"
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791063794


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40356/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #31741:
URL: https://github.com/apache/spark/pull/31741#discussion_r587931199



##########
File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala
##########
@@ -277,4 +279,35 @@ class SparkSubmitUtilsSuite extends SparkFunSuite with BeforeAndAfterAll {
         .exists(r.findFirstIn(_).isDefined), "resolution files should be cleaned")
     }
   }
+
+  test("SPARK-34624: should ignore non-jar dependencies") {
+    val main = MavenCoordinate("my.great.lib", "mylib", "0.1")
+    val dep = "my.great.dep:mydep:0.5"
+
+    IvyTestUtils.withRepository(main, Some(dep), None) { repo =>
+      // IvyTestUtils.withRepository does not have an easy way for creating non-jar dependencies
+      // So we let it create the jar dependency in `mylib-0.1.pom`, and then modify the pom
+      // to change the type of the transitive to `pom`
+      val mainPom = new File(URI.create(repo).resolve("my/great/lib/mylib/0.1/mylib-0.1.pom"))
+      val source = Source.fromFile(mainPom)
+      val modifiedPom = new File(s"$tempIvyPath/modified-mylib-0.1.pom")
+      val sink = new PrintWriter(modifiedPom)
+      source.getLines()
+        .map(l => if (l.trim == "<artifactId>mydep</artifactId>") s"$l<type>pom</type>" else l)
+        // scalastyle:off println
+        .foreach(sink.println)
+        // scalastyle:on println
+      source.close()
+      sink.close()
+      modifiedPom.renameTo(mainPom)

Review comment:
       Can we simplify this a bit with the use of NIO `Files`:
   ```
         val mainPom = Paths.get(URI.create(repo)).resolve("my/great/lib/mylib/0.1/mylib-0.1.pom")
         val lines = java.nio.file.Files.lines(mainPom).iterator.asScala
             .map(l => if (l.trim == "<artifactId>mydep</artifactId>") s"$l<type>pom</type>" else l)
             .toList
         java.nio.file.Files.write(mainPom, lines.asJava)
   ```
   (you can do a rename import on NIO Files to avoid the full path name here, or just get rid of the Guava `Files` import -- looks like there is only one usage in this file)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] shardulm94 commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791471536


   @xkrogen In my experience, I haven't seen an intentional case of non-jar dependencies that are required for runtime/compilation. Most such dependencies are often bundled as part of resource files within jars. So my first instinct was to ignore those. Secondly existing code is built around these being jars, e.g. https://github.com/apache/spark/blob/17601e014c6ccb48958d35ffb04bedeac8cfc66a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L318 uses `addJarToClasspath`, although technically we can pass any file to `addJarToClasspath`.
   
   I am open to implementing this either way. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791549665


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40398/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791519784


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40398/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #31741:
URL: https://github.com/apache/spark/pull/31741#discussion_r587908995



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1203,9 +1203,9 @@ private[spark] object SparkSubmitUtils {
   def resolveDependencyPaths(
       artifacts: Array[AnyRef],
       cacheDirectory: File): Seq[String] = {
-    artifacts.map { artifactInfo =>
-      val artifact = artifactInfo.asInstanceOf[Artifact].getModuleRevisionId
-      val extraAttrs = artifactInfo.asInstanceOf[Artifact].getExtraAttributes
+    artifacts.map(_.asInstanceOf[Artifact]).filter(_.getExt == "jar").map { artifactInfo =>

Review comment:
       Is it worth showing a list of non-jar files that are filtered out here (e.g., `logInfo` or `logWarning`)?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] shardulm94 commented on a change in pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #31741:
URL: https://github.com/apache/spark/pull/31741#discussion_r588361353



##########
File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala
##########
@@ -277,4 +279,35 @@ class SparkSubmitUtilsSuite extends SparkFunSuite with BeforeAndAfterAll {
         .exists(r.findFirstIn(_).isDefined), "resolution files should be cleaned")
     }
   }
+
+  test("SPARK-34624: should ignore non-jar dependencies") {
+    val main = MavenCoordinate("my.great.lib", "mylib", "0.1")
+    val dep = "my.great.dep:mydep:0.5"

Review comment:
       Changed to `0.1`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791584239


   > @xkrogen In my experience, I haven't seen an intentional case of non-jar dependencies that are required for runtime/compilation. Most such dependencies are often bundled as part of resource files within jars. So my first instinct was to ignore those. Secondly existing code is built around these being jars, e.g.
   > ...
   > uses `addJarToClasspath`, although technically we can pass any file to `addJarToClasspath`.
   > I am open to implementing this either way.
   
   SGTM, let's go with this approach and if we find future use cases requiring those non-JAR dependencies we can enhance as necessary.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791478453


   **[Test build #135816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135816/testReport)** for PR 31741 at commit [`b90ef6e`](https://github.com/apache/spark/commit/b90ef6e5218421f110346c7b9bb0e75a9037fd29).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31741:
URL: https://github.com/apache/spark/pull/31741#discussion_r587894333



##########
File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala
##########
@@ -277,4 +279,35 @@ class SparkSubmitUtilsSuite extends SparkFunSuite with BeforeAndAfterAll {
         .exists(r.findFirstIn(_).isDefined), "resolution files should be cleaned")
     }
   }
+
+  test("SPARK-34624: should ignore non-jar dependencies") {
+    val main = MavenCoordinate("my.great.lib", "mylib", "0.1")
+    val dep = "my.great.dep:mydep:0.5"

Review comment:
       `0.5` is correct?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791041415


   Thanks for filing this @shardulm94 !
   
   One high-level question: Is the correct behavior to filter/ignore these non-JAR dependencies, or still add them to the classpath, but with the proper extensions? Filtering is definitely better than what we have today, but are there situations where we will still want these non-JAR dependencies on the classpath?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #31741:
URL: https://github.com/apache/spark/pull/31741#discussion_r587926625



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1203,9 +1203,9 @@ private[spark] object SparkSubmitUtils {
   def resolveDependencyPaths(
       artifacts: Array[AnyRef],
       cacheDirectory: File): Seq[String] = {
-    artifacts.map { artifactInfo =>
-      val artifact = artifactInfo.asInstanceOf[Artifact].getModuleRevisionId
-      val extraAttrs = artifactInfo.asInstanceOf[Artifact].getExtraAttributes
+    artifacts.map(_.asInstanceOf[Artifact]).filter(_.getExt == "jar").map { artifactInfo =>

Review comment:
       +1 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791066968


   **[Test build #135774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135774/testReport)** for PR 31741 at commit [`0f253a5`](https://github.com/apache/spark/commit/0f253a5c5507b798da7d41b02ed09ec0d8425afd).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen edited a comment on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
xkrogen edited a comment on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791041415


   Thanks for filing this @shardulm94 !
   
   One high-level question: Is the correct behavior to filter/ignore these non-JAR dependencies, or still add them to the classpath, but with the proper extensions? Filtering is definitely better than what we have today, but are there situations where we will still want these non-JAR dependencies on the classpath?
   
   Also @dongjoon-hyun  -- I agree with Shardul, the test fails for me without the `SparkSubmit` changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31741:
URL: https://github.com/apache/spark/pull/31741#discussion_r588053299



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1203,9 +1203,9 @@ private[spark] object SparkSubmitUtils {
   def resolveDependencyPaths(
       artifacts: Array[AnyRef],
       cacheDirectory: File): Seq[String] = {
-    artifacts.map { artifactInfo =>
-      val artifact = artifactInfo.asInstanceOf[Artifact].getModuleRevisionId
-      val extraAttrs = artifactInfo.asInstanceOf[Artifact].getExtraAttributes
+    artifacts.map(_.asInstanceOf[Artifact]).filter(_.getExt == "jar").map { artifactInfo =>

Review comment:
       Ya, maybe, `logInfo`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791063794


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40356/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791478453


   **[Test build #135816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135816/testReport)** for PR 31741 at commit [`b90ef6e`](https://github.com/apache/spark/commit/b90ef6e5218421f110346c7b9bb0e75a9037fd29).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791067589


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135774/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791585318


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135816/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-790849729


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791565368


   **[Test build #135816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135816/testReport)** for PR 31741 at commit [`b90ef6e`](https://github.com/apache/spark/commit/b90ef6e5218421f110346c7b9bb0e75a9037fd29).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-790996929


   ok to test


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791067589


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135774/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791549665


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40398/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] shardulm94 commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791021507


   @dongjoon-hyun for me, the test case does fail without the patch, both in IntelliJ and via mvn CLI.
   `build/mvn test -Dtest=none -Dsuites="org.apache.spark.deploy.SparkSubmitUtilsSuite SPARK-34624" -pl core`
   ```
   Run starting. Expected test count is: 1
   SparkSubmitUtilsSuite:
   :: loading settings :: url = jar:file:/Users/shardul/.m2/repository/org/apache/ivy/ivy/2.4.0/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
   - SPARK-34624: should ignore non-jar dependencies *** FAILED ***
     jarPath.exists(((x$19: String) => x$19.indexOf("mydep").>=(0))) was true should not find pom dependency (SparkSubmitUtilsSuite.scala:310)
   Run completed in 2 seconds, 831 milliseconds.
   Total number of tests run: 1
   Suites: completed 1, aborted 0
   Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #31741:
URL: https://github.com/apache/spark/pull/31741


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] shardulm94 commented on a change in pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #31741:
URL: https://github.com/apache/spark/pull/31741#discussion_r588362435



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1203,9 +1203,9 @@ private[spark] object SparkSubmitUtils {
   def resolveDependencyPaths(
       artifacts: Array[AnyRef],
       cacheDirectory: File): Seq[String] = {
-    artifacts.map { artifactInfo =>
-      val artifact = artifactInfo.asInstanceOf[Artifact].getModuleRevisionId
-      val extraAttrs = artifactInfo.asInstanceOf[Artifact].getExtraAttributes
+    artifacts.map(_.asInstanceOf[Artifact]).filter(_.getExt == "jar").map { artifactInfo =>

Review comment:
       Added this logging
   
   `21/03/05 06:40:48 INFO SparkSubmitUtils: Skipping non-jar dependency org.apache.curator#apache-curator;2.7.1!apache-curator.pom`

##########
File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala
##########
@@ -277,4 +279,35 @@ class SparkSubmitUtilsSuite extends SparkFunSuite with BeforeAndAfterAll {
         .exists(r.findFirstIn(_).isDefined), "resolution files should be cleaned")
     }
   }
+
+  test("SPARK-34624: should ignore non-jar dependencies") {
+    val main = MavenCoordinate("my.great.lib", "mylib", "0.1")
+    val dep = "my.great.dep:mydep:0.5"
+
+    IvyTestUtils.withRepository(main, Some(dep), None) { repo =>
+      // IvyTestUtils.withRepository does not have an easy way for creating non-jar dependencies
+      // So we let it create the jar dependency in `mylib-0.1.pom`, and then modify the pom
+      // to change the type of the transitive to `pom`
+      val mainPom = new File(URI.create(repo).resolve("my/great/lib/mylib/0.1/mylib-0.1.pom"))
+      val source = Source.fromFile(mainPom)
+      val modifiedPom = new File(s"$tempIvyPath/modified-mylib-0.1.pom")
+      val sink = new PrintWriter(modifiedPom)
+      source.getLines()
+        .map(l => if (l.trim == "<artifactId>mydep</artifactId>") s"$l<type>pom</type>" else l)
+        // scalastyle:off println
+        .foreach(sink.println)
+        // scalastyle:on println
+      source.close()
+      sink.close()
+      modifiedPom.renameTo(mainPom)

Review comment:
       Done! Thanks for the suggestion!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791010200


   **[Test build #135774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135774/testReport)** for PR 31741 at commit [`0f253a5`](https://github.com/apache/spark/commit/0f253a5c5507b798da7d41b02ed09ec0d8425afd).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31741:
URL: https://github.com/apache/spark/pull/31741#discussion_r588052487



##########
File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala
##########
@@ -277,4 +279,35 @@ class SparkSubmitUtilsSuite extends SparkFunSuite with BeforeAndAfterAll {
         .exists(r.findFirstIn(_).isDefined), "resolution files should be cleaned")
     }
   }
+
+  test("SPARK-34624: should ignore non-jar dependencies") {
+    val main = MavenCoordinate("my.great.lib", "mylib", "0.1")
+    val dep = "my.great.dep:mydep:0.5"

Review comment:
       If this doesn't matter, please use `0.1`. Otherwise, please document your description as a comment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] shardulm94 commented on a change in pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #31741:
URL: https://github.com/apache/spark/pull/31741#discussion_r587901164



##########
File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala
##########
@@ -277,4 +279,35 @@ class SparkSubmitUtilsSuite extends SparkFunSuite with BeforeAndAfterAll {
         .exists(r.findFirstIn(_).isDefined), "resolution files should be cleaned")
     }
   }
+
+  test("SPARK-34624: should ignore non-jar dependencies") {
+    val main = MavenCoordinate("my.great.lib", "mylib", "0.1")
+    val dep = "my.great.dep:mydep:0.5"

Review comment:
       The version number actually does not matter at all, it can be anything. `IvyTestUtils.withRepository` will create the appropriate dependency jars for whatever coordinate you pass in.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-790849729


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31741: [SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31741:
URL: https://github.com/apache/spark/pull/31741#issuecomment-791585318


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135816/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org