You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/16 09:43:21 UTC

[GitHub] [spark] bozhang2820 opened a new pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

bozhang2820 opened a new pull request #31849:
URL: https://github.com/apache/spark/pull/31849


   ### What changes were proposed in this pull request?
   This change is to ignore cache for SNAPSHOT dependencies in spark-submit.
   
   ### Why are the changes needed?
   When spark-submit is executed with --packages, it will not download the dependency jars when they are available in cache (e.g. ivy cache), even when the dependencies are SNAPSHOT.
   
   This might block developers who work on external modules in Spark (e.g. spark-avro), since they need to remove the cache manually every time when they update the code during developments (which generates SNAPSHOT jars). Without knowing this, they could be blocked wondering why their code changes are not reflected in spark-submit executions.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. With this change, developers/users who run spark-submit with SNAPSHOT dependencies do not need to remove the cache every time when the SNAPSHOT dependencies are updated.
   
   ### How was this patch tested?
   Added a unit test.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bozhang2820 commented on pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
bozhang2820 commented on pull request #31849:
URL: https://github.com/apache/spark/pull/31849#issuecomment-803724404


   > Hi, @bozhang2820 .
   > It seems that this code and the added test case is really flaky. Could you take a look, please?
   > 
   >     * https://github.com/apache/spark/runs/2158443719
   > 
   >     * https://github.com/apache/spark/runs/2161516532
   
   Hi @dongjoon-hyun, sorry for the trouble. Let's first revert the change: https://github.com/apache/spark/pull/31918
   
   Also CC @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31849:
URL: https://github.com/apache/spark/pull/31849#issuecomment-803675231


   Hi, @bozhang2820 .
   It seems that this code and the added test case is really flaky. Could you take a look, please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31849:
URL: https://github.com/apache/spark/pull/31849#issuecomment-800121124


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40693/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31849:
URL: https://github.com/apache/spark/pull/31849#discussion_r596549816



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1153,6 +1153,8 @@ private[spark] object SparkSubmitUtils extends Logging {
     // We need a chain resolver if we want to check multiple repositories
     val cr = new ChainResolver
     cr.setName("spark-list")
+    cr.setChangingMatcher(PatternMatcher.REGEXP)
+    cr.setChangingPattern(".*-SNAPSHOT")

Review comment:
       While I agree with the intention (I faced the same problem and had to manually remove cache too), I concern that it will actually make users more confused in a way because it's Maven or Ivy's standard behaviour, and now we're changing how they work by default in Spark, which probably users wouldn't know.
   
   I know it's very unlikely but some users might want to use one cached snapshot (presumably as they know the behaviours of Maven or Ivy resolvers work). Let's say, one CI regularly publishes snapshot, and users want to test one specific version created at the specific time. After this PR, they are forced to use the newest snapshot always.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31849:
URL: https://github.com/apache/spark/pull/31849#discussion_r596584465



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1153,6 +1153,8 @@ private[spark] object SparkSubmitUtils extends Logging {
     // We need a chain resolver if we want to check multiple repositories
     val cr = new ChainResolver
     cr.setName("spark-list")
+    cr.setChangingMatcher(PatternMatcher.REGEXP)
+    cr.setChangingPattern(".*-SNAPSHOT")

Review comment:
       Thanks @bozhang2820 for correcting me. Yeah, I think it makes sense.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #31849:
URL: https://github.com/apache/spark/pull/31849#issuecomment-803675231


   Hi, @bozhang2820 .
   It seems that this code and the added test case is really flaky. Could you take a look, please?
   - https://github.com/apache/spark/runs/2158443719
   - https://github.com/apache/spark/runs/2161516532


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31849:
URL: https://github.com/apache/spark/pull/31849#issuecomment-801668801


   Merged to master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31849:
URL: https://github.com/apache/spark/pull/31849#issuecomment-800119756


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136110/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bozhang2820 commented on a change in pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
bozhang2820 commented on a change in pull request #31849:
URL: https://github.com/apache/spark/pull/31849#discussion_r596582901



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1153,6 +1153,8 @@ private[spark] object SparkSubmitUtils extends Logging {
     // We need a chain resolver if we want to check multiple repositories
     val cr = new ChainResolver
     cr.setName("spark-list")
+    cr.setChangingMatcher(PatternMatcher.REGEXP)
+    cr.setChangingPattern(".*-SNAPSHOT")

Review comment:
       The major reason for this change is to reduce frictions for developers, especially new ones, to contribute to Spark external modules. 
   
   I think the behavior proposed here matches the standard ones in Ivy. See "use a naming convention like a special suffix" in https://ant.apache.org/ivy/history/latest-milestone/bestpractices.html, and the implementation in ivy's `IBiblioResolver`: https://github.com/apache/ant-ivy/blob/master/src/java/org/apache/ivy/plugins/resolver/IBiblioResolver.java#L88-L92
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #31849:
URL: https://github.com/apache/spark/pull/31849


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #31849: [SPARK-34757][CORE][DEPLOY] Ignore cache for SNAPSHOT dependencies in spark-submit

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31849:
URL: https://github.com/apache/spark/pull/31849#discussion_r596549816



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1153,6 +1153,8 @@ private[spark] object SparkSubmitUtils extends Logging {
     // We need a chain resolver if we want to check multiple repositories
     val cr = new ChainResolver
     cr.setName("spark-list")
+    cr.setChangingMatcher(PatternMatcher.REGEXP)
+    cr.setChangingPattern(".*-SNAPSHOT")

Review comment:
       While I agree with the intention (I faced the same problem and had to manually remove cache too), I concern that it will actually make users more confused in a way because it's Maven or Ivy's standard behaviour, and now we're changing how they work by default (which probably users don't know).
   
   I know it's very unlikely but some users might want to use one cached snapshot (presumably as they know the behaviours of Maven or Ivy resolvers work). Let's say, one CI regularly publishes snapshot, and users want to test one specific version created at the specific time. After this PR, they are forced to use the newest snapshot always.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org