Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/14 11:37:15 UTC

[GitHub] [spark] cxzl25 opened a new pull request #30036: [SPARK-33147][CORE] Avoid distribute user jar from driver in yarn client mode

cxzl25 opened a new pull request #30036:
URL: https://github.com/apache/spark/pull/30036


   ### What changes were proposed in this pull request?
   Avoid distributing the user jar from the driver in YARN client mode.
   Add the user's jar to `--jars` so that it is automatically uploaded to `spark.yarn.stagingDir` (usually on HDFS).
   Executors then download the user jar from the staging directory when they are initialized.
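   A minimal sketch of the idea, under stated assumptions: `jarsWithPrimary` below is a hypothetical helper, not Spark's actual `SparkSubmit` code, and it treats the primary resource as a jar only if its name ends in `.jar`. It appends the primary jar to the comma-separated `--jars` list in YARN client mode and drops duplicates, so the user jar reaches executors through the staging directory like any other `--jars` entry.

   ```scala
   object ClientModeJarsSketch {
     // Local approximation of Spark's Utils.stringToSeq:
     // split on commas, trim each entry, and drop empty entries.
     private def stringToSeq(str: String): Seq[String] =
       str.split(",").map(_.trim).filter(_.nonEmpty).toSeq

     // Hypothetical helper (not Spark's SparkSubmit code): in YARN client mode,
     // append the primary jar to the --jars list so that it is uploaded to the
     // staging directory together with the other jars, instead of being served
     // by the driver. Assumes a primary resource is a jar iff it ends in ".jar".
     def jarsWithPrimary(
         master: String,
         deployMode: String,
         primaryResource: String,
         jars: String): String = {
       val isYarnClient = master == "yarn" && deployMode == "client"
       val existing = Option(jars).toList.flatMap(stringToSeq)
       val withPrimary =
         if (isYarnClient && primaryResource != null && primaryResource.endsWith(".jar")) {
           existing :+ primaryResource
         } else {
           existing
         }
       // De-duplicate so a jar passed both ways is not uploaded twice.
       withPrimary.distinct.mkString(",")
     }
   }
   ```

   For example, `jarsWithPrimary("yarn", "client", "app.jar", "one.jar,two.jar")` yields `"one.jar,two.jar,app.jar"`, while in cluster mode the list is left unchanged.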
   
   ### Why are the changes needed?
   When many executors are requested and the user jar is large, every executor pulls the jar from the driver, so the driver's network traffic becomes high and fetches may time out. Moreover, the driver may not be in the same data center as the executors of the YARN cluster.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing unit tests.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on pull request #30036: [SPARK-33147][CORE] Avoid distribute user jar from driver in yarn client mode

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on pull request #30036:
URL: https://github.com/apache/spark/pull/30036#issuecomment-730269272


   @jerryshao Could you help review this PR?
   
   https://github.com/cxzl25/spark/actions/runs/367373884
   
   ![image](https://user-images.githubusercontent.com/3898450/99652308-a06caa80-2a92-11eb-8fb2-d6afe7af59fd.png)
   




[GitHub] [spark] cxzl25 commented on a change in pull request #30036: [SPARK-33147][CORE] Avoid distribute user jar from driver in yarn client mode

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on a change in pull request #30036:
URL: https://github.com/apache/spark/pull/30036#discussion_r504610362



##########
File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
##########
@@ -329,8 +329,8 @@ class SparkSubmitSuite
     conf.get("spark.executor.instances") should be ("6")
     conf.get("spark.yarn.dist.files") should include regex (".*file1.txt,.*file2.txt")
     conf.get("spark.yarn.dist.archives") should include regex (".*archive1.txt,.*archive2.txt")
-    conf.get("spark.yarn.dist.jars") should include
-      regex (".*one.jar,.*two.jar,.*three.jar,.*thejar.jar")

Review comment:
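       This assertion is never actually executed: because of the line break, Scala's semicolon inference splits it into two separate statements. It would fail if it were executed.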






[GitHub] [spark] AmplabJenkins commented on pull request #30036: [SPARK-33147][CORE] Avoid distribute user jar from driver in yarn client mode

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30036:
URL: https://github.com/apache/spark/pull/30036#issuecomment-708601027


   Can one of the admins verify this patch?




[GitHub] [spark] github-actions[bot] commented on pull request #30036: [SPARK-33147][CORE] Avoid distribute user jar from driver in yarn client mode

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #30036:
URL: https://github.com/apache/spark/pull/30036#issuecomment-855488960


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] github-actions[bot] closed pull request #30036: [SPARK-33147][CORE] Avoid distribute user jar from driver in yarn client mode

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #30036:
URL: https://github.com/apache/spark/pull/30036


   




[GitHub] [spark] YMajid commented on a change in pull request #30036: [SPARK-33147][CORE] Avoid distribute user jar from driver in yarn client mode

Posted by GitBox <gi...@apache.org>.
YMajid commented on a change in pull request #30036:
URL: https://github.com/apache/spark/pull/30036#discussion_r572899350



##########
File path: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala
##########
@@ -169,12 +169,13 @@ private[deploy] object DependencyUtils extends Logging {
   }
 
   /**
-   * Merge a sequence of comma-separated file lists, some of which may be null to indicate
-   * no files, into a single comma-separated string.
+   * Merge and de-duplicate a sequence of comma-separated file lists,
+   * some of which may be null to indicate no files,
+   * into a single comma-separated string.
    */
   def mergeFileLists(lists: String*): String = {
     val merged = lists.filterNot(StringUtils.isBlank)
-      .flatMap(Utils.stringToSeq)
+      .flatMap(Utils.stringToSeq).distinct

Review comment:
       Sorry, these questions might be trivial, but I'm unclear on them. Why do we need the `.distinct`? I know that it removes duplicate values, but how does it change how the function is used, and what implications does it have?
   
   Thank you!






[GitHub] [spark] AmplabJenkins commented on pull request #30036: [SPARK-33147][CORE] Avoid distribute user jar from driver in yarn client mode

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30036:
URL: https://github.com/apache/spark/pull/30036#issuecomment-708602285


   Can one of the admins verify this patch?




[GitHub] [spark] cxzl25 commented on a change in pull request #30036: [SPARK-33147][CORE] Avoid distribute user jar from driver in yarn client mode

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on a change in pull request #30036:
URL: https://github.com/apache/spark/pull/30036#discussion_r583521775



##########
File path: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala
##########
@@ -169,12 +169,13 @@ private[deploy] object DependencyUtils extends Logging {
   }
 
   /**
-   * Merge a sequence of comma-separated file lists, some of which may be null to indicate
-   * no files, into a single comma-separated string.
+   * Merge and de-duplicate a sequence of comma-separated file lists,
+   * some of which may be null to indicate no files,
+   * into a single comma-separated string.
    */
   def mergeFileLists(lists: String*): String = {
     val merged = lists.filterNot(StringUtils.isBlank)
-      .flatMap(Utils.stringToSeq)
+      .flatMap(Utils.stringToSeq).distinct

Review comment:
       
   It avoids uploading the same file more than once; see:
   
   https://github.com/apache/spark/blob/5c7d019b609c87a9427fa9309f3aa03d02f61878/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L455-L460
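   A self-contained sketch of the patched `mergeFileLists`, for illustration only: `stringToSeq` and `isBlank` below are local approximations of Spark's `Utils.stringToSeq` and commons-lang3 `StringUtils.isBlank`, not the real helpers. `distinct` keeps the first occurrence of each path, so a jar that appears in more than one input list ends up in the merged list, and is uploaded, only once.

   ```scala
   object MergeFileListsSketch {
     // Approximation of Utils.stringToSeq: split on commas, trim, drop empties.
     private def stringToSeq(str: String): Seq[String] =
       str.split(",").map(_.trim).filter(_.nonEmpty).toSeq

     // Approximation of StringUtils.isBlank: null, empty, or whitespace only.
     private def isBlank(s: String): Boolean = s == null || s.trim.isEmpty

     // Merge comma-separated lists (some possibly null), drop duplicates while
     // preserving first-seen order, and return null when nothing remains.
     def mergeFileLists(lists: String*): String = {
       val merged = lists.filterNot(isBlank).flatMap(stringToSeq).distinct
       if (merged.nonEmpty) merged.mkString(",") else null
     }
   }
   ```

   For example, `mergeFileLists("a.jar,b.jar", null, "b.jar, app.jar")` returns `"a.jar,b.jar,app.jar"`; without `.distinct` it would contain `b.jar` twice.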
   
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30036: [SPARK-33147][CORE] Avoid distribute user jar from driver in yarn client mode

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30036:
URL: https://github.com/apache/spark/pull/30036#issuecomment-708601027


   Can one of the admins verify this patch?

