Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/08 18:20:21 UTC

[GitHub] [spark] warrenzhu25 commented on a change in pull request #30282: [SPARK-33375][CORE] Add config spark.yarn.pyspark.archives

warrenzhu25 commented on a change in pull request #30282:
URL: https://github.com/apache/spark/pull/30282#discussion_r519459534



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##########
@@ -130,6 +130,13 @@ package object config extends Logging {
     .stringConf
     .createOptional
 
+  private[spark] val SPARK_PYSPARK_ARCHIVE = ConfigBuilder("spark.yarn.pyspark.archives")
+    .doc("Location of pyspark.zip and py4j.zip.")

Review comment:
       I tried `--py-files`, but the resulting `PYTHONPATH` and distributed resources look like this:
   ```
   PYTHONPATH -> {{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.9-src.zip
   
   resources:
       __spark_conf__ -> resource { scheme: "hdfs" host: "MTPrime-CO4-fed" port: -1 file: "/user/zhonzh/.sparkStaging/application_1604622164128_7216/__spark_conf__.zip" } size: 536359 timestamp: 1604858318432 type: ARCHIVE visibility: PRIVATE
       pyspark.zip -> resource { scheme: "hdfs" host: "MTPrime-CO4-fed" port: -1 file: "/user/zhonzh/.sparkStaging/application_1604622164128_7216/pyspark.zip" } size: 595809 timestamp: 1604858311600 type: FILE visibility: PUBLIC
       py4j-0.10.9-src.zip -> resource { scheme: "hdfs" host: "MTPrime-CO4-fed" port: -1 file: "/user/zhonzh/.sparkStaging/application_1604622164128_7216/py4j-0.10.9-src.zip" } size: 41587 timestamp: 1604858316398 type: FILE visibility: PUBLIC
       __spark_libs__ -> resource { scheme: "hdfs" host: "MTPrime-CO4-fed" port: -1 file: "/user/zhonzh/.sparkStaging/application_1604622164128_7216/spark-3.0.1-mt-jars.zip" } size: 197891674 timestamp: 1604858291631 type: ARCHIVE visibility: PUBLIC
       py4j-0.10.7-src.zip -> resource { scheme: "hdfs" host: "MTPrime-CO4-fed" port: -1 file: "/user/zhonzh/.sparkStaging/application_1604622164128_7216/py4j-0.10.7-src.zip" } size: 42437 timestamp: 1604858314170 type: FILE visibility: PUBLIC
   ```
   
   I passed `--py-files "hdfs://MTPrime-CO4-0/user/zhonzh/pyspark.zip,hdfs://MTPrime-CO4-0/user/zhonzh/py4j-0.10.9-src.zip"`. These archives come from Spark 3, while the local Python lib is from Spark 2.4.
   
   We have two issues here:
   
   1. The local pyspark.zip is added to `PYTHONPATH` first, so it takes precedence and the archives passed via pyFiles have no effect.
   2. If I give my archive the same name, pyspark.zip, its upload is skipped because both resources have the same name.
   
   What do you suggest to handle this?
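
For illustration, here is a minimal Python sketch of both issues. It does not call Spark or YARN. The first part relies only on the fact that Python resolves imports from the first matching path entry; the second part is a toy model that assumes resources are keyed by file name, which is what the skipped upload suggests. `demo_pkg`, `make_zip`, `imported_version`, `stage_by_name`, and the local path `/opt/spark-2.4/...` are hypothetical names, not part of Spark.

```python
import importlib
import os
import posixpath
import sys
import tempfile
import zipfile


def make_zip(directory, zip_name, version):
    """Create a zip holding a one-file package `demo_pkg` that reports `version`."""
    path = os.path.join(directory, zip_name)
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("demo_pkg/__init__.py", f"VERSION = {version!r}\n")
    return path


def imported_version(search_path):
    """Import demo_pkg with `search_path` placed at the front of sys.path, in order."""
    saved_path = list(sys.path)
    sys.modules.pop("demo_pkg", None)
    sys.path[:0] = search_path
    importlib.invalidate_caches()
    try:
        import demo_pkg
        return demo_pkg.VERSION
    finally:
        # Restore the interpreter state so repeated calls are independent.
        sys.path[:] = saved_path
        sys.modules.pop("demo_pkg", None)


def stage_by_name(uris):
    """Toy model (an assumption, not Spark's code): keep the first URI per file name."""
    staged = {}
    for uri in uris:
        name = posixpath.basename(uri)
        if name in staged:
            continue  # same name already staged, so the later resource is skipped
        staged[name] = uri
    return staged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        local = make_zip(tmp, "local_spark_2_4.zip", "2.4")
        passed = make_zip(tmp, "passed_spark_3_0.zip", "3.0")
        # Issue 1: the entry listed first wins, whatever comes after it.
        print(imported_version([local, passed]))

    # Issue 2: two resources named pyspark.zip collapse to the first one seen.
    print(stage_by_name([
        "file:/opt/spark-2.4/python/lib/pyspark.zip",  # hypothetical local path
        "hdfs://MTPrime-CO4-0/user/zhonzh/pyspark.zip",
    ]))
```

Under these assumptions, swapping the order of the two zips in the search path changes which version is imported, which is why an archive appended after the local pyspark.zip has no effect.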
   



