Posted to issues@livy.apache.org by "shanyu zhao (Jira)" <ji...@apache.org> on 2020/02/16 21:25:00 UTC
[jira] [Created] (LIVY-750) Livy uploads local pyspark archives to Yarn distributed cache
shanyu zhao created LIVY-750:
--------------------------------
Summary: Livy uploads local pyspark archives to Yarn distributed cache
Key: LIVY-750
URL: https://issues.apache.org/jira/browse/LIVY-750
Project: Livy
Issue Type: Bug
Components: Server
Affects Versions: 0.7.0, 0.6.0
Reporter: shanyu zhao
Attachments: image-2020-02-16-13-19-40-645.png, image-2020-02-16-13-19-59-591.png
On the Livy Server, even if we configure the pyspark archives to use local files:
{code:bash}
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}
Livy still uploads these local pyspark archives to the Yarn distributed cache:
{code}
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip
{code}
Note that this is after we fixed Spark code in SPARK-30845 to not always upload local archives.
The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", and Spark adds everything on that list to the Yarn distributed cache. Since spark-submit already takes care of distributing the pyspark archives, there is no need for Livy to redundantly include them.
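A minimal sketch of the kind of fix described above: before Livy populates "spark.submit.pyFiles", filter out the Spark-managed pyspark archives so only user-supplied Python files remain. All names here (PyFilesFilter, userPyFiles, SparkManagedArchives) are hypothetical for illustration and are not Livy's actual identifiers:

{code:scala}
// Hypothetical sketch, not Livy's actual code: drop the pyspark archives
// (spark-submit distributes those itself) from the pyFiles list before it
// is written into "spark.submit.pyFiles".
object PyFilesFilter {
  // Archive name prefixes that spark-submit already handles (assumption).
  private val SparkManagedArchives = Seq("pyspark", "py4j")

  def userPyFiles(pyFiles: Seq[String]): Seq[String] =
    pyFiles.filterNot { path =>
      val name = path.split("/").last
      SparkManagedArchives.exists(prefix => name.startsWith(prefix))
    }

  def main(args: Array[String]): Unit = {
    val files = Seq(
      "local:/opt/spark/python/lib/pyspark.zip",
      "local:/opt/spark/python/lib/py4j-0.10.7-src.zip",
      "hdfs:///user/test1/app.py")
    // Only the user's own file survives the filter.
    println(userPyFiles(files).mkString(","))  // prints hdfs:///user/test1/app.py
  }
}
{code}

With such a filter in place, the Yarn client would only stage user-supplied py files, and the local: pyspark archives would be resolved on the node managers as intended.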
--
This message was sent by Atlassian Jira
(v8.3.4#803005)