Posted to issues@livy.apache.org by "shanyu zhao (Jira)" <ji...@apache.org> on 2020/02/16 21:25:00 UTC

[jira] [Created] (LIVY-750) Livy uploads local pyspark archives to Yarn distributed cache

shanyu zhao created LIVY-750:
--------------------------------

             Summary: Livy uploads local pyspark archives to Yarn distributed cache
                 Key: LIVY-750
                 URL: https://issues.apache.org/jira/browse/LIVY-750
             Project: Livy
          Issue Type: Bug
          Components: Server
    Affects Versions: 0.7.0, 0.6.0
            Reporter: shanyu zhao
         Attachments: image-2020-02-16-13-19-40-645.png, image-2020-02-16-13-19-59-591.png

On the Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}

Livy still uploads these local pyspark archives to the Yarn distributed cache:
{code}
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip
{code}

Note that this happens even after the Spark fix in SPARK-30845, which stops Spark from always uploading local archives.

The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", and Spark in turn uploads everything on that list to the Yarn distributed cache. Since spark-submit already takes care of distributing the pyspark archives, there is no need for Livy to redundantly add them.
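The behavior the fix implies can be sketched as follows: when building the "spark.submit.pyFiles" list, skip entries that are already available locally on every node (i.e. use the {{local:}} scheme), since they need no upload to the distributed cache. This is only an illustrative sketch, not Livy's actual code; the helper name and the example list are hypothetical.

```python
def files_needing_upload(py_files):
    """Return only the entries that actually need to be shipped to the
    Yarn distributed cache. Entries with the local: scheme already exist
    on every node, so distributing them again is redundant."""
    return [f for f in py_files if not f.startswith("local:")]


# Hypothetical example: the two pyspark archives use local: and would
# be dropped, while a genuinely remote/user file would be kept.
py_files = [
    "local:/opt/spark/python/lib/pyspark.zip",
    "local:/opt/spark/python/lib/py4j-0.10.7-src.zip",
    "hdfs://mycluster/libs/extra.zip",
]
print(files_needing_upload(py_files))
# -> ['hdfs://mycluster/libs/extra.zip']
```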



--
This message was sent by Atlassian Jira
(v8.3.4#803005)