Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:14:17 UTC

[jira] [Resolved] (SPARK-23460) PySpark concurrency python egg cache directory

     [ https://issues.apache.org/jira/browse/SPARK-23460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-23460.
----------------------------------
    Resolution: Incomplete

> PySpark concurrency python egg cache directory
> ----------------------------------------------
>
>                 Key: SPARK-23460
>                 URL: https://issues.apache.org/jira/browse/SPARK-23460
>             Project: Spark
>          Issue Type: Question
>          Components: PySpark
>    Affects Versions: 2.1.2
>         Environment: YARN last
>            Reporter: Dmitiry
>            Priority: Trivial
>              Labels: bulk-closed, pyFiles
>
> We are experiencing intermittent failures when running tasks on PySpark with dependencies shipped as Python eggs through --py-files. We set the following, because otherwise we get a permission-denied error on the egg cache:
> {noformat}
> --conf "spark.executorEnv.PYTHON_EGG_CACHE=./.python-eggs"{noformat}
>  
> Error:
> {noformat}
> INFO - File "build/bdist.linux-x86_64/egg/ua_parser/user_agent_parser.py", line 409, in <module>
> INFO - File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 904, in resource_filename
> INFO - self, resource_name
> INFO - File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1380, in get_resource_filename
> INFO - return self._extract_resource(manager, zip_path)
> INFO - File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1405, in _extract_resource
> INFO - self.egg_name, self._parts(zip_path)
> INFO - File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 984, in get_cache_path
> INFO - self.extraction_error()
> INFO - File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 950, in extraction_error
> INFO - raise err
> INFO - ExtractionError: Can't extract file(s) to egg cache
> INFO - 
> INFO - The following error occurred while trying to extract file(s) to the Python egg
> INFO - cache:
> INFO - 
> INFO - [Errno 17] File exists: './.python-eggs'
> INFO - 
> INFO - The Python egg cache directory is currently set to:
> INFO - 
> INFO - ./.python-eggs/
> INFO - 
> INFO - Perhaps your account does not have write access to this directory? You can
> INFO - change the cache directory by setting the PYTHON_EGG_CACHE environment
> INFO - variable to point to an accessible directory.{noformat}
>  
> We build the package with the option `zip_safe=False`, but PySpark still uses the egg cache directory regardless.
> Is there any way around this?
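
One plausible reading of the "[Errno 17] File exists" error above is a check-then-create race: several Python workers start in the same working directory, each finds that ./.python-eggs is missing, and all but one then fail when they try to create it. The sketch below is not taken from Spark or pkg_resources. It illustrates two generic mitigations under that assumption: creating the directory in a way that tolerates a concurrent creator, and giving each process its own cache directory. The helper names ensure_dir, use_private_egg_cache and create_concurrently are hypothetical, and the second helper assumes that the egg extraction code reads PYTHON_EGG_CACHE when it extracts, so it would have to run before the egg is first imported.

```python
import errno
import os
import tempfile
import threading


def ensure_dir(path):
    """Create `path`, treating "already exists" as success (hypothetical helper)."""
    try:
        os.makedirs(path)
    except OSError as err:
        # EEXIST (errno 17) only means another worker created the directory first.
        if err.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    return path


def use_private_egg_cache(parent=None):
    """Point PYTHON_EGG_CACHE at a fresh directory unique to this process.

    Assumption: the extraction code consults the environment variable at
    extraction time, so this must be called before the egg is imported.
    """
    path = tempfile.mkdtemp(prefix="python-eggs-", dir=parent)
    os.environ["PYTHON_EGG_CACHE"] = path
    return path


def create_concurrently(path, workers=8):
    """Call ensure_dir from several threads at once and collect any errors."""
    errors = []

    def work():
        try:
            ensure_dir(path)
        except OSError as err:
            errors.append(err)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


if __name__ == "__main__":
    shared = os.path.join(tempfile.mkdtemp(), ".python-eggs")
    print("errors with a shared cache:", create_concurrently(shared))
    print("private cache for this process:", use_private_egg_cache())
```

A per-process directory avoids sharing entirely, at the cost of extracting the egg once per worker. Whether either approach resolves the failure reported here is untested; the issue was closed as Incomplete without a confirmed cause.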



