Posted to dev@thrift.apache.org by "Chandler May (JIRA)" <ji...@apache.org> on 2017/01/26 21:59:24 UTC

[jira] [Updated] (THRIFT-4042) ExtractionError when using accelerated thrift in a multiprocess test

     [ https://issues.apache.org/jira/browse/THRIFT-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chandler May updated THRIFT-4042:
---------------------------------
    Description: 
We recently switched to thrift 0.10.0 with accelerated protocols and started getting sporadic errors of the following form in tests that use the multiprocessing module:

{code}
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python2.7/multiprocessing/pool.py:250: in map
    return self.map_async(func, iterable, chunksize).get()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <multiprocessing.pool.MapResult object at 0x2e06950>, timeout = None

    def get(self, timeout=None):
        self.wait(timeout)
        if not self._ready:
            raise TimeoutError
        if self._success:
            return self._value
        else:
>           raise self._value
E           ExtractionError: Can't extract file(s) to egg cache
E           
E           The following error occurred while trying to extract file(s) to the Python egg
E           cache:
E           
E             [Errno 17] File exists: '/home/concrete/.cache/Python-Eggs'
E           
E           The Python egg cache directory is currently set to:
E           
E             /home/concrete/.cache/Python-Eggs
E           
E           Perhaps your account does not have write access to this directory?  You can
E           change the cache directory by setting the PYTHON_EGG_CACHE environment
E           variable to point to an accessible directory.

/usr/lib64/python2.7/multiprocessing/pool.py:554: ExtractionError
{code}

This particular error arose from a test we wrote to isolate the issue.  It is of the form:

{code}
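    from multiprocessing import Pool

    # _deserialize (defined elsewhere) reads thrift-serialized objects
    # from the given path and returns True on success.
    input_path = '/path/to/thrift_serialized_data'
    num_trials = 100  # repeat, since the failure is sporadic
    num_procs = 2
    num_tasks = 4

    for i in xrange(num_trials):
        pool = Pool(num_procs)
        results = pool.map(_deserialize, [input_path] * num_tasks)
        for result in results:
            assert result is True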
{code}

where {{_deserialize}} is a function that reads thrift serialized objects from a file and returns {{True}} on success.  I can provide a minimal working example (MWE) if necessary, but it would take some time on my part.

I want to stress that this happens only when using the new accelerated protocol in thrift 0.10.0, and only under {{python setup.py test}} in our project when thrift has not been installed via *pip* (but has been installed by {{python setup.py install}} in our project, which depends on thrift).  We are using pytest, but I'm not sure whether that's important.  At test time thrift gets installed/unpacked as an egg in a local directory, and extracting it to the egg cache fails with what looks like a locking error.  I believe this is the same error as:

http://dev.list.galaxyproject.org/python-egg-cache-exists-error-td4656276.html

http://www.georgevreilly.com/blog/2015/01/28/PythonEggCache.html
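
One plausible reading of the {{[Errno 17] File exists}} line in the traceback is a check-then-create race: two workers both find the egg cache directory missing and both try to create it.  As a test-side stopgap, the sketch below creates the directory once in the parent process, tolerating a concurrent creator, and points {{PYTHON_EGG_CACHE}} (the variable named in the error text) at it before any {{Pool}} is started.  The helper names {{ensure_dir}} and {{prepare_egg_cache}} are made up for illustration, and I have not verified that this removes every race inside the egg extraction itself.

```python
import errno
import os
import tempfile


def ensure_dir(path):
    """Create path if needed; a concurrent creator is not an error."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # EEXIST means another process (or an earlier call) got there first.
        # Re-raise anything else, or EEXIST when path is not a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    return path


def prepare_egg_cache(base_dir=None):
    """Point PYTHON_EGG_CACHE at a directory that already exists.

    Intended to run in the parent before any Pool is created, so that
    forked workers inherit the environment variable.
    """
    if base_dir is None:
        # Hypothetical location; any writable directory would do.
        base_dir = tempfile.mkdtemp(prefix='egg-cache-')
    cache_dir = ensure_dir(os.path.join(base_dir, 'Python-Eggs'))
    os.environ['PYTHON_EGG_CACHE'] = cache_dir
    return cache_dir
```

In the test above, {{prepare_egg_cache()}} would be called once before the {{for}} loop.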

I believe the documentation indicates this problem can be worked around by setting {{zip_safe}} to {{False}} in {{setup.py}}:

http://setuptools.readthedocs.io/en/latest/setuptools.html
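
For reference, the {{zip_safe}} workaround is a one-keyword change in the {{setup.py}} of whichever package ends up installed as a zipped egg (presumably thrift's own in this case).  A minimal sketch follows; the package name and version are placeholders, not thrift's real metadata.  As I understand it, with {{zip_safe=False}} setuptools installs the egg as an unpacked directory, so nothing needs to be extracted to the egg cache at import time.

```python
# setup.py (sketch; name, version and packages are placeholders)
from setuptools import setup

setup(
    name='example_package',
    version='0.0.0',
    packages=['example_package'],
    # Install as an unpacked directory rather than a zipped egg.
    zip_safe=False,
)
```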

  was:
We recently switched to thrift 0.10.0 with accelerated protocols and started getting sporadic errors of the following form in tests that use the multiprocessing module:

{code}
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib64/python2.7/multiprocessing/pool.py:250: in map
    return self.map_async(func, iterable, chunksize).get()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <multiprocessing.pool.MapResult object at 0x2e06950>, timeout = None

    def get(self, timeout=None):
        self.wait(timeout)
        if not self._ready:
            raise TimeoutError
        if self._success:
            return self._value
        else:
>           raise self._value
E           ExtractionError: Can't extract file(s) to egg cache
E           
E           The following error occurred while trying to extract file(s) to the Python egg
E           cache:
E           
E             [Errno 17] File exists: '/home/concrete/.cache/Python-Eggs'
E           
E           The Python egg cache directory is currently set to:
E           
E             /home/concrete/.cache/Python-Eggs
E           
E           Perhaps your account does not have write access to this directory?  You can
E           change the cache directory by setting the PYTHON_EGG_CACHE environment
E           variable to point to an accessible directory.

/usr/lib64/python2.7/multiprocessing/pool.py:554: ExtractionError
{code}

This particular error arose from a test we wrote to isolate the issue.  It is of the form:

{code}
    from multiprocessing import Pool

    input_path = '/path/to/thrift_serialized_data'
    num_trials = 100
    num_procs = 2
    num_tasks = 4

    for i in xrange(num_trials):
        pool = Pool(num_procs)
        results = pool.map(_deserialize, [input_path] * num_tasks)
        for result in results:
            assert result is True
{code}

where {{_deserialize}} is a function that reads thrift serialized objects from a file and returns {{True}} on success.  I can provide a minimal working example (MWE) if necessary, but it would take some time on my part.

I want to stress that this happens only when using the new accelerated protocol in thrift 0.10.0, and only under {{python setup.py test}} in our project when thrift is *not* already installed on the system.  We are using pytest, but I'm not sure whether that's important.  At test time thrift gets installed/unpacked as an egg in a local directory, and extracting it to the egg cache fails with what looks like a locking error.  I believe this is the same error as:

http://dev.list.galaxyproject.org/python-egg-cache-exists-error-td4656276.html

http://www.georgevreilly.com/blog/2015/01/28/PythonEggCache.html

I believe the documentation indicates this problem can be worked around by setting {{zip_safe}} to {{False}} in {{setup.py}}:

http://setuptools.readthedocs.io/en/latest/setuptools.html


> ExtractionError when using accelerated thrift in a multiprocess test
> --------------------------------------------------------------------
>
>                 Key: THRIFT-4042
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4042
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>    Affects Versions: 0.10.0
>            Reporter: Chandler May



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)