Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/12/03 00:22:00 UTC

[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

     [ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=352328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-352328 ]

ASF GitHub Bot logged work on BEAM-8651:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/19 00:21
            Start Date: 03/Dec/19 00:21
    Worklog Time Spent: 10m 
      Work Description: kennknowles commented on pull request #10167: [BEAM-8651] Guard pickling operations with a lock to prevent race condition in module imports.
URL: https://github.com/apache/beam/pull/10167#discussion_r352929133
 
 

 ##########
 File path: sdks/python/apache_beam/internal/pickler.py
 ##########
 @@ -33,12 +33,16 @@
 import base64
 import logging
 import sys
+import threading
 import traceback
 import types
 import zlib
 
 import dill
 
+# Pickling, especially unpickling, can cause broken module imports
+# if executed concurrently, see: BEAM-8651.
+pickle_lock = threading.Lock()
 
 Review comment:
   Should this be RLock?
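
To illustrate the question: a plain threading.Lock cannot be re-acquired by the thread that already holds it, whereas a threading.RLock can. The following is a minimal sketch, assuming the guarded helpers may end up calling one another on the same thread. It uses the standard pickle module rather than dill, and dumps, loads, and loads_all are hypothetical stand-ins, not the actual functions in pickler.py.

```python
import pickle
import threading

# A plain Lock is not re-entrant: a second acquire on the same thread fails
# (or blocks forever if blocking=True).
plain_lock = threading.Lock()
with plain_lock:
    plain_reacquired = plain_lock.acquire(blocking=False)

# An RLock may be re-acquired by the thread that already owns it.
reentrant_lock = threading.RLock()
with reentrant_lock:
    rlock_reacquired = reentrant_lock.acquire(blocking=False)
    if rlock_reacquired:
        reentrant_lock.release()

# Hypothetical module-level lock, mirroring the name used in the diff.
pickle_lock = threading.RLock()


def dumps(obj):
    # Serialize while holding the lock.
    with pickle_lock:
        return pickle.dumps(obj)


def loads(data):
    # Deserialize while holding the lock.
    with pickle_lock:
        return pickle.loads(data)


def loads_all(blobs):
    # Holds the lock and then calls loads(), which acquires it again on the
    # same thread. With a plain Lock this nested acquire would deadlock.
    with pickle_lock:
        return [loads(blob) for blob in blobs]


restored = loads_all([dumps(1), dumps('a')])
```

Whether re-entrancy is actually needed depends on whether any guarded call can reach another guarded call on the same thread; if not, a plain Lock is sufficient.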
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 352328)
    Time Spent: 3h 40m  (was: 3.5h)

> Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
> -------------------------------------------------------------------------------------
>
>                 Key: BEAM-8651
>                 URL: https://issues.apache.org/jira/browse/BEAM-8651
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Valentyn Tymofieiev
>            Priority: Blocker
>             Fix For: 2.17.0
>
>         Attachments: beam8651.py
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Several Beam users have reported an intermittent error that occurs during unpickling in StockUnpickler.find_class. A similar error occurs consistently when a user's pipeline calls super() in its main module and uses --save_main_session; see [BEAM-6158|https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945].
> In this case the error happens only sometimes, and super() calls don't play a role.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on the Flink and Dataflow runners. On the Dataflow runner I have so far seen it only in streaming pipelines, which use the portable SDK worker.
> Typical stack trace:
> {noformat}
> File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)                                       
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads
>     return dill.loads(s)                                                           
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads                 
>     return load(file, ignore)                                                      
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load                  
>     obj = pik.load()                                                               
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class            
>     return StockUnpickler.find_class(self, module, name)                           
> AttributeError: Can't get attribute 'ClassName' on <module 'ModuleName' from 'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate workarounds to Python 3.5 and 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E
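
The workaround mentioned in the quoted message (importing modules during worker initialization so that the unpickler never triggers an import itself) could look roughly like the sketch below. The helper name preimport and the module list are hypothetical; this is not an existing Beam API, and the real set of modules would depend on the pipeline.

```python
import importlib
import sys


def preimport(module_names):
    """Import each named module eagerly; return the names that failed."""
    failed = []
    for name in module_names:
        try:
            # After this call the module is fully initialized in sys.modules,
            # so a later lookup by the unpickler does not start an import.
            importlib.import_module(name)
        except ImportError:
            failed.append(name)
    return failed


# Hypothetical list; the last entry is deliberately missing to show the
# failure path.
failed = preimport(['json', 'collections', 'no_such_module_beam8651'])
```

This would need to run on a single thread before any worker threads begin unpickling, since the point is to finish the imports before concurrent access starts.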



--
This message was sent by Atlassian Jira
(v8.3.4#803005)