Posted to issues@spark.apache.org by "flykobe cheng (JIRA)" <ji...@apache.org> on 2015/05/27 11:46:18 UTC

[jira] [Updated] (SPARK-7892) Python class in __main__ may trigger AssertionError

     [ https://issues.apache.org/jira/browse/SPARK-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

flykobe cheng updated SPARK-7892:
---------------------------------
    Description: 
Callback functions passed to Spark transformations and actions are pickled.
If the callback is an instance method of a class defined in the __main__ module, and that class has more than one instance method that uses class attributes or classmethods, the class is pickled twice. 'pickle.memoize' is then called twice for the same object, which triggers an AssertionError.

Demo code and traceback attached.

  was:
Callback functions passed to Spark transformations and actions are pickled.
If the callback is an instance method of a class defined in the __main__ module, and that class has more than one instance method that uses class attributes or classmethods, the class is pickled twice. 'pickle.memoize' is then called twice for the same object, which triggers an AssertionError.

Demo code:
import logging
import sys

import pyspark


class AClass(object):
    # Class attribute read by both instance methods below.
    _class_var = {'classkey': 'classval', }

    def main_object_method(self, item):
        logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))

    def main_object_method2(self, item):
        logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))


def test_main_object_method(sc):
    obj = AClass()
    # The callback is a bound method of a class defined in __main__.
    res = sc.parallelize(range(4)).map(obj.main_object_method).collect()


if __name__ == '__main__':
    cf = pyspark.SparkConf()
    cf.set('spark.cores.max', 1)

    sc = pyspark.SparkContext(appName="flykobe_demo_pickle_error", conf=cf)

    test_main_object_method(sc)


Traceback:
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 310, in save_function_tuple
    save(f_globals)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 174, in save_dict
    pickle.Pickler.save_dict(self, obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 654, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 686, in _batch_setitems
    save(v)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 468, in save_global
    d),obj=obj)
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 638, in save_reduce
    self.memoize(obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 248, in memoize
    assert id(obj) not in self.memo 
AssertionError


The failing assertion is in memoize() in Python/Lib/pickle.py:
    def memoize(self, obj):
        """Store an object in the memo."""
        if self.fast:
            return
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        self.write(self.put(memo_len))
        self.memo[id(obj)] = memo_len, obj
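
The assertion above can be reproduced in isolation, without Spark or cloudpickle. The following is a minimal sketch, not the code path from the traceback: it assumes Python 3 on CPython, where the pure-Python pickler is exposed as pickle._Pickler (an implementation detail, not a documented API), and memoize_twice is a hypothetical helper name introduced only for this illustration. It registers the same object in the memo twice, which is what the description says happens to the __main__ class.

```python
import io
import pickle


class AClass(object):
    _class_var = {'classkey': 'classval'}


def memoize_twice(obj):
    """Return True if memoizing obj a second time raises AssertionError."""
    # Assumption: pickle._Pickler is CPython's pure-Python pickler class.
    pickler = pickle._Pickler(io.BytesIO(), protocol=2)
    pickler.memoize(obj)       # first registration is recorded in pickler.memo
    try:
        pickler.memoize(obj)   # same id(obj) again, so the assert fails
    except AssertionError:
        return True
    return False


if __name__ == '__main__':
    print(memoize_twice(AClass))
```

Because the check is a plain assert statement, it would presumably be skipped when the interpreter runs with -O, so the sketch only shows the failure in a default interpreter.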


> Python class in __main__ may trigger AssertionError
> ---------------------------------------------------
>
>                 Key: SPARK-7892
>                 URL: https://issues.apache.org/jira/browse/SPARK-7892
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Linux, Python 2.7.3
> pickled by Python pickle Lib
>            Reporter: flykobe cheng
>            Priority: Minor
>
> Callback functions passed to Spark transformations and actions are pickled.
> If the callback is an instance method of a class defined in the __main__ module, and that class has more than one instance method that uses class attributes or classmethods, the class is pickled twice. 'pickle.memoize' is then called twice for the same object, which triggers an AssertionError.
> Demo code and traceback attached.


