Posted to issues@spark.apache.org by "flykobe cheng (JIRA)" <ji...@apache.org> on 2015/05/27 11:34:17 UTC

[jira] [Created] (SPARK-7891) Python class in __main__ may trigger AssertionError

flykobe cheng created SPARK-7891:
------------------------------------

             Summary: Python class in __main__ may trigger AssertionError
                 Key: SPARK-7891
                 URL: https://issues.apache.org/jira/browse/SPARK-7891
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.2.0
         Environment: Linux, Python 2.7.3; pickled with the Python pickle library
            Reporter: flykobe cheng
            Priority: Minor


Callback functions passed to Spark transformations and actions are pickled.
Suppose the callback is an instance method of a class defined in the __main__ module, and that class has more than one instance method that uses class attributes or classmethods. In that case the class is pickled twice, Pickler.memoize is called twice for the same class object, and an AssertionError is raised.
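The double memoization can be modelled without Spark. The sketch below is a hypothetical toy: Node, ToyPickler and pickles_cleanly are illustrative names, not part of pickle or cloudpickle. It assumes that a function is memoized before its globals are saved, while a __main__ class pickled by value through save_reduce is memoized only after its contents (its methods) have been saved, with no check for a nested save having memoized it first. The function-side ordering is an assumption about cloudpickle's internals. The class-side ordering matches the save_reduce and memoize frames at the bottom of the traceback.

```python
class Node(object):
    """Hypothetical object-graph node standing in for a function or a class."""

    def __init__(self, name, memoize_first):
        self.name = name
        # True: memoized before its references are saved (modelled on functions).
        # False: memoized after its references are saved (modelled on a
        # __main__ class pickled by value through save_reduce).
        self.memoize_first = memoize_first
        self.refs = []


class ToyPickler(object):
    """Toy pickler that only tracks the memo; it writes no output."""

    def __init__(self, guarded=False):
        self.memo = {}
        # guarded=True skips memoizing an object that a nested save already stored.
        self.guarded = guarded

    def memoize(self, obj):
        # Same invariant as Pickler.memoize in Python 2.7's pickle.py.
        assert id(obj) not in self.memo
        self.memo[id(obj)] = len(self.memo), obj

    def save(self, obj):
        if id(obj) in self.memo:
            return  # a real pickler would emit a memo GET here
        if obj.memoize_first:
            self.memoize(obj)
            for ref in obj.refs:
                self.save(ref)
        else:
            for ref in obj.refs:
                self.save(ref)
            if not (self.guarded and id(obj) in self.memo):
                self.memoize(obj)


def pickles_cleanly(num_methods, guarded=False):
    """Save the first method of a class whose methods all reference the class."""
    cls = Node("AClass", memoize_first=False)
    methods = [Node("method%d" % i, memoize_first=True) for i in range(num_methods)]
    for method in methods:
        method.refs.append(cls)  # the method's globals reference the class
        cls.refs.append(method)  # the class dict contains the method
    try:
        ToyPickler(guarded).save(methods[0])  # the callback being shipped
    except AssertionError:
        return False
    return True
```

In this model, pickles_cleanly(1) returns True and pickles_cleanly(2) returns False. This matches the report's observation that a second method referencing the class is needed: saving that second method reaches the class again before the outer save of the class has memoized it. The guarded=True variant skips the second memoize, similar to the check in CPython 3's pure-Python save_reduce, which emits a POP and fetches the object from the memo when id(obj) is already present.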

Demo code:
import logging
import sys

import pyspark


class AClass(object):
    _class_var = {'classkey': 'classval'}

    def main_object_method(self, item):
        # Referencing AClass puts the class into this function's globals.
        logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))

    def main_object_method2(self, item):
        # A second method that also references AClass; per the report, this is what triggers the error.
        logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))


def test_main_object_method(sc):
    obj = AClass()
    # The bound method obj.main_object_method is pickled and shipped to the executors.
    res = sc.parallelize(range(4)).map(obj.main_object_method).collect()


if __name__ == '__main__':
    cf = pyspark.SparkConf()
    cf.set('spark.cores.max', 1)

    sc = pyspark.SparkContext(appName="flykobe_demo_pickle_error", conf=cf)

    test_main_object_method(sc)


Traceback:
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 310, in save_function_tuple
    save(f_globals)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 174, in save_dict
    pickle.Pickler.save_dict(self, obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 654, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 686, in _batch_setitems
    save(v)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 468, in save_global
    d),obj=obj)
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 638, in save_reduce
    self.memoize(obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 248, in memoize
    assert id(obj) not in self.memo
AssertionError


The failing assertion is in Pickler.memoize in Python/Lib/pickle.py:
    def memoize(self, obj):
        """Store an object in the memo."""
        if self.fast:
            return
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        self.write(self.put(memo_len))
        self.memo[id(obj)] = memo_len, obj
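The assertion can be reproduced directly, without Spark or cloudpickle. This sketch assumes Python 3, where the pure-Python pickler is available as pickle._Pickler (a private, undocumented name) and its memoize method keeps the same assert id(obj) not in self.memo check. The default pickle.Pickler is normally the C implementation, which does not provide this Python-level method. The helper name memoize_twice is hypothetical. Run it without -O, since -O strips assert statements.

```python
import io
import pickle


def memoize_twice(obj):
    """Return True if memoizing the same object twice raises AssertionError."""
    # pickle._Pickler is CPython's pure-Python pickler (a private name).
    pickler = pickle._Pickler(io.BytesIO(), protocol=2)
    pickler.memoize(obj)  # first call: records id(obj) in pickler.memo
    try:
        pickler.memoize(obj)  # second call: id(obj) is already in the memo
    except AssertionError:
        return True
    return False
```

Passing a class object, for example memoize_twice(AClass) with any class named AClass, returns True. memoize only checks object identity, so it makes no difference that the object is a class.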



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
