Posted to issues@spark.apache.org by "flykobe cheng (JIRA)" <ji...@apache.org> on 2015/05/27 11:40:18 UTC

[jira] [Closed] (SPARK-7892) Python class in __main__ may trigger AssertionError

     [ https://issues.apache.org/jira/browse/SPARK-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

flykobe cheng closed SPARK-7892.
--------------------------------
    Resolution: Duplicate

> Python class in __main__ may trigger AssertionError
> ---------------------------------------------------
>
>                 Key: SPARK-7892
>                 URL: https://issues.apache.org/jira/browse/SPARK-7892
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Linux, Python 2.7.3
> pickled by Python pickle Lib
>            Reporter: flykobe cheng
>            Priority: Minor
>
> Callback functions passed to Spark transformations and actions are pickled.
> If the callback is an instance method of a class defined in the __main__ module, and that class has more than one instance method that uses class attributes or classmethods, the class is pickled twice and 'pickle.memoize' is called twice for it, which triggers an AssertionError.
> Demo code:
> import logging
> import sys
> import pyspark
>
> class AClass(object):
>     _class_var = {'classkey': 'classval', }
>
>     def main_object_method(self, item):
>         logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))
>
>     def main_object_method2(self, item):
>         logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))
>
> def test_main_object_method(sc):
>     obj = AClass()
>     # obj.main_object_method is a bound method of a class defined in __main__,
>     # so cloudpickle has to serialize AClass itself to ship the callback
>     res = sc.parallelize(range(4)).map(obj.main_object_method).collect()
>
> if __name__ == '__main__':
>     cf = pyspark.SparkConf()
>     cf.set('spark.cores.max', 1)
>     sc = pyspark.SparkContext(appName="flykobe_demo_pickle_error", conf=cf)
>     test_main_object_method(sc)
> Traceback:
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 310, in save_function_tuple
>     save(f_globals)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 174, in save_dict
>     pickle.Pickler.save_dict(self, obj)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 654, in save_dict
>     self._batch_setitems(obj.iteritems())
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 686, in _batch_setitems
>     save(v)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 468, in save_global
>     d),obj=obj)
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 638, in save_reduce
>     self.memoize(obj)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 248, in memoize
>     assert id(obj) not in self.memo 
> AssertionError
> Problem in Python/Lib/pickle.py:
>     def memoize(self, obj):
>         """Store an object in the memo."""
>         if self.fast:
>             return
>         assert id(obj) not in self.memo
>         memo_len = len(self.memo)
>         self.write(self.put(memo_len))
>         self.memo[id(obj)] = memo_len, obj
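> A possible workaround, sketched below with illustrative names only (CLASS_VAL, process_item, test_module_level_function), is to pass a plain module-level function that does not reference any __main__ class, so the class is never pickled and memoize() is not reached twice for it:
> import logging
> import sys
>
> CLASS_VAL = 'classval'  # illustrative module-level constant instead of AClass._class_var
>
> def process_item(item):
>     # plain function: only its globals (a string and stdlib modules) are serialized,
>     # so no __main__ class is pickled when this callback is shipped to workers
>     logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, CLASS_VAL))
>     return item
>
> def test_module_level_function(sc):
>     return sc.parallelize(range(4)).map(process_item).collect()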



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org