Posted to issues@spark.apache.org by "flykobe cheng (JIRA)" <ji...@apache.org> on 2015/05/27 11:34:17 UTC
[jira] [Created] (SPARK-7891) Python class in __main__ may trigger AssertionError
flykobe cheng created SPARK-7891:
------------------------------------
Summary: Python class in __main__ may trigger AssertionError
Key: SPARK-7891
URL: https://issues.apache.org/jira/browse/SPARK-7891
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.2.0
Environment: Linux, Python 2.7.3, pickling done by the Python pickle library
Reporter: flykobe cheng
Priority: Minor
Callback functions passed to Spark transformations and actions are pickled.
If the callback is an instance method of a class defined in the __main__ module, and that class has more than one instance method that uses class attributes or class methods, the class is pickled twice. 'pickle.memoize' is then called twice for the same class object, which triggers an AssertionError.
Demo code:
import logging
import sys

import pyspark

class AClass(object):
    _class_var = {'classkey': 'classval', }

    def main_object_method(self, item):
        # Both methods read a class attribute through the class name AClass.
        logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))

    def main_object_method2(self, item):
        logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))

def test_main_object_method(sc):
    obj = AClass()
    # Passing a bound method as the callback forces AClass itself to be pickled.
    res = sc.parallelize(range(4)).map(obj.main_object_method).collect()

if __name__ == '__main__':
    cf = pyspark.SparkConf()
    cf.set('spark.cores.max', 1)
    sc = pyspark.SparkContext(appName="flykobe_demo_pickle_error", conf=cf)
    test_main_object_method(sc)
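For comparison, here is a small sketch of how a class that lives in an importable module is serialized. It uses only the standard-library pickle module, not PySpark's cloudpickle. Such a class is stored by reference (module name plus class name), so its body and methods are never walked. The choice of collections.OrderedDict as the example class is arbitrary. Whether moving AClass out of __main__ would sidestep the error is an inference from this report and was not verified here.

```python
import collections
import pickle

# Standard-library pickle stores an importable class as a reference
# (module name + class name), not as a copy of its body.
data = pickle.dumps(collections.OrderedDict)

# The serialized bytes contain the names needed to look the class up again.
refers_by_name = b"collections" in data and b"OrderedDict" in data

# Unpickling re-imports the module and returns the very same class object.
restored = pickle.loads(data)
same_object = restored is collections.OrderedDict
```

A class defined in __main__ cannot be re-imported on a worker this way, which is why the demo above ends up in the code path that pickles the class contents.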
Traceback:
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 310, in save_function_tuple
    save(f_globals)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 174, in save_dict
    pickle.Pickler.save_dict(self, obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 654, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 686, in _batch_setitems
    save(v)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 468, in save_global
    d),obj=obj)
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 638, in save_reduce
    self.memoize(obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 248, in memoize
    assert id(obj) not in self.memo
AssertionError
Problem in Python/Lib/pickle.py:
def memoize(self, obj):
    """Store an object in the memo."""
    if self.fast:
        return
    assert id(obj) not in self.memo
    memo_len = len(self.memo)
    self.write(self.put(memo_len))
    self.memo[id(obj)] = memo_len, obj
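The assertion in memoize can be exercised on its own, without Spark. The sketch below calls memoize twice on the same object using the pure-Python pickler. It only illustrates the final frame of the traceback and does not reproduce the cloudpickle code path that leads there. The Payload class and the memoize_twice helper are hypothetical names introduced for this illustration. The sketch assumes CPython, where the pure-Python pickler is pickle._Pickler on Python 3 and pickle.Pickler on Python 2.

```python
import io
import pickle

# pickle._Pickler is CPython 3's pure-Python pickler; on Python 2 the
# plain pickle.Pickler is already pure Python. Both define memoize().
PurePythonPickler = getattr(pickle, "_Pickler", pickle.Pickler)


class Payload(object):
    """Hypothetical stand-in for the class that gets saved twice."""


def memoize_twice(obj):
    """Return True if the second memoize() call raises AssertionError."""
    pickler = PurePythonPickler(io.BytesIO(), protocol=2)
    pickler.memoize(obj)      # first call records id(obj) in pickler.memo
    try:
        pickler.memoize(obj)  # second call trips the assert on id(obj)
    except AssertionError:
        return True
    return False


second_call_failed = memoize_twice(Payload())
```

Because the check is a plain assert statement, it is skipped when Python runs with -O, so the failure would not necessarily surface the same way in an optimized interpreter.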