Posted to issues@spark.apache.org by "Prateek (JIRA)" <ji...@apache.org> on 2018/02/04 01:49:00 UTC

[jira] [Commented] (SPARK-22711) _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from cloudpickle.py

    [ https://issues.apache.org/jira/browse/SPARK-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351601#comment-16351601 ] 

Prateek commented on SPARK-22711:
---------------------------------

Thanks. The workaround works fine.

I wonder why the *setup_environment* method did not take care of this. Is it that the master reassigns each map or flatMap pass to worker nodes, and the workers are effectively refreshed (the setup RDD was temporary)? Or does it assign the work to different worker nodes than the ones that ran *setup_environment*?

> _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from cloudpickle.py
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22711
>                 URL: https://issues.apache.org/jira/browse/SPARK-22711
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Submit
>    Affects Versions: 2.2.0, 2.2.1
>         Environment: Ubuntu pseudo distributed installation of Spark 2.2.0
>            Reporter: Prateek
>            Priority: Major
>         Attachments: Jira_Spark_minimized_code.py
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> When I submit a PySpark program with the spark-submit command, this error is thrown.
> It happens for code like the following:
> RDD2 = RDD1.map(lambda m: function_x(m)).reduceByKey(lambda c, v: c + v)
> or
> RDD2 = RDD1.flatMap(lambda m: function_x(m)).reduceByKey(lambda c, v: c + v)
> or
> RDD2 = RDD1.flatMap(lambda m: function_x(m)).reduce(lambda c, v: c + v)
> Traceback (most recent call last):
>   File "/home/prateek/Project/textrank.py", line 299, in <module>
>     summaryRDD = sentenceTokensReduceRDD.map(lambda m: get_summary(m)).reduceByKey(lambda c,v :c+v)
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1608, in reduceByKey
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1846, in combineByKey
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1783, in partitionBy
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2455, in _jrdd
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2388, in _wrap_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2374, in _prepare_for_python_RDD
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 460, in dumps
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 704, in dumps
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 148, in dump
>   File "/usr/lib/python3.5/pickle.py", line 408, in dump
>     self.save(obj)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 740, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
>     save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
>     save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
>     save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 797, in _batch_appends
>     save(tmp[0])
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 841, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 841, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 520, in save
>     self.save_reduce(obj=obj, *rv)
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 565, in save_reduce
> _pickle.PicklingError: args[0] from __newobj__ args has the wrong class
> I tried replacing the cloudpickle code with the version from GitHub, but that started giving "copy_reg not defined" and "copyreg not defined" errors (for both Python 2.7 and 3.5).
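For reference, the final check that fails in the quoted traceback lives in the standard library's pickle, not in Spark, and can be reproduced without Spark. A minimal sketch (the Widget class is a made-up example, not code from the attached script):

```python
import copyreg
import pickle

class Widget:
    """Made-up class whose __reduce_ex__ deliberately returns the
    wrong class as args[0], mirroring what cloudpickle hit above."""
    def __reduce_ex__(self, protocol):
        # The correct __newobj__ form would be (copyreg.__newobj__, (Widget,)).
        # Returning a different class trips pickle's consistency check:
        # "args[0] from __newobj__ args has the wrong class".
        return (copyreg.__newobj__, (dict,))

try:
    pickle.dumps(Widget())
except pickle.PicklingError as exc:
    print(exc)
```

In the Spark case the mismatch typically comes from a class object that differs between where the closure was defined and where cloudpickle re-resolves it, rather than from a hand-written __reduce_ex__.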



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
