Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/01/03 12:31:00 UTC

[jira] [Commented] (SPARK-22711) _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from cloudpickle.py

    [ https://issues.apache.org/jira/browse/SPARK-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309582#comment-16309582 ] 

Hyukjin Kwon commented on SPARK-22711:
--------------------------------------

Thanks [~PrateekRM]. However, it seems the example is not quite self-contained. For example, the sys module import is missing and the setup_environment function is missing too. Could you double-check and make it self-runnable?
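
For reference, a minimal self-contained reproducer of the reported pattern could look roughly like the sketch below (function_x and the sample data are hypothetical stand-ins, not taken from the attached script):

    # Hypothetical minimal reproducer sketch for the map(...).reduceByKey(...) pattern.
    # function_x and the sample data are stand-ins, not from the attached script.
    from pyspark import SparkContext

    def function_x(m):
        # Toy per-record logic: emit (key, 1) pairs so reduceByKey has work to do.
        return (m % 2, 1)

    if __name__ == "__main__":
        sc = SparkContext(appName="SPARK-22711-repro")
        rdd1 = sc.parallelize(range(10))
        rdd2 = rdd1.map(lambda m: function_x(m)).reduceByKey(lambda c, v: c + v)
        print(rdd2.collect())
        sc.stop()

Running it with spark-submit ships the whole script to the executors, so nothing outside the file is needed.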

> _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from cloudpickle.py
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22711
>                 URL: https://issues.apache.org/jira/browse/SPARK-22711
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Submit
>    Affects Versions: 2.2.0, 2.2.1
>         Environment: Ubuntu pseudo distributed installation of Spark 2.2.0
>            Reporter: Prateek
>         Attachments: Jira_Spark_minimized_code.py
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> When I submit a PySpark program with the spark-submit command, this error is thrown.
> It happens for code like the following:
> RDD2 = RDD1.map(lambda m: function_x(m)).reduceByKey(lambda c, v: c + v)
> or
> RDD2 = RDD1.flatMap(lambda m: function_x(m)).reduceByKey(lambda c, v: c + v)
> or
> RDD2 = RDD1.flatMap(lambda m: function_x(m)).reduce(lambda c, v: c + v)
> Traceback (most recent call last):
>   File "/home/prateek/Project/textrank.py", line 299, in <module>
>     summaryRDD = sentenceTokensReduceRDD.map(lambda m: get_summary(m)).reduceByKey(lambda c,v :c+v)
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1608, in reduceByKey
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1846, in combineByKey
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1783, in partitionBy
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2455, in _jrdd
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2388, in _wrap_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2374, in _prepare_for_python_RDD
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 460, in dumps
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 704, in dumps
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 148, in dump
>   File "/usr/lib/python3.5/pickle.py", line 408, in dump
>     self.save(obj)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 740, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
>     save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
>     save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
>     save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 797, in _batch_appends
>     save(tmp[0])
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 841, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 841, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 520, in save
>     self.save_reduce(obj=obj, *rv)
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 565, in save_reduce
> _pickle.PicklingError: args[0] from __newobj__ args has the wrong class
> I tried replacing the cloudpickle.py code with the version from GitHub, but that started giving "copy_reg not defined" and "copyreg not defined" errors (for both Python 2.7 and 3.5).
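
(For context: copy_reg is the Python 2 name of the module that was renamed to copyreg in Python 3, so a cloudpickle.py taken from a different branch can fail with one of these "not defined" errors depending on which interpreter runs it. A version-guarded import along these lines sidesteps the naming mismatch; this is only a sketch, not a fix for the pickling error itself:)

    # Sketch of a version-guarded import; copy_reg (Python 2) was renamed to
    # copyreg in Python 3, which is why one name or the other comes up missing.
    import sys

    if sys.version_info[0] >= 3:
        import copyreg
    else:
        import copy_reg as copyreg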


