Posted to issues@spark.apache.org by "Abhinav Mishra (JIRA)" <ji...@apache.org> on 2015/08/19 16:08:45 UTC
[jira] [Created] (SPARK-10112) ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another
Abhinav Mishra created SPARK-10112:
--------------------------------------
Summary: ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another
Key: SPARK-10112
URL: https://issues.apache.org/jira/browse/SPARK-10112
Project: Spark
Issue Type: Bug
Components: PySpark
Environment: Ubuntu 14.04.2 LTS
Reporter: Abhinav Mishra
I have a piece of code that works fine on one machine, but when I run it on another machine I get the error "ValueError: Can only zip with RDD which has the same number of partitions". My code is:
# rdd1 is built earlier in the job; lambda uses Python 2 tuple-parameter unpacking
rdd2 = sc.parallelize(list1)
rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y,x2, x3, x4))
list = rdd3.collect()
Both of my RDDs, rdd1 and rdd2, have the same number of elements and the same number of partitions (both have 1 partition). I tried repartition() as well, but it does not resolve the issue.
The code above works fine on one machine but throws the error on another. I tried to find an explanation, but I couldn't pin down a specific reason for this behavior. The machine on which it runs without error has Spark 1.3; the machine on which the error occurs has Spark 1.4.
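[Editorial note, not part of the original report] zip() requires both RDDs to have not only the same partition count but the same number of elements in each partition, and Spark 1.4 checks this more strictly than 1.3, which may explain the machine-to-machine difference. A common version-independent workaround is to join on explicit indices (zipWithIndex() + join() in PySpark) instead of relying on co-partitioning. The sketch below shows the same index-join idea on plain Python lists, with the PySpark equivalents named in comments; the sample data is illustrative, not taken from the report.

```python
def index_join(seq_a, seq_b):
    """Pair elements of two equal-length sequences by explicit index.

    This mimics the PySpark workaround:
        a = rdd1.zipWithIndex().map(lambda vi: (vi[1], vi[0]))  # key by index
        b = rdd2.zipWithIndex().map(lambda vi: (vi[1], vi[0]))
        paired = a.join(b).sortByKey().values()
    which does not require the two RDDs to be co-partitioned.
    """
    keyed_a = dict(enumerate(seq_a))  # like zipWithIndex, keyed by position
    keyed_b = dict(enumerate(seq_b))
    return [(keyed_a[i], keyed_b[i]) for i in sorted(keyed_a)]

# Illustrative stand-ins for rdd1's tuples and list1 from the report
rdd1_data = [("a", 1, 2, 3), ("b", 4, 5, 6)]
list1 = ["x", "y"]

# Replicates the report's map step: ((x1, x2, x3, x4), y) -> (y, x2, x3, x4)
result = [(y, x2, x3, x4)
          for (x1, x2, x3, x4), y in index_join(rdd1_data, list1)]
# result == [("x", 1, 2, 3), ("y", 4, 5, 6)]
```

The join-based version trades zip()'s strict partition alignment for a shuffle, so it is slower, but it behaves identically regardless of how the two RDDs happen to be partitioned.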
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org