Posted to issues@spark.apache.org by "Abhinav Mishra (JIRA)" <ji...@apache.org> on 2015/08/19 16:08:45 UTC
[jira] [Created] (SPARK-10112) ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another
Abhinav Mishra created SPARK-10112:
--------------------------------------
Summary: ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another
Key: SPARK-10112
URL: https://issues.apache.org/jira/browse/SPARK-10112
Project: Spark
Issue Type: Bug
Components: PySpark
Environment: Ubuntu 14.04.2 LTS
Reporter: Abhinav Mishra
I have a piece of code that works fine on one machine, but when I run it on another machine I get the error "ValueError: Can only zip with RDD which has the same number of partitions". My code is:
# rdd1 is built earlier in the job; lambda uses Python 2 tuple-parameter unpacking
rdd2 = sc.parallelize(list1)
rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y,x2, x3, x4))
list = rdd3.collect()
Both of my RDDs, rdd1 and rdd2, have the same number of elements and the same number of partitions (both have 1 partition). I tried repartition() as well, but it does not resolve the issue.
The code above works fine on one machine but throws the error on another. I tried to find an explanation, but I couldn't pin down a specific reason for this behavior. The machine on which it runs without error has Spark 1.3; the machine on which the error occurs has Spark 1.4.
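[Editorial note, not part of the original report] zip() requires both RDDs to have not only the same partition count but the same number of elements in each partition, and Spark 1.4 checks this more strictly than 1.3, which may explain the machine-to-machine difference. A common version-independent workaround is to join on explicit indices (zipWithIndex() + join() in PySpark) instead of relying on co-partitioning. The sketch below shows the same index-join idea on plain Python lists, with the PySpark equivalents named in comments; the sample data is illustrative, not taken from the report.

```python
def index_join(seq_a, seq_b):
    """Pair elements of two equal-length sequences by explicit index.

    This mimics the PySpark workaround:
        a = rdd1.zipWithIndex().map(lambda vi: (vi[1], vi[0]))  # key by index
        b = rdd2.zipWithIndex().map(lambda vi: (vi[1], vi[0]))
        paired = a.join(b).sortByKey().values()
    which does not require the two RDDs to be co-partitioned.
    """
    keyed_a = dict(enumerate(seq_a))  # like zipWithIndex, keyed by position
    keyed_b = dict(enumerate(seq_b))
    return [(keyed_a[i], keyed_b[i]) for i in sorted(keyed_a)]

# Illustrative stand-ins for rdd1's tuples and list1 from the report
rdd1_data = [("a", 1, 2, 3), ("b", 4, 5, 6)]
list1 = ["x", "y"]

# Replicates the report's map step: ((x1, x2, x3, x4), y) -> (y, x2, x3, x4)
result = [(y, x2, x3, x4)
          for (x1, x2, x3, x4), y in index_join(rdd1_data, list1)]
# result == [("x", 1, 2, 3), ("y", 4, 5, 6)]
```

The join-based version trades zip()'s strict partition alignment for a shuffle, so it is slower, but it behaves identically regardless of how the two RDDs happen to be partitioned.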
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org