Posted to user@spark.apache.org by Abhinav Mishra <am...@tidemark.com> on 2015/08/19 16:52:44 UTC

ValueError: Can only zip with RDD which has the same number of partitions error on one machine but not on another

Hi,

I have a piece of code that works fine on one machine, but when I run
it on another machine I get the error "ValueError: Can only zip with RDD
which has the same number of partitions". My code is:

rdd2 = sc.parallelize(list1)
rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y,x2, x3, x4))
list = rdd3.collect()
assert rdd1.getNumPartitions() == rdd2.getNumPartitions()

My rdd1 has this structure - [(1,2,3),(4,5,6)....]. My rdd2 has this
structure - [1,2,3....]

Both RDDs, rdd1 and rdd2, have the same number of elements and the same
number of partitions (both have 1 partition). I also tried repartition(),
but it does not resolve the issue.
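
One way to double-check the partition layout on both machines is to print
it right before the zip() call. The following is only a minimal sketch: the
helper names partition_report and check_zippable are hypothetical (they are
not part of PySpark), and the code relies only on the RDD methods
getNumPartitions(), glom(), map() and collect(). It assumes rdd1 and rdd2
are the RDDs from the snippet above.

```python
def partition_report(rdd):
    """Return (partition count, per-partition element counts) for an RDD."""
    # glom() groups each partition's elements into one list, so mapping
    # len over the result yields one element count per partition.
    sizes = rdd.glom().map(len).collect()
    return rdd.getNumPartitions(), sizes


def check_zippable(left, right):
    """Raise ValueError naming the first zip() precondition that fails.

    Hypothetical helper for debugging, not part of PySpark.
    """
    left_count, left_sizes = partition_report(left)
    right_count, right_sizes = partition_report(right)
    if left_count != right_count:
        raise ValueError(
            "partition counts differ: {0} vs {1}".format(left_count, right_count))
    if left_sizes != right_sizes:
        raise ValueError(
            "per-partition sizes differ: {0} vs {1}".format(left_sizes, right_sizes))
    return left_count, left_sizes
```

Calling check_zippable(rdd1, rdd2) immediately before rdd1.zip(rdd2) on each
machine would show whether the two RDDs really line up at that point. Note
that the assert in the snippet above sits after zip() and collect(), so it is
never reached when zip() raises.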

The above code works fine on one machine but throws the error on another. I
tried to find an explanation but couldn't find any specific reason for this
behavior. The machine where it runs without error has Spark 1.3, and the
machine where the error occurs has Spark 1.4.

Regards,

*Abhinav Mishra *