Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/08/21 18:59:46 UTC

[jira] [Resolved] (SPARK-10112) ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another

     [ https://issues.apache.org/jira/browse/SPARK-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-10112.
-------------------------------
    Resolution: Cannot Reproduce

I can't reproduce this on Ubuntu or OS X with the latest Spark master. You still haven't shown what things like list1 are, so I'm closing this until it's clearer that this isn't just a mismatch between what you think you're executing and what you're actually running.

{code}
>>> rdd1 = sc.parallelize([1,2,3], 1)
>>> rdd2 = sc.parallelize([4,5,6], 1)
>>> rdd1.getNumPartitions()
1
>>> rdd2.getNumPartitions()
1
>>> rdd1.zip(rdd2).collect()
...
[(1, 4), (2, 5), (3, 6)]
>>> 
{code}
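For reference, the invariant {{zip}} enforces can be sketched in plain Python, with no Spark required: both RDDs must have the same number of partitions, and each corresponding pair of partitions must hold the same number of elements. The function and the list-of-lists layout below are illustrative stand-ins, not Spark internals:

```python
def zip_partitioned(parts_a, parts_b):
    """Pair elements partition-by-partition, mimicking RDD.zip's contract.

    parts_a / parts_b: lists of partitions, each partition a list of elements.
    """
    # First invariant: same number of partitions on both sides.
    if len(parts_a) != len(parts_b):
        raise ValueError(
            "Can only zip with RDD which has the same number of partitions")
    zipped = []
    for pa, pb in zip(parts_a, parts_b):
        # Second invariant: same element count within each partition pair.
        if len(pa) != len(pb):
            raise ValueError(
                "Can only zip RDDs with same number of elements in each partition")
        zipped.extend(zip(pa, pb))
    return zipped

# One partition of three elements on each side, as in the session above.
print(zip_partitioned([[1, 2, 3]], [[4, 5, 6]]))  # [(1, 4), (2, 5), (3, 6)]
```

Note that {{repartition()}} only equalizes the partition *count*; it does not guarantee that corresponding partitions end up with matching element counts, which is why the second check can still fail after repartitioning.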

> ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10112
>                 URL: https://issues.apache.org/jira/browse/SPARK-10112
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>         Environment: Ubuntu 14.04.2 LTS
>            Reporter: Abhinav Mishra
>
> I have a piece of code which works fine on one machine, but when I run it on another machine I get the error "ValueError: Can only zip with RDD which has the same number of partitions". My code is:
> {code}
> rdd2 = sc.parallelize(list1)
> assert rdd1.getNumPartitions() == rdd2.getNumPartitions()
> rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y, x2, x3, x4))
> list = rdd3.collect()
> {code}
> My rdd1 has this structure - [(1,2,3),(4,5,6)....]. My rdd2 has this structure - [1,2,3....]
>  
> Both my RDDs, rdd1 and rdd2, have the same number of elements and the same number of partitions (both have 1 partition). I tried repartition() as well, but it does not resolve the issue.
> The above code works fine on one machine but throws the error on another. I tried to look for an explanation but couldn't find a specific reason for this behavior. The machine on which it runs without error has Spark 1.3; the machine on which the error occurs has Spark 1.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org