Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/08/19 16:12:45 UTC

[jira] [Commented] (SPARK-10112) ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another

    [ https://issues.apache.org/jira/browse/SPARK-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703071#comment-14703071 ] 

Sean Owen commented on SPARK-10112:
-----------------------------------

I don't think you've demonstrated that they have the same number of partitions. We don't see what rdd1 is either. Evidently they do not have the same number of partitions. This is best as a question on user@, and if you have a clear reproduction, then open a JIRA.
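For background on the constraint Sean is pointing at: RDD.zip pairs the i-th partition of one RDD with the i-th partition of the other, so both the partition counts and the per-partition element counts must match. A minimal plain-Python sketch of that contract, with partitions emulated as lists of lists (illustrative only; no Spark required, and `zip_rdds` is a made-up name, not a Spark API):

```python
# Emulate RDD.zip's partition-wise pairing in plain Python.
# Each "RDD" here is a list of partitions; each partition is a list of elements.

def zip_rdds(rdd_a, rdd_b):
    # Spark raises this error when the partition counts differ.
    if len(rdd_a) != len(rdd_b):
        raise ValueError(
            "Can only zip with RDD which has the same number of partitions")
    # Within each partition pair, the element counts must also match.
    for pa, pb in zip(rdd_a, rdd_b):
        if len(pa) != len(pb):
            raise ValueError(
                "Can only zip RDDs with same number of elements per partition")
    return [list(zip(pa, pb)) for pa, pb in zip(rdd_a, rdd_b)]

rdd1 = [[("a", 1, 2, 3)], [("b", 4, 5, 6)]]   # 2 partitions, 1 element each
rdd2 = [["x"], ["y"]]                         # 2 partitions, 1 element each
print(zip_rdds(rdd1, rdd2))
```

In PySpark itself, calling rdd.getNumPartitions() on both RDDs, on the cluster where the job actually runs, is the quickest way to confirm or rule out a mismatch before zipping.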

> ValueError: Can only zip with RDD which has the same number of partitions on one machine but not on another
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10112
>                 URL: https://issues.apache.org/jira/browse/SPARK-10112
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>         Environment: Ubuntu 14.04.2 LTS
>            Reporter: Abhinav Mishra
>
> I have this piece of code which works fine on one machine, but when I run it on another machine I get the error "ValueError: Can only zip with RDD which has the same number of partitions". My code is:
> rdd2 = sc.parallelize(list1) 
> rdd3 = rdd1.zip(rdd2).map(lambda ((x1,x2,x3,x4), y): (y,x2, x3, x4))
> list = rdd3.collect()
> Both my RDDs, rdd1 and rdd2, have the same number of elements and the same number of partitions (both have 1 partition), and I tried to use repartition() as well, but it does not resolve the issue.
> The above code works fine on one machine but throws the error on another. I tried to find an explanation but couldn't identify a specific reason for this behavior. The machine on which it runs without error has Spark 1.3; the machine on which the error occurs has Spark 1.4.
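When element alignment across RDDs cannot be guaranteed (repartition changes the partition count but does not align elements), a common PySpark workaround is to key both RDDs by zipWithIndex and join on the index instead of using zip. The same idea in plain Python, emulating the index-keyed join with dicts (a sketch only; the data values are invented for illustration):

```python
# Emulate the zipWithIndex + join workaround in plain Python.
rows = [("a", 1, 2, 3), ("b", 4, 5, 6), ("c", 7, 8, 9)]  # plays the role of rdd1
list1 = ["x", "y", "z"]                                  # plays the role of rdd2

# Key each element by its position (what zipWithIndex provides in Spark).
indexed1 = {i: v for i, v in enumerate(rows)}
indexed2 = {i: v for i, v in enumerate(list1)}

# "Join" on the shared index, then reshape like the original map() did:
# ((x1, x2, x3, x4), y) -> (y, x2, x3, x4)
rdd3 = [(indexed2[i],) + indexed1[i][1:] for i in sorted(indexed1)]
print(rdd3)  # [('x', 1, 2, 3), ('y', 4, 5, 6), ('z', 7, 8, 9)]
```

This pattern is insensitive to partitioning because the join is by key, not by partition position, at the cost of a shuffle that plain zip avoids.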



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org