
Posted to issues@spark.apache.org by "Guoqiang Li (JIRA)" <ji...@apache.org> on 2014/09/10 09:13:28 UTC

[jira] [Resolved] (SPARK-3364) Zip equal-length but unequally-partition

     [ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guoqiang Li resolved SPARK-3364.
--------------------------------
    Resolution: Fixed

> Zip equal-length but unequally-partition
> ----------------------------------------
>
>                 Key: SPARK-3364
>                 URL: https://issues.apache.org/jira/browse/SPARK-3364
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2
>            Reporter: Kevin Jung
>             Fix For: 1.1.0
>
>
> ZippedRDD loses elements when zipping two RDDs that have the same number of partitions but different numbers of elements in corresponding partitions.
> This can happen when a user creates an RDD with sc.textFile(path, partitionNumbers) from a physically unbalanced HDFS file.
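> {noformat}
> import org.apache.spark.RangePartitioner
>
> // x has 3 partitions of 3 elements each: [1,2,3], [4,5,6], [7,8,9]
> val x = sc.parallelize(1 to 9, 3)
> // y also has 3 partitions of 3 elements each
> val y = sc.parallelize(Array(1, 1, 1, 1, 1, 2, 2, 3, 3), 3).keyBy(i => i)
> // z still has 3 partitions, but range partitioning by key makes their sizes unequal
> val z = y.partitionBy(new RangePartitioner(3, y))
>
> // expected
> x.zip(y).count()
> // 9
> x.zip(y).collect()
> // Array[(Int, (Int, Int))] = Array((1,(1,1)), (2,(1,1)), (3,(1,1)), (4,(1,1)), (5,(1,1)), (6,(2,2)), (7,(2,2)), (8,(3,3)), (9,(3,3)))
>
> // unexpected: 6 and 9 are missing
> x.zip(z).count()
> // 7
> x.zip(z).collect()
> // Array[(Int, (Int, Int))] = Array((1,(1,1)), (2,(1,1)), (3,(1,1)), (4,(2,2)), (5,(2,2)), (7,(3,3)), (8,(3,3)))
> {noformat}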
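
The truncation in the unexpected case can be reproduced without a cluster. The sketch below is a minimal model, assuming (as the reported output suggests) that zip pairs partitions by index and then zips each pair element by element, stopping at the shorter side the way Scala's Seq.zip does. The object ZipSketch, its naiveZip and checkedZip methods, and the partition contents of z (sizes 5, 2 and 2, inferred from the x.zip(z) output above) are hypothetical illustrations, not Spark's API.

```scala
object ZipSketch {
  // Hypothetical model: an RDD is represented as a Seq of partitions.
  type Partitions[A] = Seq[Seq[A]]

  // Models sc.parallelize(1 to 9, 3)
  val x: Partitions[Int] = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9))

  // Models y: 3 elements per partition
  val y: Partitions[(Int, Int)] =
    Seq(Seq((1, 1), (1, 1), (1, 1)), Seq((1, 1), (1, 1), (2, 2)), Seq((2, 2), (3, 3), (3, 3)))

  // Models z after range partitioning: sizes 5, 2, 2 (inferred from the reported output)
  val z: Partitions[(Int, Int)] =
    Seq(Seq.fill(5)((1, 1)), Seq.fill(2)((2, 2)), Seq.fill(2)((3, 3)))

  // Pairs partitions by index, then zips each pair; Seq.zip silently
  // stops at the shorter partition, which is how elements get lost.
  def naiveZip[A, B](a: Partitions[A], b: Partitions[B]): Partitions[(A, B)] = {
    require(a.length == b.length, "number of partitions differs")
    a.zip(b).map { case (pa, pb) => pa.zip(pb) }
  }

  // Hypothetical stricter variant: reject mismatched partition sizes
  // with an IllegalArgumentException instead of truncating.
  def checkedZip[A, B](a: Partitions[A], b: Partitions[B]): Partitions[(A, B)] = {
    require(a.length == b.length, "number of partitions differs")
    a.zip(b).zipWithIndex.map { case ((pa, pb), i) =>
      require(pa.length == pb.length,
        s"partition $i has ${pa.length} vs ${pb.length} elements")
      pa.zip(pb)
    }
  }

  def main(args: Array[String]): Unit = {
    println(naiveZip(x, y).flatten.length)              // 9
    println(naiveZip(x, z).flatten.length)              // 7
    println(naiveZip(x, z).flatten.map(_._1))           // List(1, 2, 3, 4, 5, 7, 8)
    println(scala.util.Try(checkedZip(x, z)).isFailure) // true
  }
}
```

checkedZip is only one possible design: it fails loudly on a size mismatch rather than returning a shorter result. Here each partition's length is known up front; a streaming implementation would instead have to detect the mismatch when one side's iterator runs out before the other.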



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
