Posted to issues@spark.apache.org by "Guoqiang Li (JIRA)" <ji...@apache.org> on 2014/09/10 09:13:28 UTC
[jira] [Resolved] (SPARK-3364) Zip equal-length but unequally-partition
[ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guoqiang Li resolved SPARK-3364.
--------------------------------
Resolution: Fixed
> Zip equal-length but unequally-partition
> ----------------------------------------
>
> Key: SPARK-3364
> URL: https://issues.apache.org/jira/browse/SPARK-3364
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.2
> Reporter: Kevin Jung
> Fix For: 1.1.0
>
>
> ZippedRDD loses some elements when zipping RDDs that have equal numbers of partitions but unequal numbers of elements in each partition.
> This can happen when a user creates an RDD with sc.textFile(path, partitionNumbers) from a physically unbalanced HDFS file.
> {noformat}
> import org.apache.spark.RangePartitioner
> var x = sc.parallelize(1 to 9,3)
> var y = sc.parallelize(Array(1,1,1,1,1,2,2,3,3),3).keyBy(i=>i)
> var z = y.partitionBy(new RangePartitioner(3,y))
> // expected: zipping x with y keeps all 9 elements
> x.zip(y).count()
> 9
> x.zip(y).collect()
> Array[(Int, (Int, Int))] = Array((1,(1,1)), (2,(1,1)), (3,(1,1)), (4,(1,1)), (5,(1,1)), (6,(2,2)), (7,(2,2)), (8,(3,3)), (9,(3,3)))
> // unexpected: zipping x with the range-partitioned z returns only 7 elements
> x.zip(z).count()
> 7
> x.zip(z).collect()
> Array[(Int, (Int, Int))] = Array((1,(1,1)), (2,(1,1)), (3,(1,1)), (4,(2,2)), (5,(2,2)), (7,(3,3)), (8,(3,3)))
> {noformat}
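The count of 7 is consistent with a zip that pairs elements partition by partition and stops at the shorter side of each pair. The following is a minimal sketch in plain Scala collections, not Spark's implementation. The object ZipSketch and the helper zipByPartition are hypothetical names, and the partition layout of z (5, 2 and 2 elements) is an assumption inferred from the output in the report.

```scala
object ZipSketch {
  // Pair up partitions by index, then zip the elements inside each pair.
  // Scala's zip stops at the shorter side, which models the silent loss.
  def zipByPartition[A, B](left: Seq[Seq[A]], right: Seq[Seq[B]]): Seq[(A, B)] = {
    require(left.length == right.length, "partition counts must match")
    left.zip(right).flatMap { case (l, r) => l.zip(r) }
  }

  // Models x = sc.parallelize(1 to 9, 3): three partitions of three elements.
  val x: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9))

  // Assumed layout of z after range partitioning by key (inferred, not measured).
  val z: Seq[Seq[(Int, Int)]] = Seq(
    Seq.fill(5)((1, 1)),
    Seq.fill(2)((2, 2)),
    Seq.fill(2)((3, 3))
  )

  def main(args: Array[String]): Unit = {
    val zipped = zipByPartition(x, z)
    println(zipped.size)           // 3 + 2 + 2 pairs survive
    println(zipped.mkString(", "))
  }
}
```

Under these assumptions the first partition pair keeps 3 of 5 elements from z, the second and third keep 2 of 3 elements from x, so the elements 6 and 9 of x never appear in the result.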
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)