Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/09/07 06:48:00 UTC
[jira] [Commented] (SPARK-32809) RDD different partitions cause different results
[ https://issues.apache.org/jira/browse/SPARK-32809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191485#comment-17191485 ]
Hyukjin Kwon commented on SPARK-32809:
--------------------------------------
The result is correct. With {{local[1]}} there is a single partition, so {{seqOp}} is applied to both {{("apple", 10)}} and {{("apple", 15)}} within that partition, chaining the multiplications: 1.0 * 10 * 15 = 150. With {{local[3]}} there are three partitions, so {{seqOp}} runs independently per partition (1.0 * 10 = 10 and 1.0 * 15 = 15) and {{combOp}} then combines the per-partition results: 10 + 15 = 25.
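The partition dependence can be seen without a Spark cluster by simulating what {{aggregateByKey}} does: fold each partition with {{seqOp}} starting from the zero value, then merge the per-partition results with {{combOp}}. This is a minimal plain-Scala sketch of that semantics (the helper {{simulateAggregateByKey}} is illustrative, not a Spark API):

```scala
object AggregateByKeySim {
  // The same operators as in the reported test case.
  val zero: Double = 1.0
  val seqOp: (Double, Int) => Double = (acc, price) => price * acc
  val combOp: (Double, Double) => Double = (curr, agg) => curr + agg

  // Simulate aggregateByKey over an explicit partition layout:
  // seqOp folds values within a partition, combOp merges across partitions.
  def simulateAggregateByKey(partitions: Seq[Seq[(String, Int)]]): Map[String, Double] = {
    val perPartition: Seq[(String, Double)] = partitions.flatMap { part =>
      part.groupBy(_._1).map { case (k, kvs) =>
        k -> kvs.map(_._2).foldLeft(zero)(seqOp)
      }
    }
    perPartition.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).reduce(combOp)
    }
  }

  val data = Seq(("apple", 10), ("apple", 15), ("huawei", 20))

  // local[1]: one partition, seqOp chains both apple values: 1.0 * 10 * 15 = 150.0
  val onePartition = simulateAggregateByKey(Seq(data))

  // local[3]: one element per partition, combOp adds them: (1.0 * 10) + (1.0 * 15) = 25.0
  val threePartitions = simulateAggregateByKey(data.map(Seq(_)))
}
```

Running both layouts shows {{onePartition("apple") == 150.0}} while {{threePartitions("apple") == 25.0}}, reproducing exactly the difference reported between {{local[1]}} and {{local[3]}}.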
> RDD different partitions cause different results
> --------------------------------------------------
>
> Key: SPARK-32809
> URL: https://issues.apache.org/jira/browse/SPARK-32809
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Environment: Spark 2.2.0, Scala 2.11.8, hadoop-client 2.6.0
> Reporter: zhangchenglong
> Priority: Major
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> {code}
> class Exec3 {
>   private val exec: SparkConf = new SparkConf().setMaster("local[1]").setAppName("exec3")
>   private val context = new SparkContext(exec)
>   context.setCheckpointDir("checkPoint")
>
>   /**
>    * Get the total per key.
>    * The desired results are ("apple", 25) and ("huawei", 20),
>    * but in fact I get ("apple", 150) and ("huawei", 20).
>    * When I change the master to local[3] the result is correct.
>    * I want to know what causes this and how to solve it.
>    */
>   @Test
>   def testError(): Unit = {
>     val rdd = context.parallelize(Seq(("apple", 10), ("apple", 15), ("huawei", 20)))
>     rdd.aggregateByKey(1.0)(
>       seqOp = (zero, price) => price * zero,
>       combOp = (curr, agg) => curr + agg).collect().foreach(println(_))
>     context.stop()
>   }
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)