Posted to user@spark.apache.org by 萝卜丝炒饭 <14...@qq.com> on 2017/02/14 11:41:55 UTC

how to fix the order of data

Hi all,
Below is my test code. The output of input.foreach(print) is different on every run. How do I fix the order, please?

scala> val input = sc.parallelize( Array(1,2,3))
input: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[13] at parallelize at <console>:24

scala> input.foreach(print)
132
scala> input.foreach(print)
213
scala> input.foreach(print)
312

Re: how to fix the order of data

Posted by 萝卜丝炒饭 <14...@qq.com>.
It works well now, thanks.

---Original---
From: "Sam Elamin"<hu...@gmail.com>
Date: 2017/2/14 19:54:36
To: "萝卜丝炒饭"<14...@qq.com>;
Cc: "user"<us...@spark.apache.org>;
Subject: Re: how to fix the order of data


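It's because you are just printing on the RDD: foreach runs on the partitions in parallel, so the print order is not deterministic.

You can sort the DF like below

 input.toDF("value").sort("value").collect()

or, if you do not want to convert to a DataFrame, you can use sortBy on the RDD, e.g. input.sortBy(x => x).collect() (sortByKey([ascending], [numTasks]) only applies to key-value RDDs)

Regards

Sam
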
On Tue, Feb 14, 2017 at 11:41 AM, 萝卜丝炒饭 <14...@qq.com> wrote:
Hi all,
Below is my test code. The output of input.foreach(print) is different on every run. How do I fix the order, please?

scala> val input = sc.parallelize( Array(1,2,3))
input: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[13] at parallelize at <console>:24

scala> input.foreach(print)
132
scala> input.foreach(print)
213
scala> input.foreach(print)
312

Re: how to fix the order of data

Posted by Sam Elamin <hu...@gmail.com>.
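It's because you are just printing on the RDD: foreach runs on the partitions in parallel, so the print order is not deterministic.

You can sort the DF like below

 input.toDF("value").sort("value").collect()


or, if you do not want to convert to a DataFrame, you can use sortBy on the RDD, e.g. input.sortBy(x => x).collect()
(sortByKey([ascending], [numTasks]) only applies to key-value RDDs)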
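
For reference, here is a minimal sketch of both routes as a standalone program. It assumes a local SparkSession (in spark-shell, spark and sc already exist, so the builder lines can be dropped); the object name OrderedOutput, the app name, and the column name "value" are illustrative choices, not anything from the original code.

```scala
import org.apache.spark.sql.SparkSession

object OrderedOutput {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ordered-output")
      .getOrCreate()
    import spark.implicits._ // needed for toDF and as[Int]
    val sc = spark.sparkContext

    val input = sc.parallelize(Array(1, 2, 3))

    // foreach runs inside the tasks, one task per partition, so the
    // order of the printed digits can change from run to run.
    input.foreach(print)
    println()

    // collect() brings the elements back to the driver in partition
    // order, which for parallelize is the order of the source array.
    println(input.collect().mkString) // prints 123

    // RDD route: sort by the element itself, then collect on the driver.
    val sortedRdd = input.sortBy(x => x).collect()
    println(sortedRdd.mkString)

    // DataFrame route: name the single column and sort on it.
    val sortedDf = input.toDF("value").sort("value").as[Int].collect()
    println(sortedDf.mkString)

    spark.stop()
  }
}
```

If the goal is only to see the elements in their original order, collect() on its own is enough; the sort is needed only when the data should be ordered by value.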


Regards

Sam





On Tue, Feb 14, 2017 at 11:41 AM, 萝卜丝炒饭 <14...@qq.com> wrote:

>    Hi all,
> Below is my test code. The output of input.foreach(print) is different
> on every run. How do I fix the order, please?
>
> scala> val input = sc.parallelize( Array(1,2,3))
> input: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[13] at
> parallelize at <console>:24
>
> scala> input.foreach(print)
> 132
> scala> input.foreach(print)
> 213
> scala> input.foreach(print)
> 312