Posted to user@spark.apache.org by 萝卜丝炒饭 <14...@qq.com> on 2017/02/14 11:41:55 UTC
how to fix the order of data
Hi all,
Below is my test code. I found that the output of val input differs from run to run. How do I fix the order, please?
scala> val input = sc.parallelize( Array(1,2,3))
input: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[13] at parallelize at <console>:24
scala> input.foreach(print)
132
scala> input.foreach(print)
213
scala> input.foreach(print)
312
Re: how to fix the order of data
Posted by 萝卜丝炒饭 <14...@qq.com>.
It works well now, thanks.
---Original---
From: "Sam Elamin"<hu...@gmail.com>
Date: 2017/2/14 19:54:36
To: "萝卜丝炒饭"<14...@qq.com>;
Cc: "user"<us...@spark.apache.org>;
Subject: Re: how to fix the order of data
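It's because you are printing directly on the RDD: foreach runs on the partitions in parallel, so the output order is not guaranteed.
You can sort the DataFrame like below:
 input.toDF("value").sort("value").collect()
Or, if you do not want to convert to a DataFrame, you can sort the RDD itself with sortBy(x => x); sortByKey([ascending], [numTasks]) applies only to RDDs of key-value pairs.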
Regards
Sam
Re: how to fix the order of data
Posted by Sam Elamin <hu...@gmail.com>.
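It's because you are printing directly on the RDD: foreach runs on the partitions in parallel, so the output order is not guaranteed.
You can sort the DataFrame like below:
input.toDF("value").sort("value").collect()
Or, if you do not want to convert to a DataFrame, you can sort the RDD itself with
sortBy(x => x); *sortByKey*([*ascending*], [*numTasks*]) applies only to RDDs of key-value pairs.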
Regards
Sam
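
For readers who want something runnable, here is a minimal sketch of the options discussed in this thread. It is an illustration, not the posters' code: the object name OrderedPrint, the helper demo, the local[2] master, and the unsorted sample Array(3, 1, 2) are all assumptions made for the example. In spark-shell, `sc` and `spark` already exist, so only the three expressions in the middle are needed.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; names are illustrative only.
object OrderedPrint {

  // Returns the three results as strings so they can be printed or checked.
  def demo(): (String, String, String) = {
    // Local session for illustration; spark-shell provides `spark` and `sc` already.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("ordered-print")
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._ // needed for rdd.toDF

    try {
      // Same data as in the original question.
      val input = sc.parallelize(Array(1, 2, 3))

      // 1) collect() returns elements in partition order to the driver,
      //    so printing on the driver is deterministic.
      val collected = input.collect().mkString

      // 2) sortBy works on any RDD; sortByKey would need an RDD of (key, value) pairs.
      val unsorted = sc.parallelize(Array(3, 1, 2)) // assumed sample data
      val sortedRdd = unsorted.sortBy(x => x).collect().mkString

      // 3) DataFrame route: name the column, then sort on it.
      val sortedDf = unsorted.toDF("value")
        .sort("value")
        .collect()
        .map(_.getInt(0))
        .mkString

      (collected, sortedRdd, sortedDf)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (collected, sortedRdd, sortedDf) = demo()
    println(collected) // 123
    println(sortedRdd) // 123
    println(sortedDf)  // 123
  }
}
```

If the data is already in the desired order and only the printing is at issue, as in the original question, input.collect().foreach(print) is enough. Sorting is only needed when the stored order itself is not the one you want. Note that collect() brings all data to the driver, so this suits small results only.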