Posted to dev@spark.apache.org by Niranda Perera <ni...@gmail.com> on 2015/09/03 07:19:59 UTC

Spark SQL sort by and collect by in multiple partitions

Hi all,

I have been using SORT BY and ORDER BY in Spark SQL and observed the
following.

When I use SORT BY and collect the results, they are sorted
partition by partition.
For example, if we have 1, 2, ..., 12 in 4 partitions and I sort them in
descending order:
partition 0 (p0) would have 12, 8, 4
p1 = 11, 7, 3
p2 = 10, 6, 2
p3 = 9, 5, 1

so collect() would return 12, 8, 4, 11, 7, 3, 10, 6, 2, 9, 5, 1
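The SORT BY behavior above can be reproduced without Spark in a few lines of plain Python. This is a sketch under stated assumptions: the unsorted starting layout of each partition and the helper names `sort_by` and `collect` are hypothetical, chosen only to match the partition membership in the example. It models the semantics (each partition sorted on its own, then partitions concatenated in index order) and is not Spark's API.

```python
def sort_by(partitions, descending=True):
    """Hypothetical model of SORT BY: sort each partition independently.

    No rows move between partitions, so there is no global ordering.
    """
    return [sorted(p, reverse=descending) for p in partitions]


def collect(partitions):
    """Hypothetical model of collect(): concatenate partitions p0, p1, ... in order."""
    return [v for p in partitions for v in p]


# Assumed unsorted input with the same membership as p0..p3 in the example.
parts = [[4, 12, 8], [7, 3, 11], [2, 10, 6], [9, 1, 5]]

print(sort_by(parts))           # [[12, 8, 4], [11, 7, 3], [10, 6, 2], [9, 5, 1]]
print(collect(sort_by(parts)))  # [12, 8, 4, 11, 7, 3, 10, 6, 2, 9, 5, 1]
```

Each partition ends up correctly ordered, but concatenating them interleaves the ranges, which is exactly the collect() output described above.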

But when I use ORDER BY and collect the results:
p0 = 12, 11, 10
p1 = 9, 8, 7
...
so collect() would return 12, 11, ..., 1, which is the desired result.

Is this the intended behavior of SORT BY and ORDER BY, or is there something
I'm missing?

cheers

-- 
Niranda
@n1r44 <https://twitter.com/N1R44>
https://pythagoreanscript.wordpress.com/

Re: Spark SQL sort by and collect by in multiple partitions

Posted by Vishnu Kumar <mr...@gmail.com>.
Hi,

Yes, this is the intended behavior. "ORDER BY" guarantees a total order over
the output, while "SORT BY" only guarantees the order within each partition.
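The two guarantees can be stated as checks on the partitioned output. The following plain-Python sketch uses hypothetical helper names (`sorted_within_partitions`, `totally_ordered`) and the partition layouts from the original message; it expresses the properties and does not call any Spark API.

```python
def sorted_within_partitions(partitions, descending=True):
    """SORT BY's guarantee: every partition is individually ordered."""
    def ordered(seq):
        pairs = zip(seq, seq[1:])
        if descending:
            return all(a >= b for a, b in pairs)
        return all(a <= b for a, b in pairs)
    return all(ordered(p) for p in partitions)


def totally_ordered(partitions, descending=True):
    """ORDER BY's guarantee: partitions concatenated in order form one sorted sequence."""
    flat = [v for p in partitions for v in p]
    return sorted_within_partitions([flat], descending)


sort_by_output = [[12, 8, 4], [11, 7, 3], [10, 6, 2], [9, 5, 1]]
order_by_output = [[12, 11, 10], [9, 8, 7], [6, 5, 4], [3, 2, 1]]

print(sorted_within_partitions(sort_by_output), totally_ordered(sort_by_output))    # True False
print(sorted_within_partitions(order_by_output), totally_ordered(order_by_output))  # True True
```

Both outputs satisfy the per-partition property, but only the ORDER BY layout satisfies the total-order property, which is why only it yields 12, 11, ..., 1 from collect().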


Vishnu

On Thu, Sep 3, 2015 at 10:49 AM, Niranda Perera <ni...@gmail.com>
wrote: