Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/01/22 23:31:20 UTC

[Hadoop Wiki] Trivial Update of "Hive/LanguageManual/SortBy" by ZhengShao

The following page has been changed by ZhengShao:
http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy

New page:
[[TableOfContents]]

== Syntax of Sort By ==
The ''SORT BY'' syntax is similar to the ''ORDER BY'' syntax in SQL.

{{{
colOrder: ( ASC | DESC )
sortBy: SORT BY colName colOrder? (',' colName colOrder?)*
query: SELECT expression (',' expression)* FROM src sortBy
}}}

== Difference between Sort By and Order By ==
Most database systems support ''ORDER BY'', which Hive does not support directly.
Instead, Hive supports ''SORT BY'', which sorts the data within each reducer.

The data in each reducer is sorted in the order the user specifies, but no ordering is guaranteed across reducers. The following example illustrates this:

{{{
SELECT key, value FROM src SORT BY key ASC, value DESC
}}}

The query ran with 2 reducers. The output of each reducer (columns are key and value) is:
{{{
0   5
0   3
3   6
9   1
}}}

{{{
0   4
0   3
1   1
2   5
}}}
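
The per-reducer behaviour can be sketched outside Hive. The following Python simulation is only an illustration under stated assumptions, not Hive's implementation: the helpers `sort_rows` and `sort_by_per_reducer` are hypothetical names, and the assignment of rows to reducers is hand-picked to mirror the two outputs above (Hive does not promise this particular split). Each reducer's rows are sorted by key ascending and value descending, and the last line checks whether the concatenated output is a total order.

```python
# Illustrative simulation only; this is not how Hive is implemented.
# The row-to-reducer assignment below is hand-picked (an assumption) to
# mirror the two reducer outputs shown above.

def sort_rows(rows):
    """Sort (key, value) rows by key ASC, value DESC."""
    return sorted(rows, key=lambda kv: (kv[0], -kv[1]))

def sort_by_per_reducer(partitions):
    """Sort each reducer's rows independently of the other reducers."""
    return [sort_rows(rows) for rows in partitions]

# Unsorted rows as they might arrive at each of the 2 reducers.
reducer_inputs = [
    [(9, 1), (0, 3), (3, 6), (0, 5)],
    [(2, 5), (0, 3), (1, 1), (0, 4)],
]

reducer_outputs = sort_by_per_reducer(reducer_inputs)
for i, rows in enumerate(reducer_outputs):
    print("reducer", i, rows)

# Concatenate the per-reducer outputs and compare with a full sort.
combined = [row for rows in reducer_outputs for row in rows]
print("globally sorted:", combined == sort_rows(combined))
```

Each reducer's list comes out sorted, yet key 0 appears in both lists, so the combined output is not globally ordered.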


== How to do Order By? ==

Setting the number of reducers to 1 sends every row to a single reducer, so the per-reducer sort produces the same result as ''ORDER BY''.

{{{
set mapred.reduce.tasks=1;
SELECT key, value FROM src SORT BY key ASC, value DESC;
}}}
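
Why a single reducer yields a total order can be shown with a small Python simulation. This is a sketch under assumptions, not Hive's implementation: `run_sort_by` is a hypothetical helper, the round-robin dealing of rows to reducers is a simplification chosen for illustration, and the input rows are made up.

```python
# Illustrative simulation only; not Hive's implementation.
# With one reducer, every row reaches the same reducer, so the
# per-reducer sort is also a total order.

def sort_rows(rows):
    """Sort (key, value) rows by key ASC, value DESC."""
    return sorted(rows, key=lambda kv: (kv[0], -kv[1]))

def run_sort_by(rows, num_reducers):
    """Deal rows round-robin to reducers (an assumed, simplified
    partitioning), sort each reducer's rows, and concatenate the outputs."""
    partitions = [rows[i::num_reducers] for i in range(num_reducers)]
    output = []
    for part in partitions:
        output.extend(sort_rows(part))
    return output

# Made-up input rows for the illustration.
rows = [(9, 1), (0, 3), (3, 6), (0, 5), (2, 5), (0, 3), (1, 1), (0, 4)]

one_reducer = run_sort_by(rows, 1)
two_reducers = run_sort_by(rows, 2)

print("1 reducer totally ordered:", one_reducer == sort_rows(rows))
print("2 reducers totally ordered:", two_reducers == sort_rows(rows))
```

In this sketch the one-reducer run matches a full sort while the two-reducer run does not. The trade-off is that a single reducer processes all of the data by itself.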