Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/02/07 20:25:47 UTC
[Hadoop Wiki] Update of "Hive/LanguageManual/SortBy" by AlexSmith
The following page has been changed by AlexSmith:
http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy
The comment on the change is:
adds example for numeric sorting
------------------------------------------------------------------------------
}}}
- === How to do Order By? ===
+ === Simulating Order By ===
Setting the number of reducers to 1 guarantees the same result as ''ORDER BY'', because a single reducer receives and sorts all rows.
@@ -50, +50 @@
}}}
This can make the single reducer a performance bottleneck. In many cases the user only wants to see the top N rows, where N is a small number. In that case, we can use a LIMIT clause. We don't have an example here, but users are encouraged to provide one.
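
As a minimal sketch of the top-N case, assuming the {{{src}}} table has a {{{value}}} column and that the reducer count is controlled through the {{{mapred.reduce.tasks}}} property (both are assumptions, not taken from this page), a query might look like this:

{{{
-- Assumption: force a single reducer so that SORT BY yields a total order.
set mapred.reduce.tasks=1;

-- Keep only the 10 largest values; N = 10 is an arbitrary illustrative choice.
SELECT value
FROM src
SORT BY value DESC
LIMIT 10;
}}}

Because only N rows are returned, the output stays small even when the input is large, although the single reducer still has to sort every row it receives.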
+
+ === Setting Types for Sort By ===
+
+ After a transform, output columns are generally treated as strings, so numeric data is sorted lexicographically. To overcome this, add a second SELECT statement that casts the columns to numeric types before applying SORT BY.
+
+ {{{
+ FROM (
+   FROM (
+     FROM src
+     SELECT TRANSFORM(value)
+     USING 'mapper'
+     AS value, count
+   ) mapped
+   SELECT cast(value AS double) AS value, cast(count AS int) AS count
+   SORT BY value, count
+ ) sorted
+ SELECT TRANSFORM(value, count)
+ USING 'reducer'
+ AS whatever
+ }}}
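
The effect of the cast can also be seen without any transform scripts. The sketch below assumes a hypothetical table {{{readings}}} with a string column {{{amount}}} holding numeric text; neither name comes from this page.

{{{
-- String comparison: '10' sorts before '9' because the character '1' precedes '9'.
SELECT amount
FROM readings
SORT BY amount;

-- Cast in an inner query first, then sort on the numeric column:
-- 9 now sorts before 10.
SELECT t.amount
FROM (
  SELECT cast(amount AS double) AS amount
  FROM readings
) t
SORT BY t.amount;
}}}

Doing the cast in a subquery mirrors the nested structure of the example above and avoids relying on how a column alias is resolved inside SORT BY.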
== Syntax of Cluster By and Distribute By ==