
Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/02/07 20:25:47 UTC

[Hadoop Wiki] Update of "Hive/LanguageManual/SortBy" by AlexSmith

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by AlexSmith:
http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy

The comment on the change is:
adds example for numeric sorting

------------------------------------------------------------------------------
  }}}
  
  
- === How to do Order By? ===
+ === Simulating Order By ===
  
  We can set the number of reducers to 1 to make sure we get the same result as ''ORDER BY''.
  
@@ -50, +50 @@

  }}}
  
  This can make the single reducer a performance bottleneck.  In many cases the user only wants to see the top N rows, where N is a small number.  In that case, we can use the LIMIT clause.  We don't have an example here, but users are encouraged to provide one.
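  
  As one possible illustration, the following sketch returns the 10 smallest values.  It assumes the src table and value column used elsewhere on this page, and that the number of reducers is set to 1 as described above; the limit of 10 is an arbitrary choice.
  
  {{{
  set mapred.reduce.tasks=1;
  
  -- Assumption: one reducer, so SORT BY gives a total order and LIMIT keeps the first 10 rows overall.
  FROM src
  SELECT value
  SORT BY value
  LIMIT 10;
  }}}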
+ 
+ === Setting Types for Sort By ===
+ 
+ After a transform, output columns are generally treated as strings, so numeric data will be sorted lexicographically (for example, "10" sorts before "9").  To overcome this, add a second SELECT statement that casts the columns to the desired types before applying SORT BY.
+ 
+ {{{
+ FROM (FROM (FROM src
+             SELECT TRANSFORM(value)
+             USING 'mapper'
+             AS value, count) mapped
+       SELECT cast(value as double) AS value, cast(count as int) AS count
+       SORT BY value, count) sorted
+ SELECT TRANSFORM(value, count)
+ USING 'reducer'
+ AS whatever
+ }}}
  
  == Syntax of Cluster By and Distribute By ==