Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2010/03/31 23:01:00 UTC

[Hadoop Wiki] Update of "Hive/LanguageManual/Joins" by NamitJain

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/Joins" page has been changed by NamitJain.
http://wiki.apache.org/hadoop/Hive/LanguageManual/Joins?action=diff&rev1=11&rev2=12

--------------------------------------------------

    SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2)
  }}}
    there are two map/reduce jobs involved in computing the join. The first of these joins a with b and buffers the values of a while streaming the values of b in the reducers. The second job buffers the results of the first join while streaming the values of c through the reducers.
+  * In every map/reduce stage of the join, the table to be streamed can be specified via a hint. For example, in
+ {{{
+   SELECT /*+ STREAMTABLE(a) */ a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1)
+ }}}
+   all three tables are joined in a single map/reduce job, and the rows of tables b and c for a particular value of the key are buffered in memory in the reducers. Then, for each row retrieved from a, the join is computed with the buffered rows.
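    The buffering and streaming described above can be sketched roughly in Python. This is only an illustration of the idea for a single key value, not Hive's actual reducer code; the function name reduce_join and the sample values are hypothetical.

```python
from itertools import product

def reduce_join(buffered, streamed):
    """Join the rows that share ONE key value inside a reducer.

    buffered: dict mapping table name -> list of values held in memory
    streamed: iterable of values from the streamed table (read once, never stored)
    """
    names = sorted(buffered)  # fixed column order for the buffered tables
    for s_val in streamed:    # single pass over the streamed table
        # Combine the streamed row with every combination of buffered rows.
        for combo in product(*(buffered[n] for n in names)):
            yield (s_val,) + combo

# Assumed sample data: a is streamed, b and c are buffered for this key.
rows = list(reduce_join({"b": ["b1", "b2"], "c": ["c1"]}, iter(["a1", "a2"])))
```

    Only the rows of b and c for the current key occupy memory, so in this sketch it pays to stream whichever table has the most rows per key.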
   * LEFT, RIGHT, and FULL OUTER joins exist in order to provide more control over ON clauses for which there is no match. For example, this query:
  {{{
    SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key)