Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2009/08/13 22:50:28 UTC

[Pig Wiki] Update of "PigMergeJoin" by PradeepKamath

The following page has been changed by PradeepKamath:
http://wiki.apache.org/pig/PigMergeJoin

------------------------------------------------------------------------------
  at.  This sampling will be done in an initial map only job.  A second MR job will then be initiated, with the left input as its input.  Each map will use the index to 
   seek to the appropriate record in the right input and begin doing the join.
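
The seek-and-join step described above can be sketched roughly as follows. This is only an illustration in plain Python, not Pig code: it assumes the sampled index is a list of (key, offset) pairs over the right input, and the names `build_sparse_index` and `merge_join_split` are hypothetical.

```python
from bisect import bisect_left

def build_sparse_index(right, every=2):
    """Sample every `every`-th right record as (key, offset). Assumed index layout."""
    return [(right[i][0], i) for i in range(0, len(right), every)]

def merge_join_split(left_split, right, index):
    """Inner merge join of one sorted left split against the sorted right input."""
    if not left_split:
        return []
    sampled_keys = [k for k, _ in index]
    # Seek: start at the last sampled record whose key is strictly smaller than
    # the first left key, so no matching right record can be skipped.
    pos = bisect_left(sampled_keys, left_split[0][0])
    r = index[pos - 1][1] if pos > 0 else 0
    out = []
    for lkey, lval in left_split:
        # Advance past right records with smaller keys.
        while r < len(right) and right[r][0] < lkey:
            r += 1
        # Emit every right record with an equal key; r stays put so that
        # duplicate left keys see the same run of right records.
        j = r
        while j < len(right) and right[j][0] == lkey:
            out.append((lkey, lval, right[j][1]))
            j += 1
    return out

right = [(1, "a"), (2, "b"), (2, "c"), (4, "d"), (5, "e")]
index = build_sparse_index(right)
joined = merge_join_split([(2, "x"), (4, "y"), (6, "z")], right, index)
```

Here each call to `merge_join_split` stands in for one map task of the second job; the real implementation would seek within a file rather than index into a list.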
  
- == Details ==
+ == Preconditions for merge join ==
+ In the first release, merge join will work only under the following conditions:
+    * Both inputs are sorted in *ascending* order of join keys
+    * The merge join only has two inputs
+    * Only inner join will be supported
+    * Between the load of the sorted input and the merge join statement, only filter and foreach statements are allowed, and each foreach statement must meet the following conditions:
+       * There should be no UDFs in the foreach statement
+       * The foreach statement should not change the position of the join keys
+       * There should be no transformation on the join keys that changes the sort order
+    * In local mode, merge join will fall back to regular join
+ 
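To illustrate why the preconditions above rule out key transformations that change the sort order, here is a small Python sketch (not Pig code). The helper `is_ascending` and the sample data are hypothetical; the point is only that an order-preserving change keeps the input usable for a merge join while an order-reversing one does not.

```python
def is_ascending(records, key=lambda rec: rec[0]):
    """Return True if the records are in ascending order of the join key."""
    keys = [key(rec) for rec in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))

sorted_input = [(1, "a"), (3, "b"), (7, "c")]

# Order-preserving transformation of the key: still valid for a merge join.
shifted = [(k + 10, v) for k, v in sorted_input]

# Order-reversing transformation of the key: violates the ascending precondition.
negated = [(-k, v) for k, v in sorted_input]

checks = {
    "sorted_input": is_ascending(sorted_input),
    "shifted": is_ascending(shifted),
    "negated": is_ascending(negated),
}
```
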
+ == Implementation Details ==
  === Logical Plan ===
  In the logical plan, use of this join will be recorded in !LOJoin (similar to the way fragment-replicate join and skew join are).  (The work to convert FR Join and Skew
  join to use a common LOJoin is not yet done; we should coordinate work on this join with the work on the skew join to avoid duplicating effort.)