Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2009/01/07 02:51:55 UTC

[Pig Wiki] Update of "PigFRJoin" by PradeepKamath

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by PradeepKamath:
http://wiki.apache.org/pig/PigFRJoin

------------------------------------------------------------------------------
  = Fragment Replicate Join =
  Fragment Replicate Join (FRJ) is useful when joining a huge table with a very small table (small enough to fit in memory) and the join does not expand the data by much. The idea is to distribute the processing of the huge file by fragmenting it and replicating the small file to every machine that receives a fragment of the huge file. Because the entire small file is available on each machine, the join becomes a trivial task that needs no break in the pipeline.
+ 
+ NOTE: In the initial version of the implementation, the first input in the Join statement is treated as the "fragment" input and all other inputs are treated as "replicate" inputs.
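
The idea described above can be sketched in a few lines of Python. This is only an illustration of the technique, not Pig's actual implementation: the function name `fragment_replicate_join`, the tuple-based records, and the key-position parameters are all assumptions made for this example. The first argument plays the role of the "fragment" input and the second the "replicate" input.

```python
from collections import defaultdict


def fragment_replicate_join(fragment, replicated, frag_key=0, repl_key=0):
    """Hypothetical sketch: join one fragment of the huge input against
    the small input, which is assumed to fit entirely in memory."""
    # Build an in-memory index of the small (replicated) input, keyed on
    # the join column. Each machine would hold its own full copy.
    index = defaultdict(list)
    for row in replicated:
        index[row[repl_key]].append(row)

    # Stream the fragment once; each row is joined by a lookup, so no
    # sort or shuffle of the huge input is needed.
    for row in fragment:
        for match in index.get(row[frag_key], ()):
            yield row + match


# Illustrative data: two fragments of the "huge" input, one small input.
small = [(1, "a"), (2, "b")]
fragments = [[(1, "x"), (3, "y")], [(2, "z"), (1, "w")]]

# Each fragment is processed independently, as it would be on separate machines.
results = [list(fragment_replicate_join(f, small)) for f in fragments]
```

Rows of the fragment with no matching key in the small input (key 3 here) are simply dropped, which corresponds to an inner join; the memory cost on each machine is the size of the small input plus its index.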
  
  == Performance ==
  The following is a set of parameters that we can alter to compare the performance of the different types of join algorithms: