Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2009/02/13 20:11:39 UTC

[Pig Wiki] Update of "PigUserCookbook" by OlgaN

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigUserCookbook

------------------------------------------------------------------------------
  
  One case where pushing filters up might not be a good idea is when the cost of applying the filter is very high and only a small amount of data is filtered out.
  
+ '''Reduce Your Operator Pipeline'''
+ 
+ To keep your script clear, you might choose to split your projections into several steps, for instance:
+ 
+ {{{
+ A = load 'data' as (in: map[]);
+ -- get the keys out of the map
+ B = foreach A generate in#k1 as k1, in#k2 as k2;
+ -- concatenate the keys
+ C = foreach B generate CONCAT(k1, k2);
+ .......
+ }}}
+ 
+ While the example above is easier to read, consider combining the two foreach statements to improve query performance:
+ 
+ {{{
+ A = load 'data' as (in: map[]);
+ -- concatenate the keys from the map
+ B = foreach A generate CONCAT(in#k1, in#k2);
+ ....
+ }}}
+ 
+ The same goes for filters: combining consecutive filter statements into a single filter shortens the operator pipeline.
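+ 
+ As a minimal sketch of the filter case, assuming a hypothetical input with two integer fields x and y (the field names and thresholds are illustrative only), two consecutive filters such as:
+ 
+ {{{
+ A = load 'data' as (x: int, y: int);
+ -- keep rows with positive x
+ B = filter A by x > 0;
+ -- then keep rows with small y
+ C = filter B by y < 10;
+ }}}
+ 
+ could be written as a single filter that applies both conditions at once:
+ 
+ {{{
+ A = load 'data' as (x: int, y: int);
+ -- keep rows with positive x and small y in one step
+ B = filter A by x > 0 and y < 10;
+ }}}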
+ 
  '''Drop Nulls Before a Join'''
  
  This comment only applies to Pig on the types branch, as Pig 0.1.0 does not have nulls.