Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2008/05/05 23:08:33 UTC

[Pig Wiki] Update of "NestedLogicalPlan" by Shravan Narayanamurthy

The following page has been changed by Shravan Narayanamurthy:
http://wiki.apache.org/pig/NestedLogicalPlan

------------------------------------------------------------------------------
  }}}
  NOTE: In the real implementation, separating inner plan for each output field might be simpler to do. For example "GENERATE $1+$2, ($1+$2)*5" can be a plan for $1+$2 and a plan for ($1+$2)*5 so that we don't have to care about merging them all. /!\ Open question /!\
  
+ [shrav] Pig already does roughly what you describe here, only implicitly: the loadTuple is in fact what happens when a nested plan is processed. I guess the way to extend the language would be to allow every operator that we allow outside a nested plan inside one as well. The execution side, that is the Physical side, already supports this. What remains is making the appropriate parser changes, and the hard parts would be the parsing itself and the type checking.
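
A minimal sketch of the "one inner plan per output field" idea from the NOTE above, assuming each inner plan can be modelled as a plain function of the input tuple. The names `plan_sum`, `plan_scaled_sum` and `generate` are hypothetical illustrations, not Pig classes or APIs.

```python
# Illustrative only: models GENERATE $1+$2, ($1+$2)*5 as two independent
# inner plans. None of these names come from the Pig code base.

def plan_sum(t):
    # Inner plan for the first output field: $1 + $2
    return t[1] + t[2]

def plan_scaled_sum(t):
    # Inner plan for the second output field: ($1 + $2) * 5
    return (t[1] + t[2]) * 5

def generate(input_tuple, plans):
    # Evaluate every inner plan against the same input tuple;
    # the plans are never merged with each other.
    return tuple(plan(input_tuple) for plan in plans)

result = generate((10, 2, 3), [plan_sum, plan_scaled_sum])
```

Keeping the plans separate means `$1+$2` is computed twice here; that is the price of not having to merge them.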
  ==== More examples ====
  
  Given GENERATE: Tuple -> Tuple
@@ -141, +142 @@

              StoreTuple
  }}}
  Diagram B1
- 
+ [shrav] Are you saying that Pig does not support this now?
  
  This looks similar to a common relational plan:-
  
@@ -233, +234 @@

  JOIN :      This can be constructed by COGroup
  }}}
  
+ GENERATE looks oversimplified to me. First, the input need not be just a tuple; it can be a combination of tuples and bags, and in that case flatten actually produces the cartesian product.
+ Also, in FOREACH the function inside can be a full plan, so it can process bags as well and not just tuples.
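
The cartesian-product behaviour of flatten mentioned above can be sketched as follows. This is a simplified model, not Pig's implementation: a Python list stands in for a flattened bag, bag elements are scalars rather than tuples, and `generate_with_flatten` is a hypothetical name.

```python
from itertools import product

def generate_with_flatten(fields):
    # Assumption: a list represents a bag marked for flattening,
    # any other value is an ordinary scalar field.
    expanded = [f if isinstance(f, list) else [f] for f in fields]
    # Flattening several bags at once yields their cross product,
    # with the scalar fields repeated in every output row.
    return [tuple(row) for row in product(*expanded)]

rows = generate_with_flatten(["k", [1, 2], ["a", "b"]])
```

With one scalar and two bags of two elements each, the single input produces four output rows, so GENERATE is not strictly Tuple -> Tuple in this case.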
+ 
  == Problems with current Operators (5-May-2008) ==
  
  ==== LOGenerate ====
@@ -255, +259 @@

  }}}
  Seems like LOGenerate is not needed at all. GENERATE is more like just a part of FOREACH syntax (analogous to BY and FILTER)
  
+ [shrav] I don't agree with this. In fact it is the other way round: the FOREACH is a dummy while the GENERATE does all the work. The FOREACH just takes each input tuple and uses the GENERATE specification to process it. The GENERATE spec is the one that defines the transformation.
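
The division of labour described in the comment above might look like this in a toy model, assuming a relation is an iterable of tuples and a GENERATE specification is a function from tuple to tuple. `foreach` and `double_second` are hypothetical names used only for illustration.

```python
def foreach(relation, generate_spec):
    # FOREACH contributes only the iteration: it hands each input
    # tuple to the GENERATE specification and emits whatever comes back.
    for t in relation:
        yield generate_spec(t)

def double_second(t):
    # The GENERATE specification defines the actual transformation.
    return (t[0], t[1] * 2)

out = list(foreach([("a", 1), ("b", 2)], double_second))
```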
  ==== LOProject ====
  This operator is only for mapping input tuple to output tuple (eg. {A,B,C,D,E} ==> {A,C,D} ). Given the fact that we allow users to have fields in COGROUP, FILTER, FOREACH as expressions, LOProject then becomes just a special case when users merely specify direct mapping. Since we have agreed upon the concept of inner plans, I think LOProject is not needed.
+ [shrav] Project is a consistent way of implementing the fields that the user mentions, without making the user bother about all the conversions he might need to do if we just passed the raw tuple to him. Also, you can only project out one field and not multiple fields.
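
As a rough illustration of the last point, the {A,B,C,D,E} ==> {A,C,D} mapping can be modelled with one single-field projection per output field. This is a sketch under the assumption that a project selects exactly one column; `project` and `generate` are hypothetical names, not Pig operators.

```python
def project(index):
    # A projection that picks exactly one field of the input tuple,
    # following the comment that a project selects a single field.
    return lambda t: t[index]

def generate(input_tuple, plans):
    # Each output field gets its own single-field projection.
    return tuple(plan(input_tuple) for plan in plans)

row = ("a", "b", "c", "d", "e")
out = generate(row, [project(0), project(2), project(3)])
```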