Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2009/05/07 20:21:29 UTC
[Pig Wiki] Trivial Update of "ProposedProjects" by OlgaN

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ProposedProjects

------------------------------------------------------------------------------
  || Execution || Pig currently executes scripts by building a pipeline of pre-built operators and running data through those operators in map reduce jobs.  We need to investigate instead having Pig generate Java code specific to a job, then compiling that code and using it to run the map reduce jobs. || || || Many conference attendees || gates ||
  || Language || Currently only DISTINCT, ORDER BY, and FILTER are allowed inside FOREACH.  All operators should be allowed in FOREACH. (LIMIT is being worked on in [https://issues.apache.org/jira/browse/PIG-741 741].) || || || gates || ||
  || Optimization || Speed up comparison of tuples during shuffle for ORDER BY || [https://issues.apache.org/jira/browse/PIG-659 659] || || olgan || ||
- || Optimization || Order by should be changed to not use POPackage to put all of the tuples in a bag on the reduce side, as the bag is just immediately flattened.  It can instead work like join does for the last input in the join. || || || gates || ||
+ || Optimization || Order by should be changed to not use POPackage to put all of the tuples in a bag on the reduce side, as the bag is just immediately flattened.  It can instead work like join does for the last input in the join. || [https://issues.apache.org/jira/browse/PIG-802 802] || || gates || olgan ||
  || Optimization || Often in a Pig script that produces a chain of MR jobs, the map phases of the 2nd and subsequent jobs do very little.  What little they do should be pushed into the preceding reduce and the map replaced by the identity mapper.  Initial tests showed that the identity mapper was 50% faster than using a Pig mapper (because Pig uses the loader to parse out tuples even if the map itself is empty). || [https://issues.apache.org/jira/browse/PIG-480 480] || || olgan || gates ||
  || Optimization || Use hand-crafted code to do string to integer or float conversions.  Initial tests showed these could be done about 8x faster than Integer.parseInt() and Float.parseFloat(). || [https://issues.apache.org/jira/browse/PIG-482 482] || || olgan || gates ||
  || Optimization || Currently Pig always samples for an ORDER BY to determine how to partition, and then runs another job to do the sort.  For small enough inputs, it should just sort with a single reducer. || [https://issues.apache.org/jira/browse/PIG-483 483] || || olgan || ||
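
As an illustration of the hand-crafted string to integer conversion proposed in the PIG-482 row above, here is a minimal sketch in Java. It is not taken from Pig or from the JIRA issue: the class name FastParse and its method are hypothetical, it handles only plain base-10 ASCII digits with an optional leading minus sign, and it makes no claim about reproducing the speedup quoted in the table.

```java
// Hypothetical helper for illustration only; not part of Pig.
public final class FastParse {
    private FastParse() {}

    /**
     * Parses a base-10 int from ASCII digits with an optional leading '-'.
     * Throws NumberFormatException on empty input, non-digit characters,
     * or values outside the int range.
     */
    public static int parseInt(CharSequence s) {
        int len = s.length();
        if (len == 0) {
            throw new NumberFormatException("empty input");
        }
        int i = 0;
        boolean negative = s.charAt(0) == '-';
        if (negative) {
            if (len == 1) {
                throw new NumberFormatException("sign without digits");
            }
            i = 1;
        }
        // Accumulate as a negative value so Integer.MIN_VALUE is representable.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multMin = limit / 10;
        int result = 0;
        for (; i < len; i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            if (result < multMin) {
                throw new NumberFormatException("overflow");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }
}
```

The sketch works on a CharSequence and reads characters directly, so a caller could pass a field without first building an intermediate String. Whether that matters in practice would need to be measured against the standard library on real data.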