Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2010/01/14 20:25:34 UTC

[Pig Wiki] Update of "ProposedProjects" by AlanGates

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "ProposedProjects" page has been changed by AlanGates.
http://wiki.apache.org/pig/ProposedProjects?action=diff&rev1=9&rev2=10

--------------------------------------------------

  = Proposed Pig Projects =
+ The list of proposed Pig projects is now kept on the PigJournal page.
- This page describes projects that we (the committers) would like to see added
- to Pig.  The scale of these projects varies, but they are larger projects,
- usually on the scale of weeks or months.  We have not yet filed
- [[https://issues.apache.org/jira/browse/PIG|JIRAs]] for some of these
- because they are still in the vague idea stage.  As they become more concrete,
- [[https://issues.apache.org/jira/browse/PIG|JIRAs]] will be filed for them.
  
- We welcome contributors to take on one of these projects.  If you would like
- to do so, please file a JIRA (if one does not already exist for the project)
- with a proposed solution.  Pig's committers will work with you from there to
- help refine your solution.  Once a solution is agreed upon, you can begin
- implementation.
+ Looking to get involved in Pig?  Excellent.  A great place to start is to find a [[http://issues.apache.org/jira/browse/PIG|JIRA]] that interests you and provide a
+ patch for that.  If you are looking for a bigger project to take on, take a look at PigJournal.  Before starting work on a project, it is best to post on the
+ JIRA that you plan on working on it and an outline of the approach you intend to take.  If it does not have a JIRA yet, send a mail to
+ [[mailto:pig-dev@hadoop.apache.org|pig-dev]].  This has a couple of
+ advantages.  One, if others want to collaborate with you, it gives them a chance to say so and pitch in.  Two, it lets the committers know what you are working
+ on so they can help you through the process.
  
- If you see a project here that you would like to see Pig implement but you are
- not in a position to implement the solution right now, feel free to vote for
- the project.  Add your name to the list of supporters.  This will help
- contributors looking for a project to select one that will benefit many users.
- 
- If you would like to propose a project for Pig, feel free to add to this list.
- If it is a smaller project, or something you plan to begin work on
- immediately, filing a [[https://issues.apache.org/jira/browse/PIG|JIRA]] is a better route.
- 
- || Category || Project || JIRA || References || Proposed By || Votes For ||
- || Execution || Pig currently executes scripts by building a pipeline of pre-built operators and running data through those operators in map reduce jobs.  We need to investigate instead having Pig generate Java code specific to a job, then compiling that code and using it to run the map reduce jobs. || || || Many conference attendees || gates ||
- || Language || Currently only LIMIT, DISTINCT, ORDER BY, and FILTER are allowed inside FOREACH.  All operators should be allowed in FOREACH. || || || gates || ||
- || Optimization || Speed up comparison of tuples during shuffle for ORDER BY || [[https://issues.apache.org/jira/browse/PIG-659|659]] || || olgan || ||
- || Optimization || Often in a Pig script that produces a chain of MR jobs, the map phases of the 2nd and subsequent jobs do very little.  What little they do should be pushed into the preceding reduce and the map replaced by the identity mapper.  Initial tests showed that the identity mapper was 50% faster than using a Pig mapper (because Pig uses the loader to parse out tuples even if the map itself is empty). || [[https://issues.apache.org/jira/browse/PIG-480|480]] || || olgan || gates ||
- || Optimization || Use hand-crafted calls to do string to integer or float conversions.  Initial tests showed these could be done about 8x faster than String.toInteger() and String.toFloat(). || [[https://issues.apache.org/jira/browse/PIG-482|482]] || || olgan || gates ||
- || Optimization || Currently Pig always samples for an ORDER BY to determine how to partition, and then runs another job to do the sort.  For small enough inputs, it should just sort with a single reducer. || [[https://issues.apache.org/jira/browse/PIG-483|483]] || || olgan || ||
- || Optimization || The combiner is not currently used if FILTER is in the FOREACH.  In some cases it could still be used.  || [[https://issues.apache.org/jira/browse/PIG-479|479]] || || olgan || ||
- || Optimization || The combiner is not currently used if LIMIT is in the FOREACH.  ||  || || gates || ||
- || Optimization || Currently when types of data are declared Pig inserts a FOREACH immediately after the LOAD that does the conversions.  These conversions should be delayed until the field is actually used. || [[https://issues.apache.org/jira/browse/PIG-410|410]] || || olgan || gates ||
- || Optimization || The Pig optimizer should be used to determine when fields in a record are no longer needed and put in FOREACH statements to project out the unnecessary data as early as possible. || [[https://issues.apache.org/jira/browse/PIG-466|466]] || || olgan || ||
- || Optimization || Change physical operators to pass list of tuples in getNext instead of one tuple at a time. || [[https://issues.apache.org/jira/browse/PIG-688|688]] || || Thejas || ||
-