Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2011/01/10 18:23:10 UTC

[Pig Wiki] Update of "PigJournal" by AlanGates

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "PigJournal" page has been changed by AlanGates.
http://wiki.apache.org/pig/PigJournal?action=diff&rev1=13&rev2=14

--------------------------------------------------

  || Better Parser and Scanner Technology        || [[https://issues.apache.org/jira/browse/PIG-1618|PIG-1618]] || ||
  || Clarify Pig Latin Semantics                 || many || ||
  || Extending Pig to Include Branching, Looping, and Functions || TuringCompletePig || ||
+ || Move Piggybank out of Contrib || https://github.com/wilbur/Piggybank || Currently Pig hosts Piggybank (our repository of user-contributed UDFs) as part of our contrib.  This is not ideal for a couple of reasons.  One, it means those who wish to share their UDFs have to go through the rigor of the patch process.  Two, since contrib is tied to releases of the main product, there is no way for users to share functions for older versions or quickly disseminate their new functions.  If Piggybank were instead more similar to CPAN, then users could upload their own packages with little assistance from Pig committers and specify which versions of Pig each function is for.  This could be done via a hosting site such as github. ||
+ 
+ 
  
  
  == Proposed Future Work ==
@@ -87, +90 @@

  
  '''Dependency:''' Map Reduce Optimizer 
  
- '''References:''' [[https://issues.apache.org/jira/browse/PIG-479|PIG-479]]
+ '''References:'''
  
  '''Estimated Development Effort:'''  small
  
@@ -258, +261 @@

  '''Estimated Development Effort:'''  medium
  
  === Agreed Work, Unknown Approach ===
+ ==== Make Use of HBase ====
+ Pig can do bulk reads from and writes to HBase, but it cannot use HBase in operators such as a hash join.  We need operators that make use of HBase where it makes sense.  We may also need to provide support so that UDFs can efficiently access HBase themselves.
+ 
+ '''Category:'''  Integration, Performance
+ 
+ '''Dependency:''' 
+ 
+ '''References:'''
+ 
+ '''Estimated Development Effort:'''  medium
+ 
+ ==== Runtime Optimizations ====
+ Currently Pig does all of its optimizations up front, before beginning any execution.  In a multi-job pipeline, the initial jobs produce information that could be used to make optimization decisions for later jobs.  For example, a join later in the pipeline may turn out to have inputs whose sizes make fragment replicate a sensible join strategy.  Being able to rewrite the plan midway through execution would make it possible to optimize for these types of situations.
+ 
+ '''Category:'''  Performance
+ 
+ '''Dependency:''' 
+ 
+ '''References:'''
+ 
+ '''Estimated Development Effort:'''  large
+ 
  ==== Support Append in Pig ====
  Appending to HDFS files is supported in Hadoop 0.21.  None of Pig's standard store functions support append.  We need to decide if append is added to 
  the language itself (is there an APPEND modifier to the STORE command?) or if each store function needs to decide how to indicate or allow appending on its own.  !PigStorage
@@ -266, +291 @@

  '''Category:'''  New Functionality
  
  '''Dependency:''' Hadoop 0.21 or later
- 
- '''References:'''
- 
- '''Estimated Development Effort:'''  small
- 
- 
- ==== Move Piggybank out of Contrib ====
- Currently Pig hosts Piggybank (our repository of user-contributed UDFs) as part of our contrib.  This is not ideal for a couple of reasons.  One, it means those who
- wish to share their UDFs have to go through the rigor of the patch process.  Two, since contrib is tied to releases of the main product, there is no way for users
- to share functions for older versions or quickly disseminate their new functions.  If Piggybank were instead more similar to CPAN, then users could upload their own
- packages with little assistance from Pig committers and specify which versions of Pig each function is for.  This could be done via a hosting site such as github.
- 
- '''Category:'''  Usability
- 
- '''Dependency:'''
  
  '''References:'''