Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2011/03/24 22:19:23 UTC

[Pig Wiki] Update of "PigJournal" by AlanGates

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "PigJournal" page has been changed by AlanGates.
http://wiki.apache.org/pig/PigJournal?action=diff&rev1=14&rev2=15

--------------------------------------------------

  This covers work that is currently being done.  For each entry the main JIRA for the work is referenced.
  
  || Feature                                     || JIRA                                                        || Comments ||
- || Boolean Type                                || [[https://issues.apache.org/jira/browse/PIG-1429|PIG-1429]] || ||
  || Make Illustrate Work                        || [[https://issues.apache.org/jira/browse/PIG-502|PIG-502]], [[https://issues.apache.org/jira/browse/PIG-534|PIG-534]], [[https://issues.apache.org/jira/browse/PIG-903|PIG-903]], [[https://issues.apache.org/jira/browse/PIG-1066|PIG-1066]] || ||
  || Better Parser and Scanner Technology        || [[https://issues.apache.org/jira/browse/PIG-1618|PIG-1618]] || ||
  || Clarify Pig Latin Semantics                 || many || ||
  || Extending Pig to Include Branching, Looping, and Functions || TuringCompletePig || ||
+ || Typed maps                                  || [[https://issues.apache.org/jira/browse/PIG-1876|PIG-1876]] || ||
  || Move Piggybank out of github || https://github.com/wilbur/Piggybank || Currently Pig hosts Piggybank (our repository of user-contributed UDFs) as part of our contrib.  This is not ideal for a couple of reasons.  One, it means those who wish to share their UDFs have to go through the rigor of the patch process.  Two, since contrib is tied to releases of the main product, there is no way for users to share functions for older versions or quickly disseminate their new functions.  If Piggybank were instead more similar to CPAN, then users could upload their own packages with little assistance from Pig committers and specify what versions of Pig the function is for.  This could be done via a hosting site such as github. ||
  
  
@@ -152, +152 @@

  
  '''Estimated Development Effort:'''  medium
  
+ ==== Boolean Type ====
+ The boolean type is only partially supported in Pig.  Filter functions return it, and internally Pig uses it at some points.  But data itself cannot be of boolean
+ type.
+ 
+ '''Category:'''  New Functionality
+ 
+ '''Dependency:'''  Will affect all !LoadCasters, as they will have to provide byteToBoolean methods.
+ 
+ '''References:'''  [[https://issues.apache.org/jira/browse/PIG-1429|PIG-1429]]
+ 
+ '''Estimated Development Effort:'''  small
+ 
  ==== Fixed Point Type ====
  Pig currently supports the floating point types float and double.  These are not adequate for data where loss of precision is not acceptable, such as financial data.
  To address this issue Pig needs to add a fixed point type, similar to SQL's decimal type.  We hope that we can find an implementation of fixed type in existing
@@ -228, +240 @@

  '''References:'''  [[http://issues.apache.org/jira/browse/PIG-603|PIG-603]]
  
  '''Estimated Development Effort:'''  large
- 
- ==== Specifying the Value Type for Maps ====
- Currently maps require that their key be of type String, while allowing their values to be of any type.  In practice, Pig assigns a type bytearray to the value.
- If the value is actually another type (that is the loader or UDF that created the Map creates it as another type and not a !DataByteArray) the script writer is still
- required to cast the value to the type it already is so that Pig understands how to handle the data.  Given that users often store only one type of data in a map,
- it would be convenient for them to be able to specify a type for the value as well.  The contract would then be that all values in that map must be of the
- specified type.  By default maps would still leave the value unspecified.
- 
- '''Category:'''  New Functionality
- 
- '''Dependency:'''
- 
- '''References:'''
- 
- '''Estimated Development Effort:'''  small
  
  ==== Statistics for Optimizer ====
  Currently Pig's optimizer is entirely rule based.  We would like to allow cost based optimization.  Some of this can be done with existing
@@ -260, +257 @@

  
  '''Estimated Development Effort:'''  medium
  
+ ==== Extend Load and Store Functions to be in Scripting Languages ====
+ In 0.8 we added the ability to write EvalFuncs and FilterFuncs in scripting languages.  We should extend this capability to load and store functions.
+ 
+ '''Category:'''  New Functionality
+ 
+ '''Dependency:'''
+ 
+ '''References:''' [[https://issues.apache.org/jira/browse/PIG-1777|PIG-1777]]
+ 
+ '''Estimated Development Effort:'''  small
+ 
+ ==== Extend UDFs in Scripting Languages to Allow Algebraic and Accumulator ====
+ In 0.8 we added the ability to write EvalFuncs and FilterFuncs in scripting languages.  However, these cannot use the Accumulator or Algebraic
+ interfaces, both of which can provide significant performance and scalability benefits.
+ 
+ '''Category:'''  New Functionality
+ 
+ '''Dependency:'''
+ 
+ '''References:''' [[https://issues.apache.org/jira/browse/PIG-1804|PIG-1804]]
+ 
+ '''Estimated Development Effort:'''  medium
+ 
+ ==== Add Ruby as a Supported Language for UDFs and Control Flow ====
+ This should use JRuby.
+ 
+ '''Category:'''  New Functionality
+ 
+ '''Dependency:'''
+ 
+ '''References:''' 
+ 
+ '''Estimated Development Effort:'''  medium
+ 
+ 
  === Agreed Work, Unknown Approach ===
  ==== Make Use of HBase ====
  Pig can do bulk reads and writes from HBase.  But it cannot use HBase in operators like a hash join.  We need operators that make use of HBase where it makes sense.  Also, we may need to provide support so that UDFs can efficiently access HBase themselves.
@@ -397, +429 @@

  
  '''Estimated Development Effort:'''  medium (involves rewrite of many physical operators)
  
+ ==== Shipping Dependencies for Scripting UDFs ====
+ Currently any dependencies for UDFs in scripting languages are not shipped along with the UDF to the backend.  The user has to ensure that the required module(s) are
+ present on the backend already.  At a minimum Pig needs to provide a convenient way for users to declare those packages.  It would then ship those packages
+ to the backend and set up the environment so that the UDFs could find them.  A next step past this would be to figure out which packages are needed either from
+ the scripts or the scripting engine and ship those to the backend.  The trick in this approach is recursing through the requirements so that any modules needed
+ by the explicitly included modules are also brought along.
+ 
+ '''Category:'''  Usability
+ 
+ '''Dependency:'''
+ 
+ '''References:''' [[https://issues.apache.org/jira/browse/PIG-1824|PIG-1824]]
+ 
+ '''Estimated Development Effort:'''  small to medium, depending on the approach chosen
+
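The recursion described under "Shipping Dependencies for Scripting UDFs" can be sketched roughly as follows. This is only an illustration, not Pig code: the function name `transitive_modules`, the `direct_deps` mapping, and the module names are all hypothetical, and a real implementation would have to discover each module's imports itself (from the scripts or the scripting engine) rather than receive them as a dictionary.

```python
from collections import deque


def transitive_modules(declared, direct_deps):
    """Return every module that must be shipped, in discovery order.

    declared    -- modules the user explicitly listed (hypothetical input)
    direct_deps -- dict: module name -> list of modules it imports directly
                   (hypothetical; a real tool would have to discover this)
    """
    to_ship = []
    seen = set()
    queue = deque(declared)
    while queue:
        module = queue.popleft()
        if module in seen:
            continue  # already handled; also guards against import cycles
        seen.add(module)
        to_ship.append(module)
        # Enqueue this module's own requirements so they are brought along too.
        queue.extend(direct_deps.get(module, []))
    return to_ship


# Made-up dependency graph: the user declares only "my_udfs".
deps = {
    "my_udfs": ["textutil", "geo"],
    "textutil": ["stemmer"],
    "geo": ["textutil"],
    "stemmer": [],
}
shipped = transitive_modules(["my_udfs"], deps)
print(shipped)
```

The `seen` set is what keeps the walk finite when two modules import each other; without it, a cycle in the requirements would loop forever.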