Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2010/10/19 21:05:21 UTC

[Pig Wiki] Update of "SemanticsCleanup" by AlanGates

The "SemanticsCleanup" page has been changed by AlanGates.
http://wiki.apache.org/pig/SemanticsCleanup?action=diff&rev1=3&rev2=4

--------------------------------------------------

  == Introduction ==
  A number of bugs have been filed against Pig that roughly fall under the area of poorly defined or undefined semantics.  In the 0.9 Pig release
  we would like to take on a number of these issues, clarifying semantics where they are unclear, defining them where they are undefined, and
- correctly them where they are clearly wrong.  This page will classifies the existing bugs and indicates what we believe the proper fix is for
+ correcting them where they are clearly wrong.  This page classifies the existing bugs and indicates what we believe the proper fix is for
  them.
  
  == Categories ==
@@ -14, +14 @@

   * Dynamic Type Binding:  In certain situations Pig assumes a value to be of type byte array when it does not know the actual type, and handles whatever actual type it is at runtime.  There are situations where this does not work properly.
  
  == Bug Table ==
+ || '''JIRA'''                                                  || '''Category'''       || '''Proposed Solution'''                                                                                                                                                            || '''Backward Compatible''' || '''Proposed Priority''' ||
+ || [[https://issues.apache.org/jira/browse/PIG-1341|PIG-1341]] || Dynamic type binding || Close as won't fix                                                                                                                                                                 || yes                       ||                   ||
- || '''JIRA''' || '''Category''' || '''Proposed Solution''' || '''Backward Compatible''' ||
- || [[https://issues.apache.org/jira/browse/PIG-1627|PIG-1627]] || Schema || Flattening a bag with an unknown schema should produce a record with an unknown schema || no ||
- || [[https://issues.apache.org/jira/browse/PIG-1584|PIG-1584]] || Grammar || Cogroup inner does not match the semantics of inner join.  It is also not clear what value the inner keyword has for cogroup. Consider removing it. || ||
- || [[https://issues.apache.org/jira/browse/PIG-1538|PIG-1538]] || Nested types || Remove two level access || Maybe, if we can find a way to ignore calls to Schema.isTwoLevelAccessRequired(). ||
- || [[https://issues.apache.org/jira/browse/PIG-1536|PIG-1536]] || Schema || Pick one semantic for schema merges and use it consistently throughout Pig || no ||
- || [[https://issues.apache.org/jira/browse/PIG-1371|PIG-1371]] || Nested types || unknown || ||
- || [[https://issues.apache.org/jira/browse/PIG-1341|PIG-1341]] || Dynamic type binding || Close as won't fix || yes ||
- || [[https://issues.apache.org/jira/browse/PIG-1281|PIG-1281]] || Dynamic type binding || In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-1281|PIG-1281]] || Dynamic type binding || In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it.  || yes                       ||                   ||
- || [[https://issues.apache.org/jira/browse/PIG-1277|PIG-1277]] || Nested types || Unknown || ||
- || [[https://issues.apache.org/jira/browse/PIG-1222|PIG-1222]] || Dynamic type binding || The issue here is that Pig thinks the field is a bytearray while BinStorage actually produces a String.  Need a way to handle these issues on the fly. || ||
+ || [[https://issues.apache.org/jira/browse/PIG-1222|PIG-1222]] || Dynamic type binding || The issue here is that Pig thinks the field is a bytearray while BinStorage actually produces a String.  Need a way to handle these issues on the fly.                             ||                           ||                   ||
- || [[https://issues.apache.org/jira/browse/PIG-1188|PIG-1188]] || Schema || Make sure Pig handles missing data in Tuples by returning a null rather than failing. || yes ||
- || [[https://issues.apache.org/jira/browse/PIG-1112|PIG-1112]] || Schema || When user provides AS to flatten of undefined bag or tuple, the contents of that AS are taken to be the schema of the bag or tuple. || yes ||
- || [[https://issues.apache.org/jira/browse/PIG-1065|PIG-1065]] || Dynamic type binding ||  In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-1065|PIG-1065]] || Dynamic type binding ||  In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes                       ||                   ||
- || [[https://issues.apache.org/jira/browse/PIG-999|PIG-999]] || Dynamic type binding ||  In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-999|PIG-999]]   || Dynamic type binding ||  In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes                       ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-696|PIG-696]]   || Dynamic type binding || Class cast exceptions such as this should result in a null value and a warning, not a failure.                                                                                     || yes                       ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-621|PIG-621]]   || Dynamic type binding || Class cast exceptions such as this should result in a null value and a warning, not a failure.                                                                                     || yes                       ||                   ||
- || [[https://issues.apache.org/jira/browse/PIG-847|PIG-847]] || Nested types || Remove two level access || maybe ||
- || [[https://issues.apache.org/jira/browse/PIG-828|PIG-828]] || Nested types || According to the rules of Pig Latin, this should produce a bag with one field.  Need to make sure that is what Pig is trying to do in this case. || yes ||
- || [[https://issues.apache.org/jira/browse/PIG-767|PIG-767]] || Nested types || Remove two level access; bring DUMP and DESCRIBE output into sync. || no ||
- || [[https://issues.apache.org/jira/browse/PIG-749|PIG-749]] || Schema || Related to PIG-1112 || yes ||
- || [[https://issues.apache.org/jira/browse/PIG-730|PIG-730]] || Nested types || Make sure schema of union is the same as schema before union (suspect this is a two level access issue) || unclear ||
- || [[https://issues.apache.org/jira/browse/PIG-723|PIG-723]] || Nested types || Suspect this is a two level access issue || unclear ||
- || [[https://issues.apache.org/jira/browse/PIG-696|PIG-696]] || Dynamic type binding || Class cast exceptions such as this should result in a null value and a warning, not a failure. || yes ||
- || [[https://issues.apache.org/jira/browse/PIG-694|PIG-694]] || Nested types || Determine the semantics for merging tuples and bags. || unclear ||
- || [[https://issues.apache.org/jira/browse/PIG-678|PIG-678]] || Grammar || Decide whether we want to support this extension. || yes ||
- || [[https://issues.apache.org/jira/browse/PIG-621|PIG-621]] || Dynamic type binding || Class cast exceptions such as this should result in a null value and a warning, not a failure. || yes ||
- || [[https://issues.apache.org/jira/browse/PIG-435|PIG-435]] || Schema || Decide definitely on what it means when users declare a schema for a load. || unclear ||
- || [[https://issues.apache.org/jira/browse/PIG-333|PIG-333]] || Dynamic type binding || Since it is specified that MIN and MAX treat unknown types as double, all the actual string data should be converted to NULLs, rather than cause errors. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-333|PIG-333]]   || Dynamic type binding || Since it is specified that MIN and MAX treat unknown types as double, all the actual string data should be converted to NULLs, rather than cause errors.                           || yes                       ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-1584|PIG-1584]] || Grammar              || Cogroup inner does not match the semantics of inner join.  It is also not clear what value the inner keyword has for cogroup. Consider removing it.                                ||                           ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-678|PIG-678]]   || Grammar              || Decide whether we want to support this extension.                                                                                                                                  || yes                       ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-438|PIG-438]]   || Grammar              || Support reassigning of aliases.                                                                                                                                                    || yes                       || low               ||
- || [[https://issues.apache.org/jira/browse/PIG-313|PIG-313]] || Grammar || I propose that we continue not supporting this.  But we should detect it at compile time rather than at runtime. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-313|PIG-313]]   || Grammar              || I propose that we continue not supporting this.  But we should detect it at compile time rather than at runtime.                                                                   || yes                       ||                   ||
- 
- Bugs I need to add this but haven't gotten to yet:
- 
- 438, 453, 496, 516, 666, 667
+ || [[https://issues.apache.org/jira/browse/PIG-1538|PIG-1538]] || Nested types         || Remove two level access                                                                                                                                                            || Maybe, if we can find a way to ignore calls to `Schema.isTwoLevelAccessRequired()`. ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-1371|PIG-1371]] || Nested types         || unknown                                                                                                                                                                            ||                           ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-1277|PIG-1277]] || Nested types         || Unknown                                                                                                                                                                            ||                           ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-847|PIG-847]]   || Nested types         || Remove two level access                                                                                                                                                            || maybe                     ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-828|PIG-828]]   || Nested types         || According to the rules of Pig Latin, this should produce a bag with one field.  Need to make sure that is what Pig is trying to do in this case.                                   || yes                       ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-767|PIG-767]]   || Nested types         || Remove two level access; bring DUMP and DESCRIBE output into sync.                                                                                                                 || no                        ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-730|PIG-730]]   || Nested types         || Make sure schema of union is the same as schema before union (suspect this is a two level access issue)                                                                            || unclear                   ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-723|PIG-723]]   || Nested types         || Suspect this is a two level access issue                                                                                                                                           || unclear                   ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-694|PIG-694]]   || Nested types         || Determine the semantics for merging tuples and bags.                                                                                                                               || unclear                   ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-496|PIG-496]]   || Nested types         || Support use of positional references in bags and tuples when the bag is declared as `bag{}` or the tuple as `tuple()`                                                              || yes                       ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-1627|PIG-1627]] || Schema               || Flattening a bag with an unknown schema should produce a record with an unknown schema                                                                                             || no                        ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-1536|PIG-1536]] || Schema               || Pick one semantic for schema merges and use it consistently throughout Pig                                                                                                         || no                        ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-1188|PIG-1188]] || Schema               || Make sure Pig handles missing data in Tuples by returning a null rather than failing.                                                                                              || yes                       ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-1112|PIG-1112]] || Schema               || When user provides AS to flatten of undefined bag or tuple, the contents of that AS are taken to be the schema of the bag or tuple.                                                || yes                       ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-749|PIG-749]]   || Schema               || Related to PIG-1112                                                                                                                                                                || yes                       ||                   ||
+ || [[https://issues.apache.org/jira/browse/PIG-435|PIG-435]]   || Schema               || Decide definitely on what it means when users declare a schema for a load.                                                                                                         || unclear                   ||                   ||
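
Several of the dynamic type binding rows above (PIG-696, PIG-621, PIG-333) propose the same behavior: when a value turns out not to have the assumed type, produce a null and a warning instead of failing. The following is a minimal sketch of that policy in plain Java. The class `LenientCast` and the method `castOrNull` are hypothetical names chosen for illustration. They are not part of Pig's API and do not describe how Pig implements or will implement this.

```java
import java.util.logging.Logger;

public class LenientCast {
    private static final Logger LOG = Logger.getLogger(LenientCast.class.getName());

    // Hypothetical helper (not a Pig API): return the value as the requested
    // type, or null plus a logged warning when the runtime type does not match.
    public static <T> T castOrNull(Object value, Class<T> target) {
        if (value == null) {
            return null; // a null input stays null, with no warning
        }
        try {
            return target.cast(value); // throws ClassCastException on a type mismatch
        } catch (ClassCastException e) {
            LOG.warning("Cannot cast " + value.getClass().getSimpleName()
                    + " to " + target.getSimpleName() + "; substituting null");
            return null;
        }
    }

    public static void main(String[] args) {
        Object stringField = "not a number"; // assumed to be a double, actually a String
        Object doubleField = 3.5;

        System.out.println(castOrNull(stringField, Double.class)); // null, after a warning
        System.out.println(castOrNull(doubleField, Double.class)); // 3.5
    }
}
```

The point of the sketch is only that the mismatch is handled per value: one badly typed field yields a null for that field and a warning, and processing of the remaining data continues.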
  
  
  == Discussion ==