Posted to dev@pig.apache.org by "hc busy (JIRA)" <ji...@apache.org> on 2009/10/12 20:53:31 UTC

[jira] Created: (PIG-1016) Reading in map data seems broken

Reading in map data seems broken
--------------------------------

                 Key: PIG-1016
                 URL: https://issues.apache.org/jira/browse/PIG-1016
             Project: Pig
          Issue Type: Improvement
          Components: data
    Affects Versions: 0.4.0
            Reporter: hc busy


Hi, I'm trying to load a map that has a tuple as a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all of the documentation states that the value of a map can be any type.

I've attached a patch that allows us to read in complex objects as map values, as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1016:
--------------------------------

    Fix Version/s:     (was: 0.7.0)

Delaying since we need to clarify the approach.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766787#action_12766787 ] 

Hadoop QA commented on PIG-1016:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422303/PIG-1016.patch
  against trunk revision 826047.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/90/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/90/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/90/console

This message is automatically generated.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767385#action_12767385 ] 

Dmitriy V. Ryaboy commented on PIG-1016:
----------------------------------------

All tests started failing at the end of last week for all patches. Hopefully someone at Y! can sort out what's causing Hudson's nervous breakdown.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771442#action_12771442 ] 

Santhosh Srinivasan commented on PIG-1016:
------------------------------------------

Hc Busy, thanks for taking time to contribute the patch, explaining the details and especially for being patient. A few more questions and details have to be cleared up before we commit this patch.

IMHO, the right comparison should be along the lines of checking if o1 and o2 are NullableBytesWritable followed by a check for PigNullableWritable and then followed by error handling code.

Alan, can you comment on this approach?
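The ordering of checks suggested above can be sketched roughly as follows. This is only an illustration, not the patch itself: the two nested classes are hypothetical stand-ins that mimic the class hierarchy mentioned in this thread (NullableBytesWritable as a subclass of PigNullableWritable), the values are plain strings, and throwing an exception is just one possible form of "error handling code".

```java
public class ComparatorOrderSketch {

    // Hypothetical stand-in for PigNullableWritable: a wrapper whose value may be null.
    static class PigNullableWritable {
        final String value; // null models a Pig null
        PigNullableWritable(String value) { this.value = value; }
        boolean isNull() { return value == null; }
    }

    // Hypothetical stand-in for NullableBytesWritable, a subclass of the type above.
    static class NullableBytesWritable extends PigNullableWritable {
        NullableBytesWritable(String value) { super(value); }
    }

    // Null handling shared by both branches: two nulls are equal, a null sorts first.
    static int compareNullable(PigNullableWritable a, PigNullableWritable b) {
        if (a.isNull() && b.isNull()) return 0;
        if (a.isNull()) return -1;
        if (b.isNull()) return 1;
        return a.value.compareTo(b.value);
    }

    static int compare(Object o1, Object o2) {
        // 1. Most specific case first: both are bytes wrappers. A real comparator
        //    would compare the underlying byte array values in this branch.
        if (o1 instanceof NullableBytesWritable && o2 instanceof NullableBytesWritable) {
            return compareNullable((NullableBytesWritable) o1, (NullableBytesWritable) o2);
        }
        // 2. General case: both are PigNullableWritable of some kind.
        if (o1 instanceof PigNullableWritable && o2 instanceof PigNullableWritable) {
            return compareNullable((PigNullableWritable) o1, (PigNullableWritable) o2);
        }
        // 3. Everything else (including Java nulls) is reported instead of cast blindly.
        //    Throwing here is an assumption of this sketch.
        throw new IllegalArgumentException(
            "Cannot compare " + describe(o1) + " with " + describe(o2));
    }

    static String describe(Object o) {
        return o == null ? "null" : o.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(compare(new NullableBytesWritable("a"), new NullableBytesWritable("b")));
        System.out.println(compare(new PigNullableWritable(null), new NullableBytesWritable("a")));
    }
}
```

With this ordering, no cast is attempted unless the preceding instanceof check has already established that it is safe.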

There is a more important semantic issue. If the map values are strings and some of those strings look numeric, then the values within a map will be inferred as different types. In that case, the load function will break. In addition, conversion routines might fail when the compareTo method is invoked. Here is an example to illustrate the issue.

Suppose the record is ['key'#1234567890124567]. PIG-880 would treat the value as a string and there would be no problem. Now, with the changes reverted, the type is inferred as integer and the parsing will fail, as the value is too big to fit into an integer.

Secondly, assuming that the integer was small enough to be converted, the comparison method in DataType.java will return the wrong results when an integer and a string are compared. For example, if the records are:

[key#*$]
[key#123]

The first value is treated as a string and the second value is treated as an integer. The compareTo method will return 1 to indicate that string > integer while in reality 123 > *$

Please correct me if the last statement is incorrect or let me know if it needs more explanation.
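Both concerns can be reproduced in a small standalone sketch. Everything here is hypothetical: inferValue only mimics a "looks numeric, so parse as integer" rule, and typeRank assumes that strings rank above integers, as described above. Neither is taken from Pig's actual parser or from DataType.java.

```java
public class MapValueInferenceSketch {

    // Hypothetical inference rule: digits are parsed as an int, anything else stays a string.
    static Object inferValue(String text) {
        if (text.matches("-?\\d+")) {
            // Throws NumberFormatException when the digits do not fit into an int.
            return Integer.parseInt(text);
        }
        return text;
    }

    // Hypothetical type ranks: this sketch assumes strings rank above integers.
    static int typeRank(Object o) {
        return (o instanceof Integer) ? 0 : 1;
    }

    // Compare by type rank first, then by value when both sides have the same type.
    static int compareByTypeThenValue(Object a, Object b) {
        int rankA = typeRank(a);
        int rankB = typeRank(b);
        if (rankA != rankB) {
            return Integer.compare(rankA, rankB);
        }
        if (a instanceof Integer) {
            return ((Integer) a).compareTo((Integer) b);
        }
        return ((String) a).compareTo((String) b);
    }

    public static void main(String[] args) {
        // Concern 1: a numeric-looking string that is too large for an int.
        try {
            inferValue("1234567890124567");
        } catch (NumberFormatException e) {
            System.out.println("parse failed: " + e.getMessage());
        }

        // Concern 2: mixed inferred types within the same map key.
        Object first = inferValue("*$");   // stays a string
        Object second = inferValue("123"); // becomes an Integer
        System.out.println("typed comparison:  " + compareByTypeThenValue(first, second));
        System.out.println("string comparison: " + "*$".compareTo("123"));
    }
}
```

The typed comparison is positive (string ranked above integer) while the plain string comparison is negative, so the two orderings disagree for the same pair of values.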

Thoughts/comments from other committers?



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766202#action_12766202 ] 

hc busy commented on PIG-1016:
------------------------------

I skimmed PIG-880. Here is a simplified version of what I might need to do:


bash% cat map.dat 
[a#2,b#'d',c#(1,2,3)]
[a#1,b#'a',c#(1,2,3)]
[a#3,b#'c',c#(1,2,3)]
bash% PIG
grunt>A= load 'map.dat' as (data:map[]);
grunt>B= foreach A generate (int)(data#'a'), (chararray)(data#'b'),(tuple())(data#'c');
grunt>C= order B by $0;
grunt>dump C;
(1,'a',(1,2,3))
(2,'d',(1,2,3))
(3,'c',(1,2,3))
grunt>D= order B by $1;
grunt>dump D;
(1,'a',(1,2,3))
(3,'c',(1,2,3))
(2,'d',(1,2,3))



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771287#action_12771287 ] 

Santhosh Srinivasan commented on PIG-1016:
------------------------------------------

I am summarizing my understanding of the patch that has been submitted by hc busy.

Root cause: PIG-880 changed the value type of maps in PigStorage from native Java types to DataByteArray. As a result of this change, parsing of complex types as map values was disabled.

Proposed fix: Revert the changes made as part of PIG-880 to interpret map values as Java types. In addition, change the comparison method to check for the object type and call the appropriate compareTo method. The latter is required to work around the fact that the front-end assigns the value type to be DataByteArray whereas the backend sees the actual type (Integer, Long, Tuple, DataBag, etc.).

Based on this understanding I have the following review comment(s).

Index: src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBytesRawComparator.java
===================================================================

Can you explain the checks in the if and the else? Specifically, NullableBytesWritable is a subclass of PigNullableWritable. As a result, in the if part, the check for both o1 and o2 not being PigNullableWritable is confusing as nbw1 and nbw2 are cast to NullableBytesWritable if o1 and o2 are not PigNullableWritable.  

{code}
+        // find bug is complaining about nulls. This check sequence will prevent nulls from being dereferenced.
+        if(o1!=null && o2!=null){
+    
+            // In case the objects are comparable
+            if((o1 instanceof NullableBytesWritable && o2 instanceof NullableBytesWritable)||
+               !(o1 instanceof PigNullableWritable && o2 instanceof PigNullableWritable)
+                ){
+    
+              NullableBytesWritable nbw1 = (NullableBytesWritable)o1;
+              NullableBytesWritable nbw2 = (NullableBytesWritable)o2;
+      
+              // If either are null, handle differently.
+              if (!nbw1.isNull() && !nbw2.isNull()) {
+                  rc = ((DataByteArray)nbw1.getValueAsPigType()).compareTo((DataByteArray)nbw2.getValueAsPigType());
+              } else {
+                  // For sorting purposes two nulls are equal.
+                  if (nbw1.isNull() && nbw2.isNull()) rc = 0;
+                  else if (nbw1.isNull()) rc = -1;
+                  else rc = 1;
+              }
+            }else{
+              // enter here only if both o1 and o2 are non-NullableByteWritable PigNullableWritable's
+              PigNullableWritable nbw1 = (PigNullableWritable)o1;
+              PigNullableWritable nbw2 = (PigNullableWritable)o2;
+              // If either are null, handle differently.
+              if (!nbw1.isNull() && !nbw2.isNull()) {
+                  rc = nbw1.compareTo(nbw2);
+              } else {
+                  // For sorting purposes two nulls are equal.
+                  if (nbw1.isNull() && nbw2.isNull()) rc = 0;
+                  else if (nbw1.isNull()) rc = -1;
+                  else rc = 1;
+              }
+            }
+        }else{
+          if(o1==null && o2==null){rc=0;}
+          else if(o1==null) {rc=-1;}
+          else{ rc=1; }
{code}



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771170#action_12771170 ] 

hc busy commented on PIG-1016:
------------------------------

Okay, trying to get this into a release of Pig... I noticed 0.4 came out, but nothing has happened on this ticket.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781727#action_12781727 ] 

Thejas M Nair commented on PIG-1016:
------------------------------------

I agree with hc busy that PigStorage in its current state is broken.
It does not support storing complex datatypes as map values. But the problem with the approach before PIG-880 is that it has issues like the one Santhosh mentioned:
bq. Suppose, the records is 'key'#1234567890124567. PIG-880 would treat the value as a string and there would be no problem. Now, with the changes reverted, the type is inferred as integer and the parsing will fail as the value is too big to fit into an integer

This problem arises because strings can have arbitrary values and can resemble other types. This ambiguity in identifying types can be fixed if we require strings to be quoted in the file.
I propose creating a new load/store function, PigStorage2, that requires strings to be quoted, and applying the changes that hc busy proposed in this patch there. This could be done in PIG-1083.
I am not sure if we should change PigStorage back to its pre-PIG-880 behavior.
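As a rough illustration of how quoting removes the ambiguity, here is a minimal sketch. The class and its parseValue method are hypothetical and do not correspond to any existing Pig loader. It assumes single quotes mark strings, that unquoted digits are parsed as a long, and that anything else is rejected.

```java
public class QuotedValueSketch {

    // Hypothetical rule: quoted text is always a string, unquoted digits are a number.
    static Object parseValue(String text) {
        if (text.length() >= 2 && text.startsWith("'") && text.endsWith("'")) {
            // Quoted: keep the content as a string, even if it looks numeric.
            return text.substring(1, text.length() - 1);
        }
        if (text.matches("-?\\d+")) {
            // Unquoted digits: parsed as a long in this sketch. Digit strings beyond
            // the long range would still throw NumberFormatException.
            return Long.parseLong(text);
        }
        // Assumption of this sketch: unquoted non-numeric text is an error.
        throw new IllegalArgumentException("Unquoted, non-numeric value: " + text);
    }

    public static void main(String[] args) {
        Object quoted = parseValue("'1234567890124567'");
        Object unquoted = parseValue("123");
        System.out.println(quoted.getClass().getSimpleName() + " " + quoted);
        System.out.println(unquoted.getClass().getSimpleName() + " " + unquoted);
    }
}
```

Under this rule the writer of the file, not the parser, decides whether a numeric-looking value is a string.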

Comments?




[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765695#action_12765695 ] 

Santhosh Srinivasan commented on PIG-1016:
------------------------------------------

The fix proposed in this JIRA reverts the changes made as part of PIG-880. Can you explain in more detail the issue that you are currently facing? Specifically, can you provide a test case that reproduces this bug?



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771200#action_12771200 ] 

Alan Gates commented on PIG-1016:
---------------------------------

I am keeping an eye on this ticket.  But at this point I'd like to get Santhosh's feedback on your changes before proceeding, as he had comments on your earlier patch and I want to make sure your new patch addresses them.  Santhosh, can you provide feedback soon, or let one of the other committers know what to look for so we can move forward on this?



[jira] Updated: (PIG-1016) Allow map to take non-bytearray value types.

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Summary: Allow map to take non-bytearray value types.  (was: Reading in map data seems broken)



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Fix Version/s: 0.4.0
           Status: Open  (was: Patch Available)



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1016:
--------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764789#action_12764789 ] 

Hadoop QA commented on PIG-1016:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12421892/map_to_any_value.patch
  against trunk revision 824446.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/72/console

This message is automatically generated.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837643#action_12837643 ] 

Daniel Dai commented on PIG-1016:
---------------------------------

Hi, busy,
I think I finally understand what you mean. You want to write a loader, and in that loader you want to put any type of value into the map, right? Then I think it is a valid use case. What I was talking about is that if you use PigStorage to load data, the map value is always a bytearray.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment:     (was: PIG-1016.patch)



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771543#action_12771543 ] 

Thejas M Nair commented on PIG-1016:
------------------------------------

A tuple can also be used instead of a typed map. 
This issue is specific to the PigStorage load function, and it is present because PigStorage tries to auto-detect the map value type. I don't think we need to introduce a typed map in Pig Latin for this. You can always create a new load function that returns typed maps. BinStorage() is an example of a load/store function that stores the type information in the data and returns typed maps.
I think run-time identification of type is a bad idea; it results in surprising/unpredictable behavior.

In the case of PigStorage(), I think it should always interpret the map value as bytearray. In the Pig script, the user can cast the value to the expected type. The PigStorage.bytesTo... functions would be used for this purpose. (I assume Pig keeps track of the loader function that produced the data.)
Map parsing will also be faster with this approach, compared to auto-detect of value type.
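
To make the "always bytearray, cast in the script" proposal concrete, here is a minimal stand-alone Java sketch. It uses no Pig classes: RawMapValues, load and castToInt are hypothetical names made up for illustration, and returning null on a failed cast is an assumption of the sketch, not a statement about what PigStorage does.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class RawMapValues {
    // Loader side: keep every map value as undecoded bytes, with no type guessing.
    static Map<String, byte[]> load(String[][] pairs) {
        Map<String, byte[]> m = new LinkedHashMap<>();
        for (String[] kv : pairs) {
            m.put(kv[0], kv[1].getBytes(StandardCharsets.UTF_8));
        }
        return m;
    }

    // Script side: the caller names the expected type explicitly.
    static Integer castToInt(byte[] raw) {
        if (raw == null) {
            return null; // missing key
        }
        try {
            return Integer.valueOf(new String(raw, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return null; // assumption of this sketch: a failed cast yields null
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> m = load(new String[][] {{"a", "1"}, {"b", "x"}});
        System.out.println(castToInt(m.get("a")));
        System.out.println(castToInt(m.get("b")));
    }
}
```

The point of the design is that the loader never has to inspect the value text, which is where the parsing speedup mentioned above would come from.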


> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>             Fix For: 0.5.0
>
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.


[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837574#action_12837574 ] 

Daniel Dai commented on PIG-1016:
---------------------------------

Hi, busy,
I checked your code; it seems your patch assumes PIG-1016.patch is checked in. If I understand correctly, there is an inconsistency in this approach. In your code, you allow the map value to be any type. However, internally Pig always assumes the map value to be bytearray, so Pig will choose PigBytesRawComparator. You then further modify PigBytesRawComparator to handle all data types. This logic is very confusing. Further, TextDataParser itself is bogus, since it guesses the data type based on the content.

In PIG-613, we reiterated that the map value is bytearray. However, we fixed the code so that it can cast bytearray to map/tuple/bag correctly. I verified the test case you gave, and it works.

{code}
A= load '9.txt' as (data:map[]);
B= foreach A generate (int)(data#'a'), (chararray)(data#'b'),(tuple(map[]))(data#'c');
C= order B by $0;
dump C;
{code}
Result:
(1,'a',(1,2,3))
(2,'d',(1,2,3))
(3,'c',(1,2,3))

{code}
D= order B by $1;
dump D;
{code}
Result:
(1,'a',(1,2,3))
(3,'c',(1,2,3))
(2,'d',(1,2,3))

{code}
describe B;
{code}
Result:
B: {int,chararray,(map[ ])}

Do you have other use cases which PIG-613 cannot address?



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767313#action_12767313 ] 

Hadoop QA commented on PIG-1016:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422436/PIG-1016.patch
  against trunk revision 826110.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/100/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/100/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/100/console

This message is automatically generated.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1016:
----------------------------

    Status: Open  (was: Patch Available)

Canceling the patch as Hudson was not able to successfully apply it.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: map_to_any_value.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.


[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Status: Patch Available  (was: Open)

% diff org/apache/pig/data/parser/TextDataParser.jjt org/apache/pig/data/parser/newTextDataParser.jjt
145c145
< 	String value = null;
---
> 	Object value = null;
149c149
< 	(key = StringDatum() "#" value = StringDatum())
---
> 	(key = StringDatum() "#" value = Datum())
151c151
< 		keyValues.put(key, new DataByteArray(value.getBytes("UTF-8")));
---
> 		keyValues.put(key, value);
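
For readers who do not want to open the grammar file, the effect of the three changed lines can be sketched in plain Java. This is only a toy model: MapValueParse, before and after are hypothetical names, and the content-based guess in after() is a stand-in for the general Datum() rule, not the real production.

```java
import java.nio.charset.StandardCharsets;

final class MapValueParse {
    // Before the change: the value text is always wrapped as raw UTF-8 bytes.
    static Object before(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // After the change (loosely): the value goes through a general datum rule,
    // so its Java type depends on what the text looks like.
    // Toy stand-in only: integers are recognized, everything else stays a string.
    static Object after(String text) {
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) {
            return text;
        }
    }

    public static void main(String[] args) {
        System.out.println(before("5").getClass().getSimpleName());
        System.out.println(after("5").getClass().getSimpleName());
        System.out.println(after("abc").getClass().getSimpleName());
    }
}
```

With the old rule every value has one predictable type; with the new rule the type is decided by the content of each record.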


> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: map_to_any_value.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.


[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771484#action_12771484 ] 

Hadoop QA commented on PIG-1016:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422575/PIG-1016.patch
  against trunk revision 830757.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/128/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/128/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/128/console

This message is automatically generated.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>             Fix For: 0.5.0
>
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.


[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767382#action_12767382 ] 

hc busy commented on PIG-1016:
------------------------------

Alright, I have no clue. Why are "all tests" failing when each test executes correctly in my environment? What am I doing wrong?

Buildfile: build.xml

test:

ivy-download:
      [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.0.0-rc2/ivy-2.0.0-rc2.jar
      [get] To: /Users/h2/tmp/pig/build/trunk/ivy/ivy-2.0.0-rc2.jar
      [get] Not modified - so not downloaded

ivy-init-dirs:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:
[ivy:configure] :: Ivy 2.0.0-rc2 - 20081028224207 :: http://ant.apache.org/ivy/ ::
:: loading settings :: file = /Users/h2/tmp/pig/build/trunk/ivy/ivysettings.xml

ivy-compile:
[ivy:resolve] :: resolving dependencies :: org.apache.pig#Pig;0.6.0-dev
[ivy:resolve]   confs: [compile]
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve]   found jline#jline;0.9.94 in maven2
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve] :: resolution report :: resolve 167ms :: artifacts dl 4ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      compile     |   3   |   0   |   0   |   0   ||   3   |   0   |
        ---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve]  confs: [compile]
[ivy:retrieve]  0 artifacts copied, 3 already retrieved (0kB/5ms)
No ivy:settings found for the default reference 'ivy.instance'.  A default instance will be used
DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead
:: loading settings :: file = /Users/h2/tmp/pig/build/trunk/ivy/ivysettings.xml

init:

cc-compile:
     [move] Moving 1 file to /Users/h2/tmp/pig/build/trunk/build/ivy/lib/Pig

compile:
     [echo] *** Building Main Sources ***
     [echo] *** To compile with all warnings enabled, supply -Dall.warnings=1 on command line ***
     [echo] *** If all.warnings property is supplied, compile-sources-all-warnings target will be executed ***
     [echo] *** Else, compile-sources (which only warns about deprecations) target will be executed ***

compile-sources:

compile-sources-all-warnings:

ivy-test:
[ivy:resolve] :: resolving dependencies :: org.apache.pig#Pig;0.6.0-dev
[ivy:resolve]   confs: [test]
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve]   found jline#jline;0.9.94 in maven2
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve]   found junit#junit;4.5 in maven2
[ivy:resolve] :: resolution report :: resolve 122ms :: artifacts dl 4ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |       test       |   4   |   0   |   0   |   0   ||   4   |   0   |
        ---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve]  confs: [test]
[ivy:retrieve]  1 artifacts copied, 3 already retrieved (288kB/8ms)

compile-test:
     [echo] *** Building Test Sources ***
     [echo] *** To compile with all warnings enabled, supply -Dall.warnings=1 on command line ***
     [echo] *** If all.warnings property is supplied, compile-sources-all-warnings target will be executed ***
     [echo] *** Else, compile-sources (which only warns about deprecations) target will be executed ***

compile-sources:
    [javac] Compiling 1 source file to /Users/h2/tmp/pig/build/trunk/build/test/classes

compile-sources-all-warnings:

jar-withouthadoop:

jar-withouthadoopWithSvn:

ivy-download:
      [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.0.0-rc2/ivy-2.0.0-rc2.jar
      [get] To: /Users/h2/tmp/pig/build/trunk/ivy/ivy-2.0.0-rc2.jar
      [get] Not modified - so not downloaded

ivy-init-dirs:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

ivy-buildJar:
[ivy:resolve] :: resolving dependencies :: org.apache.pig#Pig;0.6.0-dev
[ivy:resolve]   confs: [buildJar]
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve]   found jline#jline;0.9.94 in maven2
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve]   found junit#junit;4.5 in maven2
[ivy:resolve] :: resolution report :: resolve 54ms :: artifacts dl 14ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |     buildJar     |   4   |   0   |   0   |   0   ||   4   |   0   |
        ---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve]  confs: [buildJar]
[ivy:retrieve]  0 artifacts copied, 4 already retrieved (0kB/10ms)

buildJar-withouthadoop:
     [echo] svnString 826722
      [jar] Building jar: /Users/h2/tmp/pig/build/trunk/build/pig-0.6.0-dev-core.jar
      [jar] Building jar: /Users/h2/tmp/pig/build/trunk/build/pig-0.6.0-dev-withouthadoop.jar
     [copy] Copying 1 file to /Users/h2/tmp/pig/build/trunk

jar-withouthadoopWithOutSvn:

test-core:
   [delete] Deleting directory /Users/h2/tmp/pig/build/trunk/build/test/logs
    [mkdir] Created dir: /Users/h2/tmp/pig/build/trunk/build/test/logs
    [junit] Running org.apache.pig.test.TestCompressedFiles
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 37.968 sec

test-contrib:

BUILD SUCCESSFUL
Total time: 56 seconds



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment: PIG-1016.patch

rename



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment: trunk_map_to_any_value.patch

Including a patch via svn diff.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: trunk_map_to_any_value.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.


[jira] Commented: (PIG-1016) Allow map to take non-bytearray value types.

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838000#action_12838000 ] 

hc busy commented on PIG-1016:
------------------------------

Email chain:

{quote}
I agree the original title for PIG-613 is misleading; I have changed it to the right one. Yes, PIG-613 does not solve PIG-1016, which is meant to allow any data type in a map. We shall continue to work toward it.

--------------------------------------------------
From: "hc busy" <hc...@gmail.com>
Sent: Wednesday, February 24, 2010 12:15 PM
To: <pi...@hadoop.apache.org>; <pi...@hadoop.apache.org>
Subject: native types as value of map type? Re: Complex data types as value in a map function


well... I have this data:


[key#'1', b#'2', c#'3', key2#5]
[key#'2', b#'i', c#'m', key2#6]
[key#'3', b#'j', c#'n', key2#7]
[key#'4', b#'k', c#'o', key2#8]

and I run

A= load 'simple_map.data' as (m:map[]);
A2= FOREACH A generate (int)(m#'key2') as key, m;
dump A2

returning

(,[ key2#5, b#'2',key#'1', c#'3'])
(,[ key2#6, b#'i',key#'2', c#'m'])
(,[ key2#7, b#'j',key#'3', c#'n'])
(,[ key2#8, b#'k',key#'4', c#'o'])


I'm looking at PIG-613, but I guess the title is misleading. None of the
casting of map values works in 0.5.0. I guess if PIG-613 works as
described, I would be in okay shape, because I would be able to cast again
and again using separate aliases...


PIG-613 is not what I meant for PIG-1016, but it seems to get me the feature I
want.
{quote}

> Allow map to take non-bytearray value types.
> --------------------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.


[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Status: Open  (was: Patch Available)

Canceling patch due to comment



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment: PIG-1016.patch

Re-attaching patch. It seems my previous patch didn't pass _any_ unit tests. 

Ouch! Anyway, I ran a few unit tests, and they still pass on my machine. I've been accused of having crap on my machine that makes programs pass their unit tests... Hopefully those accusations were false, and when a unit test passes on my machine, it passes on the build machines too.

4b425...904b2



[jira] Updated: (PIG-1016) Allow map to take non-bytearray value types.

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1016:
----------------------------

         Assignee: Alan Gates
    Fix Version/s: 0.9.0

> Allow map to take non-bytearray value types.
> --------------------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>            Assignee: Alan Gates
>             Fix For: 0.9.0
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.


[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771308#action_12771308 ] 

hc busy commented on PIG-1016:
------------------------------

Well, I'd like to start by thanking everyone for the attention and support! As a first time contributor, I feel my heart warmed by the encouraging comments and serious time everyone is spending on my problem. I also greatly appreciate the patience everybody has, and of course I am perpetually grateful for everybody's work in making this all work.


Line by line, 
{code}
+        // find bug is complaining about nulls. This check sequence will prevent nulls from being dereferenced.
+        if(o1!=null && o2!=null){
...
+        }else{
+          if(o1==null && o2==null){rc=0;}
+          else if(o1==null) {rc=-1;}
+          else{ rc=1; }
{code}

It does what it says: it prevents a FindBugs warning. By convention, non-null is greater than null.
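
The null convention in that guard can be tried out in isolation. Below is a minimal sketch in plain Java, with no Pig or Hadoop classes; NullsFirstSketch is a hypothetical name used only for this illustration.

```java
import java.util.Arrays;

final class NullsFirstSketch {
    // Mirrors the outer null guard from the patch: two nulls compare equal,
    // and a lone null sorts before any non-null value.
    static <T extends Comparable<T>> int compare(T o1, T o2) {
        if (o1 != null && o2 != null) {
            return o1.compareTo(o2);
        }
        if (o1 == null && o2 == null) {
            return 0;
        }
        return (o1 == null) ? -1 : 1;
    }

    public static void main(String[] args) {
        String[] keys = {"b", null, "a"};
        // Sort with the null-aware comparison; nulls end up at the front.
        Arrays.sort(keys, (x, y) -> compare(x, y));
        System.out.println(Arrays.toString(keys));
    }
}
```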

{code}
+            // In case the objects are comparable
+            if((o1 instanceof NullableBytesWritable && o2 instanceof NullableBytesWritable)||
+               !(o1 instanceof PigNullableWritable && o2 instanceof PigNullableWritable)
+                ){
+    
+              NullableBytesWritable nbw1 = (NullableBytesWritable)o1;
+              NullableBytesWritable nbw2 = (NullableBytesWritable)o2;
+      
+              // If either are null, handle differently.
+              if (!nbw1.isNull() && !nbw2.isNull()) {
+                  rc = ((DataByteArray)nbw1.getValueAsPigType()).compareTo((DataByteArray)nbw2.getValueAsPigType());
+              } else {
+                  // For sorting purposes two nulls are equal.
+                  if (nbw1.isNull() && nbw2.isNull()) rc = 0;
+                  else if (nbw1.isNull()) rc = -1;
+                  else rc = 1;
+              }
+            }
{code}


The if statement takes us outside of the original comparison code (enclosed in the outer if above) ONLY if both comparands are PigNullableWritable and they are not both NullableBytesWritable. This may seem confusing at first glance, but it does the identical thing as before the patch, except for the new case that I introduced by allowing other types.
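
Since the condition is easy to misread, here is a small stand-alone Java sketch that only evaluates the branch selection. The three StandIn classes are hypothetical substitutes for the Pig types named in the patch (they are not the real classes), and branch() copies just the boolean test, not the comparison itself.

```java
final class BranchTable {
    // Stand-ins for the class hierarchy assumed from the patch; not the real Pig types.
    static class PigNullableWritableStandIn {}
    static class BytesStandIn extends PigNullableWritableStandIn {}
    static class TupleStandIn extends PigNullableWritableStandIn {}

    // Same boolean structure as the patched if statement.
    static String branch(Object o1, Object o2) {
        boolean bothBytes = o1 instanceof BytesStandIn && o2 instanceof BytesStandIn;
        boolean bothPig = o1 instanceof PigNullableWritableStandIn
                && o2 instanceof PigNullableWritableStandIn;
        return (bothBytes || !bothPig) ? "original" : "new";
    }

    public static void main(String[] args) {
        Object bytes = new BytesStandIn();
        Object tuple = new TupleStandIn();
        System.out.println("bytes/bytes -> " + branch(bytes, bytes));
        System.out.println("tuple/tuple -> " + branch(tuple, tuple));
        System.out.println("bytes/tuple -> " + branch(bytes, tuple));
        System.out.println("other/other -> " + branch("x", "y"));
    }
}
```

One thing the table makes visible: inputs that are not PigNullableWritable at all also land in the original branch, where the casts to NullableBytesWritable would throw a ClassCastException. Whether such inputs can actually reach this comparator is not something established in this thread.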

The code is awkward, as Santhosh noted, and I am not sure I fully understand the original implementation. But this way we certainly preserve the original behavior, and the new cases that this patch introduces are handled in the remaining else:

{code}
else{
+              // enter here only if both o1 and o2 are non-NullableByteWritable PigNullableWritable's
+              PigNullableWritable nbw1 = (PigNullableWritable)o1;
+              PigNullableWritable nbw2 = (PigNullableWritable)o2;
+              // If either are null, handle differently.
+              if (!nbw1.isNull() && !nbw2.isNull()) {
+                  rc = nbw1.compareTo(nbw2);
+              } else {
+                  // For sorting purposes two nulls are equal.
+                  if (nbw1.isNull() && nbw2.isNull()) rc = 0;
+                  else if (nbw1.isNull()) rc = -1;
+                  else rc = 1;
+              }
+            }
{code}


This is the safest way I can think of to write this code, and I have been able to order by a value obtained from a map. Joining and then sorting keyed on map values also both work.


I guess something that flows better might be the following:

{code}
        if(o1!=null && o2!=null){
     
            if(o1 instanceof PigNullableWritable && o2 instanceof PigNullableWritable){
              PigNullableWritable nbw1 = (PigNullableWritable)o1;
              PigNullableWritable nbw2 = (PigNullableWritable)o2;
              // If either are null, handle differently.
              if (!nbw1.isNull() && !nbw2.isNull()) {
                  rc = nbw1.compareTo(nbw2);
              } else {
                  // For sorting purposes two nulls are equal.
                  if (nbw1.isNull() && nbw2.isNull()) rc = 0;
                  else if (nbw1.isNull()) rc = -1;
                  else rc = 1;
              }
            }else{
              throw new RuntimeException("bad compare");
            }
        }else{
          if(o1==null && o2==null){rc=0;}
          else if(o1==null) {rc=-1;}
          else{ rc=1; }
{code}

But I must admit that I don't know what the right thing to do is. I don't know the design well enough to know whether throwing an exception is appropriate, or whether something else should happen. And would the last code block perform the right comparison in place of the original function?


Let me know your thoughts on improvements to the patch.






[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764922#action_12764922 ] 

Hadoop QA commented on PIG-1016:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12421920/trunk_map_to_any_value.patch
  against trunk revision 824446.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/19/console

This message is automatically generated.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment:     (was: PIG-1016.patch)



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment:     (was: PIG-1016.patch)



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Status: Patch Available  (was: Open)

I have put a hack into the comparison method that PIG-880 was concerned about. All data that are not part of a map value (including errors and non-matching classes) follow the original code path.

Values that came from a map value follow a separate execution path that performs the comparison using the built-in compareTo() method, which returns an integer following the usual programming conventions.

I've run the example I described in an earlier comment, as well as all unit tests. They all seem to work.






[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment:     (was: PIG-1016.patch)



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773782#action_12773782 ] 

hc busy commented on PIG-1016:
------------------------------

Wait, wait, was that final? Why not have a FastPigStorage that parses maps into bytearray, and have PigStorage do the same as BinStorage returning nested objects?

How does the decision making work? Is there a vote or do I get one sponsor? or what?


thanks!!



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765415#action_12765415 ] 

Hadoop QA commented on PIG-1016:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422031/PIG-1016.patch
  against trunk revision 824980.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/25/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/25/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/25/console

This message is automatically generated.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766444#action_12766444 ] 

Daniel Dai commented on PIG-1016:
---------------------------------

I think the problem is in the current TextDataParser: a map is defined as String#String, and a string excludes special characters such as "(", ")", and ",", so busy has no way to put a tuple in the value field of the map. The approach busy took looks valid to me.
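
To make the limitation concrete, here is a toy sketch of a String#String entry whose strings may not contain delimiter characters. It is not the actual TextDataParser grammar; the pattern, the exact set of excluded characters, and the names are assumptions made for illustration. A plain value matches, while a tuple value is rejected:

```java
import java.util.regex.Pattern;

final class MapTextDemo {
    // Hypothetical "string" token: one or more characters, none of which is
    // a delimiter such as '(', ')', ',', '[', ']', '{', '}' or '#'.
    static final String STR = "[^()\\[\\]{},#]+";

    // A map entry in the restricted form String#String.
    static final Pattern ENTRY = Pattern.compile(STR + "#" + STR);

    static boolean isSimpleEntry(String s) {
        return ENTRY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSimpleEntry("name#alice"));  // plain string value
        System.out.println(isSimpleEntry("pos#(1,2)"));   // tuple value
    }
}
```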



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837401#action_12837401 ] 

hc busy commented on PIG-1016:
------------------------------

No, it doesn't (yet).

So, I was pulled away for a while and didn't follow the threads on Pig. I have now read Thejas's comment and discussed it internally with my colleagues at work. They agree with the discussion here that the existing PigStorage has its merits and that if I want nested data structures, I should write my own custom storage.

Basically, I have a separate storage that supports reading nested data, and in my data, long values have been modified to include the 'l' on the end.

The change in PIG-1082 makes it possible for us to join and order on nested data structures. That change is still necessary even with my own loader, because otherwise the data is still compared as DataByteArrays. The current comparator is in:

{{src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBytesRawComparator.java}}
{code}
    public int compare(Object o1, Object o2) {
        NullableBytesWritable nbw1 = (NullableBytesWritable)o1;
        NullableBytesWritable nbw2 = (NullableBytesWritable)o2;
        int rc = 0;

        // If either are null, handle differently.
        if (!nbw1.isNull() && !nbw2.isNull()) {
            rc = ((DataByteArray)nbw1.getValueAsPigType()).compareTo((DataByteArray)nbw2.getValueAsPigType());
        } else {
            // For sorting purposes two nulls are equal.
            if (nbw1.isNull() && nbw2.isNull()) rc = 0;
            else if (nbw1.isNull()) rc = -1;
            else rc = 1;
        }
        if (!mAsc[0]) rc *= -1;
        return rc;
    }
{code}

It would need to be changed to something like this:
{code}
    public int compare(Object o1, Object o2) {

        int rc=0;

        // find bug is complaining about nulls. This check sequence will prevent nulls from being dereferenced.
        if(o1!=null && o2!=null){

            // In case the objects are comparable
            if((o1 instanceof NullableBytesWritable && o2 instanceof NullableBytesWritable)||
               !(o1 instanceof PigNullableWritable && o2 instanceof PigNullableWritable)
                ){

              NullableBytesWritable nbw1 = (NullableBytesWritable)o1;
              NullableBytesWritable nbw2 = (NullableBytesWritable)o2;

              // If either are null, handle differently.
              if (!nbw1.isNull() && !nbw2.isNull()) {
                  rc = ((DataByteArray)nbw1.getValueAsPigType()).compareTo((DataByteArray)nbw2.getValueAsPigType());
              } else {
                  // For sorting purposes two nulls are equal.
                  if (nbw1.isNull() && nbw2.isNull()) rc = 0;
                  else if (nbw1.isNull()) rc = -1;
                  else rc = 1;
              }
            }else{
              // enter here only if both o1 and o2 are non-NullableByteWritable PigNullableWritable's
              PigNullableWritable nbw1 = (PigNullableWritable)o1;
              PigNullableWritable nbw2 = (PigNullableWritable)o2;
              // If either are null, handle differently.
              if (!nbw1.isNull() && !nbw2.isNull()) {
                  rc = nbw1.compareTo(nbw2);
              } else {
                  // For sorting purposes two nulls are equal.
                  if (nbw1.isNull() && nbw2.isNull()) rc = 0;
                  else if (nbw1.isNull()) rc = -1;
                  else rc = 1;
              }
            }
        }else{
          if(o1==null && o2==null){rc=0;}
          else if(o1==null) {rc=-1;}
          else{ rc=1; }
        }
        if (!mAsc[0]) rc *= -1;
        return rc;
    }
{code}


Once we allow non-NullableBytesWritables into the comparator, the comparator fails unless we handle those cases.
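
A minimal sketch of that failure mode, using empty stand-in classes rather than the real Pig types (the stand-ins and the helper name are hypothetical): the original compare() casts both arguments to NullableBytesWritable unconditionally, so any other PigNullableWritable would presumably fail with a ClassCastException at that point.

```java
// Stand-in types for illustration only; the real classes are part of Pig.
class PigNullableWritable {}
class NullableBytesWritable extends PigNullableWritable {}
class NullableTuple extends PigNullableWritable {}

final class CastDemo {
    /** Mimics the unconditional cast at the top of the original compare(). */
    static boolean originalCastSucceeds(Object o) {
        try {
            NullableBytesWritable nbw = (NullableBytesWritable) o;
            return true;
        } catch (ClassCastException e) {
            // A sibling subclass such as NullableTuple ends up here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(originalCastSucceeds(new NullableBytesWritable()));
        System.out.println(originalCastSucceeds(new NullableTuple()));
    }
}
```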




[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment:     (was: PIG-1016.patch)



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766382#action_12766382 ] 

Santhosh Srinivasan commented on PIG-1016:
------------------------------------------

hc busy,

From your example snippet, I was not able to understand if Pig is preventing you from doing that based on the current code base. If not, what is the error that you are seeing?

Santhosh



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837106#action_12837106 ] 

Daniel Dai commented on PIG-1016:
---------------------------------

This issue should be fixed as part of the effort in [PIG-613|https://issues.apache.org/jira/browse/PIG-613]. hc busy, can you check if that patch addresses your issue?



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment: map_to_any_value.patch

A patch for org/apache/pig/data/parser/TextDataParser.jjt



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment:     (was: map_to_any_value.patch)



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment:     (was: PIG-1016.patch)



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796893#action_12796893 ] 

hc busy commented on PIG-1016:
------------------------------

Hi Thejas, Olga, and the rest: that sounds about right. I think PIG-1082 is ready from my previous effort, and PIG-1083 still needs to be done. And perhaps it will make more sense to use Avro or some other binary format instead.

I still have an ASCII nested data structure to read in, but it's not very HP. Not sure if anybody needs it any more.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>             Fix For: 0.7.0
>
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment: PIG-1016.patch

Submitting a patch to work around both PIG-880 and PIG-1016.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment: PIG-1016.patch

Same patch as before, but the hash seems different. Maybe I submitted the wrong patch previously.

d337d3264bf5e6e925515ceff90718e10

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766710#action_12766710 ] 

hc busy commented on PIG-1016:
------------------------------

Okay, since my last comment I've verified that, in trunk, the patch in this ticket did not introduce an error. The skewed join (correct or not) returns the same number of rows whether the data is read in from a nested data structure or from a tuple.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771485#action_12771485 ] 

hc busy commented on PIG-1016:
------------------------------

Yeah, I ran into that problem immediately. Had to reformat the data to be

{{\['key'#983579482375984237957294L\]}}

(append 'L' at the end). The alternative is something funky like Python's auto-promotion: if it overflows, automatically promote int to long, and then long to BigNumber...
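
As a rough illustration of that promotion idea, here is a minimal sketch in Python rather than in Pig's own Java parser. The function name and the type labels are made up for this example, and the bounds are simply those of Java's 32-bit int and 64-bit long:

```python
# Hypothetical sketch: pick the narrowest Java-style integer type that can hold a literal.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1

def promote_integer(text):
    """Return (type_name, value) for an integer literal, widening on overflow."""
    value = int(text)  # Python ints are unbounded, so parsing itself never overflows
    if INT_MIN <= value <= INT_MAX:
        return ("int", value)
    if LONG_MIN <= value <= LONG_MAX:
        return ("long", value)
    return ("biginteger", value)

print(promote_integer("100"))
print(promote_integer("5000000000"))
print(promote_integer("983579482375984237957294"))
```

Under this scheme the sample literal above would end up in the last bucket, since it is wider than 64 bits.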

And the second case is a very good example. This is where that 'new Exception("Bad Compare");' will execute.

For my purposes, I make sure the map has auto-detectable types and everything works fine.

If your map is known to have String value types, but some are numbers, you could cast it later.

result = foreach A generate (chararray)m#'key';

and compare that way.

Some possibilities:
Ideally, we could specify a schema inside the map:

A = load 'data' as (m:map['string_key':string, 'map_key':map[]]);

where the specified keys are read in as the specified types, and the rest are auto-detected.


Another thing: I guess I'm willing to use a special case of map. If, say, we had an auto-detecting map and a bytearray-valued map, then I'd be in good shape too.

A = load 'data' as (m:map[]);
B = load 'data' as (m:tTypedMap[]);

But both seem like rather large efforts... (?)

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>             Fix For: 0.5.0
>
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment:     (was: PIG-1016.patch)

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1016:
--------------------------------

    Status: Patch Available  (was: Open)

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: trunk_map_to_any_value.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771571#action_12771571 ] 

hc busy commented on PIG-1016:
------------------------------

Thejas, great point! 

Run-time type detection does take more time at run time and requires more discipline to use.

But I'd like to point out that the original implementation seemed to have allowed for this in PigStorage. The change to reduce the types that can be stored in the value of a map seems to reduce functionality of Pig. 

I guess the one case where I want to use a map is when I have a sparse tuple and don't want to type in a type for each of the many fields. If I went to that trouble, I'd just write Java code, or use something where the schema is statically defined and stored.

Say, as a simple example, a self join on one row:

{{\[data1#\[score#15l,unique_id#100\],data2#\[score#15,foreign#00100\]\]}} 

{code} 
B = join A by m#data1#unique_id, A by m#data2#foreign
C = Filter B by $0#score == $1#score
{code} 

I'd think something like this should work without me typing in the entire type structure. 
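
To make the point concrete, here is a small Python sketch (not Pig code) of what run-time detection buys in this example. It assumes the sample row has already been split into nested maps whose values are still raw text; the `detect` helper and its rules are hypothetical, not what PigStorage actually does:

```python
# Hypothetical sketch: the sample row as nested dicts, with values still raw text.
row = {
    "data1": {"score": "15l", "unique_id": "100"},
    "data2": {"score": "15", "foreign": "00100"},
}

def detect(raw):
    """Guess a type for a raw map value: integer (optional l/L suffix), else text."""
    digits = raw[:-1] if raw[-1:] in ("l", "L") else raw
    try:
        return int(digits)
    except ValueError:
        return raw  # not a number, keep it as text

# With detection, both the join key and the filter condition compare as numbers.
join_matches = detect(row["data1"]["unique_id"]) == detect(row["data2"]["foreign"])
scores_equal = detect(row["data1"]["score"]) == detect(row["data2"]["score"])

# Without detection, the raw text "100" and "00100" do not match.
raw_match = row["data1"]["unique_id"] == row["data2"]["foreign"]

print(join_matches, scores_equal, raw_match)
```

The trade-off is the one already mentioned: the comparison only works if every value involved is detectable, so the discipline moves from the schema to the data.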


Also, what happens when BinStorage returns a map with a value that isn't a bytearray? Does the comparison fail?


> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>             Fix For: 0.5.0
>
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Status: Open  (was: Patch Available)

Didn't pass a few other affected unit tests

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766634#action_12766634 ] 

hc busy commented on PIG-1016:
------------------------------

Thanks to everyone who is reviewing this ticket. I really appreciate it!

This feature is important because the data I have is slightly hierarchical (maps(string#map(:))). Sometimes I need to sort by the values corresponding to one key in the map, while other times I need to merge on a value corresponding to a different key of the map.

Aside from running the unit tests, I also performed some join tests with this parser. The results are all fine except for the skew join, which produced twice as many rows as expected... Has anybody else encountered this problem? Or is it only a result of taking values from a map?


thanks!

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774176#action_12774176 ] 

Dmitriy V. Ryaboy commented on PIG-1016:
----------------------------------------

HC Busy,
The decision-making process should be explained somewhere on the Apache process pages, but essentially there is a group of committers who are the final arbiters of what gets in and what doesn't. There are about 10 of them right now, I think (Thejas, Alan, Daniel, and Santosh are all committers, so you are getting a fair bit of attention! :-)).  Patches go in when one of the committers gives a patch a "+1" vote, unless of course it's the committer's own patch, in which case a different committer has to approve it.

The trouble with modifying PigStorage is that it's the default storage interpreter, so all changes have to be really thought through for it, and the preference is for doing the safe thing over the convenient thing.  So the bar is pretty high for that.  The change that reduced the functionality did so because this functionality was broken... so it won't be reversed until it's not broken.  

All that being said, I've found the committers to be receptive to reasonable arguments. So you have two options: change your patch to a TypedMapPigStorage and put it in piggybank (that will probably get in quite fast), or continue working with the committers on finding a solution that meets their standards.

-D

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>             Fix For: 0.5.0
>
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767387#action_12767387 ] 

hc busy commented on PIG-1016:
------------------------------

%#$%@#$, had me sweating for a while..., as mentioned previously, this is functionality that I'd like to use... not just fun weekend project... hehe..

thnx.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765302#action_12765302 ] 

Dmitriy V. Ryaboy commented on PIG-1016:
----------------------------------------

No worries, we are used to Jira sending us a never-ending stream of updates :-).
Looks good to me (assuming this passes Hudson).

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment: PIG-1016.patch

Unit test plus patch. This time the unit test actually passes.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Status: Patch Available  (was: Open)

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment: PIG-1016.patch

Sorry, first-time contributor. This submission includes the fix and also fixes several unit tests that failed.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment: PIG-1016.patch

This patch is generated with svndiff and has a unit test

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1016:
--------------------------------

    Fix Version/s:     (was: 0.5.0)
                   0.7.0

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>             Fix For: 0.7.0
>
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837627#action_12837627 ] 

hc busy commented on PIG-1016:
------------------------------

As I mentioned, all I ask is for the functionality of the comparator to be fixed as specified in the patch attached to a subtask of this ticket (PIG-1082).

I understand that everyone is reiterating that the value of a map is a bytearray, but why put this artificial restriction on the system? Especially since back in 0.3, when I filed this ticket, all Pig documentation said that the value of a map can be anything. Why backtrack on something simple like this when there is no need to?

.... actually, looking at your example above, I'm not sure I understand this casting convention that you've implemented

{code}
B= foreach A generate (int)(data#'a'), (chararray)(data#'b'),(tuple(map[]))(data#'c');
{code}

So that last cast is to a tuple of maps from 'c'? But in the result it's just a tuple of numbers:
bq. (1,'a',(1,2,3))

I have no fundamental problem with casting in a separate step and then being able to order, join, and group by the results, as long as all of those work. But say I have a 4-level trie implemented as a nested map. In order to look something up, I would have to type out four separate aliases, re-casting four times, as opposed to being able to read the data in once (with custom readers) and resolve it all on one line.
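
For what "resolve it all on one line" means here, a minimal sketch in Python rather than Pig: once the nested maps are read in with their structure intact, a four-level lookup is a single walk over the keys. The sample trie and the `lookup` helper are hypothetical:

```python
from functools import reduce

# Hypothetical 4-level trie stored as nested maps; the leaf holds the payload.
trie = {"p": {"i": {"g": {"s": "found"}}}}

def lookup(nested, keys):
    """Walk one key per level; yield None as soon as a level is missing."""
    def step(node, key):
        return node.get(key) if isinstance(node, dict) else None
    return reduce(step, keys, nested)

print(lookup(trie, "pigs"))
print(lookup(trie, "pigx"))
```

With bytearray-only values, each of those four steps would instead need its own cast back to a map before the next key could be applied.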

Looking at the bigger picture, and considering some of the other areas of Pig development, what I describe is not an unlikely situation. And there are other people on the pig-user list asking for this feature.


But I do want to emphasize that I don't mean to diminish the importance of PIG-613 in any way. I actually tried doing that when I filed this ticket. I just wasn't smart enough to figure out that it was also a problem...


I just think this is also a problem.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Fix Version/s:     (was: 0.4.0)
                   0.5.0
           Status: Patch Available  (was: Open)

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>             Fix For: 0.5.0
>
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 0.4.0 because of a misconfiguration in the parser. Where as in almost all documentation it is stated that value of the map can be any time.
> I've attached a patch that allows us to read in complex objects as value as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Status: Open  (was: Patch Available)

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all of the documentation states that the value of a map can be of any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Attachment:     (was: trunk_map_to_any_value.patch)

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all of the documentation states that the value of a map can be of any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Commented: (PIG-1016) Reading in map data seems broken

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765183#action_12765183 ] 

Hadoop QA commented on PIG-1016:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12421949/PIG-1016.patch
  against trunk revision 824446.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/76/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/76/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/76/console

This message is automatically generated.

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all of the documentation states that the value of a map can be of any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.



[jira] Updated: (PIG-1016) Reading in map data seems broken

Posted by "hc busy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1016:
-------------------------

    Status: Open  (was: Patch Available)

> Reading in map data seems broken
> --------------------------------
>
>                 Key: PIG-1016
>                 URL: https://issues.apache.org/jira/browse/PIG-1016
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.4.0
>            Reporter: hc busy
>         Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all of the documentation states that the value of a map can be of any type.
> I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine.
