You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2012/10/15 23:01:04 UTC

[jira] [Created] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Koji Noguchi created PIG-2975:
---------------------------------

             Summary: TestTypedMap.testOrderBy failing with incorrect result 
                 Key: PIG-2975
                 URL: https://issues.apache.org/jira/browse/PIG-2975
             Project: Pig
          Issue Type: Sub-task
    Affects Versions: 0.11
            Reporter: Koji Noguchi
            Priority: Blocker


Looked at 
{noformat}
junit.framework.AssertionFailedError
    at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
{noformat}

This looks like a valid test case failing with incorrect result.

{noformat}
% cat test/orderby.txt
[key#1,key9#23]
[key#3,key3#2]
[key#22]

% cat test/orderby.pig
a = load 'test/orderby.txt' as (m:[]);
b = foreach a generate m#'key' as b0;
dump b;
c = order b by b0;
dump c;

% java ... org.apache.pig.Main    -x local test/orderby.pig 
[dump b]
(1)
(3)
(22)

...
[dump c]
(1)
(1)
(22)
%

where did the '(3)' go?
{noformat}



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482057#comment-13482057 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Thanks for the great job, Koji. It's in!
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480153#comment-13480153 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq.  I am going to roll up my sleeves and do this right.

Jonathan, I'm using these opportunities to learn pig.  I'd appreciate if you could lead me to the right direction although I fully understand that it'll be much faster if you fix it yourself.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480381#comment-13480381 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-----------------------------------------------------

Personally I would go with BinInterSedesRawComparator and try to see where it spends time to make it faster. From a software engineering standpoint, that class is the one responsible for raw comparisons of tuples.
Once you factor improvements in that class, all the other places that use it get the improvements for free.
Even if we go for a special case, I would special-case inside BinInterSedesRawComparator rather than outside it.
The fewer the places that make assumptions about the serialization format, the easier we can change it.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477399#comment-13477399 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq.  (For larger bytearray whose size does not fit in 1 byte(TINYBYTEARRAY), this compare method would take part of the header as content hurting the order but not the uniqueness.)

This can be confirmed without PIG-2862.  Using pig-0.10, we can see that bytearray-order is messed up.

{noformat}
$ cat orderbig.txt
1
3
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
2
55
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
4
33

$ cat orderbig.pig
a = load 'orderbig.txt' as (num:bytearray);
b = order a by num;
dump b;

$ pig -x local orderbig.pig 
...
(222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222)
(1)
(2)
(2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222)
(3)
(33)
(4)
(55)
{noformat}
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479421#comment-13479421 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Koji,

I don't think we need to sacrifice performance if we use BinInterSedes.BinInterSedesRawComparator. It traverses the bytes, it doesn't deserialize or make any objects (and I think I found an improvement we can make).

As far as sort order, I think it's meant to be somewhat odd on purporse.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480296#comment-13480296 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

Forgot to mention one more alternative.  Take out the 4 bytes offset and just compare everything including the meta/header. This would work but it'll mess up the byte order completely.  Until now, at least bytearray less than 256 bytes has been alphabetically ordered.

Also, tested my last patch and it gave about 7 minutes 30 seconds. I was expecting closer to the trunk but something is slowing it down a bit.

So, here's my understanding.

||patch                   ||correctness||byte order||speed||space||
|current                   | X          | OX        | O    | O    |
|BinInterSedesRawComparator| O          | O         | X    | O    |
|union approach            | O          | O         | OX   | X    |
|no 4 bytes offset compare | O          | X         | O    | O    | 

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477547#comment-13477547 ] 

Cheolsoo Park commented on PIG-2975:
------------------------------------

Here is the full list of tests that will be fixed by this:

*unit test:*
TestEvalPipeline2
TestEvalPipelineLocal
TestLimitVariable
TestPruneColumn
TestTypedMap

*e2e test:*
Order_1
Order_2
Order_3
Order_6
Order_7
Order_18
Split_6
BigData_7
MergeJoin_1,2,3,4,6,7,8
MergeSparseJoin_1,2,3,5


                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481548#comment-13481548 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Koji,

I am digging this!

A couple stylistic points:
- l1 is a really hard to read. Please use fuller names (even len1 and len2)
- IMHO, spaces make = and + etc more readable (ie databytearraycompare=false)
- on that front, use camelCase for multi-word lines

Last point: I would love to have a unit test that tests this specifically. if you could do that, this would be truly pro-style. Otherwise it looks great
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479404#comment-13479404 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

Silly me. Result of above script was 
{noformat}
(0)
(0)
(0)
(0)
(a,b)
(b,c)
(d,e)
{noformat}
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478213#comment-13478213 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Quick first answer to your questions:
1. Hrm, this is a tricky one. I think the answer is yes (and if it is yes, we should change the constructor to expect a DataByteArray). The issue is this: in Pig, you can have an unknown type (ie it has a name, but not type), but under the hood is it an int or whatever. I'm wondering if there is any case when Pig would do the cast, without making this knowledge available to the work flow. I THINK the answer is no, so I would follow this course, and if it causes errors, then we'll refactor to fix that.
2. I am looking at the ByteWritable code here: http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/io/BytesWritable.java it looks like it just shuffles around the pointer to the same bytes, so it should be ok.


                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480374#comment-13480374 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Gianmarco,

That's what I'm currently pondering as well. Is there any reason why DataByteArray sorting needs to be stable across releases? I believe the docs say that there are no guarantees, which means it is subject to change at any time. If people want defined semantics in this respect, they can just cast it to the appropriate data type., no?
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-2975:
------------------------------

    Attachment: pig-2975-trunk_v02-broken.txt

(This patch is broken. Not intended for commit)

bq. Another option, of course, is changing the raw Comparator. We could instead use PigTupleDefaultRawComparator once we unwrap a layer.

Tried to shortcut by extending NullableTuple. This patch does fix the two issues but introduce an extra layer of Tuple for outputs.

{noformat}
$ cat test/order.txt
3
222
1
222222222222...222 (repeat more than 256 time)
53
$ cat test/order.pig
a = load 'test/order.txt' as (a1:bytearray);
b = order a by a1;
dump b;

$ pig -x local test/order.pig 
((1))
((2))
((222))
((222222222222...222))
((3))
((4))
((53))
((99))
{noformat}

I want getValueAsPigType to return Tuple for sorting comparisons but actual value within Tuple for others... Still thinking. 
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480266#comment-13480266 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

I agree with everything Gianmarco said.

As far as the switch approach, I don't think it is lame. It's a byproduct of the way that Pig handle's types and serializes them.

If you look at BytesWritable.Comparator, literally all it does is call compareBytes. There's no reason we can't do that. You can do a switch based on the second byte in the bytearray, and then call compareBytes with the right offset.

I'll take a look at your patch after lunch.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-2975:
------------------------------

    Attachment: pig-2975-trunk_v03-unionapproach.txt

bq. 2. We could just make a custom WritableComparator for this case

My impression is that using the BytesWritable compare directly would be the fastest.


bq. (right now it is going to be TUPLE_1 / {TINYBYTEARRAY, SMALLBYTEARRAY, BYTEARRAY} / SIZE/ and so on. 

If the header size is different, I would need a switch somewhere. So thought of this lame approach.

{noformat}
/*
 * This class tries to optimize for the most common input, DataByteArray
 * In order to preserve the alphabetical ordering for DataByteArray,
 * we skip the first 4 bytes when comparing.
 * For non-DataByteArray, empty 4bytes is added so that content is not
 * skipped by the above offset.  Order for non-DataByteArray would look
 * random since it includes all the headers for comparisons.
 *
 * Bytes comparison is done by pair (isByteArray, mValue) to avoid any
 * potential collision among DataByteArray and non-DataByteArray.
 * //Serialization structure
 * struct {
 *   byte mNull;
 *   int size; (empty for non-DataByteArray)
 *   byte isByteArray;
 *   union {
 *    byte [size];      //for DataType.BYTEARRAY
 *    Tuple.serialized  //for all others
 *   } mValue;
 *   byte mIndex;
 * }
 *
 */
{noformat}

This sacrifices the space for performance.
* For DataType.BYTEARRAY, it adds 2 more bytes for small record (<256).
size(4bytes) + 1byte(isByteArray) = 5bytes
Before, it was TUPLE_1(1byte) + TINYBYTEARRAY(1byte) + size(1byte) = 3bytes

* For non-BYTEARRAY, 5 bytes. empty 4 bytes + 1byte boolean. This is in addition to whatever Tuple adds when serialized.

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2975:
----------------------------------

    Attachment: PIG-2975-0_jco.patch

I took a stab at fixing this myself. As I suspected, you can use BinSedesTuple.BinInterSedesTupleRawComparator and it works fine. TestTypedMap passes, though I need to check with other cases.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477394#comment-13477394 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

This took me much longer to understand than I first anticipated.

This is not necessary a regression bug in PIG-2862, but PIG-2862 helped
manifest a subtle bug in pig that we always had for a long time.

In short, pig has a bug of using PigBytesRawComparator for
NullableBytesWritable key even though its content is stored as a Tuple.

To explain in detail, following is the serialized output format for
PigNullableWritable.java
{noformat}
|-------------|
| mNull       |  1byte
|-------------|
|             |
| mValue      |  WritableComparable __ bytes
|             |
|             |
|-------------|
| mIndex      |  1byte
|-------------|
{noformat}
When comparing through PigBytesRawComparator.compare(), we skip the above mNull
and mIndex so we can ignore that part.


Now, looking at what's stored inside mValue for
NullableBytesWritable (extends PigNullableWritable).
It holds one DataBytesArray entry inside a Tuple.
{noformat}
|-------------|
| mNull       |  1byte
|-------------|  ------------------
| TINYTUPLE   |  1byte            |
|-------------|                   |
|  sz=1       |  1byte            |
|-------------|                  mValue
|             |                   |
| DataByteArr |                   |
|             |                   |
|             |                   |
|-------------| -------------------
| mIndex      |  1byte
|-------------|
{noformat}

And expanding how DataByteArray is serialized for the smallest content, 1
byte.
{noformat}
|-------------|
| mNull       |  1byte
|-------------|  ------------------
| TINYTUPLE   |  1byte            |
|-------------|                   |
|  sz=1       |  1byte            |
|-------------|                  mValue(5bytes)
|TINYBYTEARRAY|  1byte            |
|-------------|                   |
|  sz=1       |  1byte            |
|-------------|                   |
|content 1byte|  1byte            |
|-------------| -------------------
| mIndex      |  1byte
|-------------|
{noformat}

As I mentioned in the beginning,
even though above PigNullableWritable is storing mValue as *Tuple*, we are using
PigBytesRawComparator(mWrappedComp = BytesWritable.Comparator) 
which actually *skips 4 bytes* when comparing the two writables.

{noformat}
BytesWritable.java
 37   private static final int LENGTH_BYTES = 4;
...
205     public int compare(byte[] b1, int s1, int l1,
206                        byte[] b2, int s2, int l2) {
207       return compareBytes(b1, s1+LENGTH_BYTES, l1-LENGTH_BYTES,
208                           b2, s2+LENGTH_BYTES, l2-LENGTH_BYTES);
209     }
{noformat}

So, even though there is a mismatch between having Tuple in NullableBytesWritable
and comparing them as ByteArrays, coincidentally it is not skipping the actual tuple
content.  (For larger bytearray whose size does not fit in 1 byte(TINYBYTEARRAY), this compare method would take part of the header as content hurting the order but not the uniqueness.)

Now, with this new feature of PIG-2862 (TUPLE BinInterSedes byte identifier),
this delicate balance suddenly broke, when the minimum header size became 3 from 4.
(TINYTUPLE + sz = 2bytes now became TUPLE_1 = 1byte)
{noformat}
|-------------|
| mNull       |  1byte
|-------------|  ------------------
| TUPLE_1     |  1byte            |
|-------------|                  mValue(4bytes)
|TINYBYTEARRAY|  1byte            |
|-------------|                   |
|  sz=1       |  1byte            |
|-------------|                   |
| tuple 1 byte|  1byte            |
|-------------| -------------------
| mIndex      |  1byte
|-------------|
{noformat}

So BytesWritable.Comparator started not only skipping the header but the
content itself.  On the original example, byte '1' and '3' became 0 byte comparisons and was treated as equal bytes leading to the weird output.
I confirmed this by inserting a debug statement inside
BytesWritable.Comparator.compare() and seeing 0 length byte array comparisons.

I believe the fix would be to store mValue as a BytesArray instead of a Tuple for NullableBytesWritable.java.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-2975:
------------------------------

    Attachment: pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt

bq. or we can just have a special lightweight comparator that special cases DataByteArrays, and delegates to BinInterSedesRawComparator otherwise.

This one was faster than I expected.
414 seconds average vs the simple raw compare(including the header) of 398 seconds.
(Much faster than my bulky union approach of 436 seconds.)

I also tried moving this special case comparator to inside BinInterSedesRawComparator.compare, but that jumped the runtime back to over 600 seconds.

It's just one extra hop(method) + one extra checking(Tuple_1) but somehow jvm couldn't handle it well.

Adding test cases now.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480146#comment-13480146 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Oof, thanks for benchmarking. I am going to roll up my sleeves and do this right. Shouldn't be too hard, tbh.

That said, I will also try making that part of the comparator faster.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478208#comment-13478208 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. Can I assume the param to NullableBytesWritable constructor is always DataByteArray ?

This was a wrong assumption.  One test was failing with

{noformat}
Unable to open iterator for alias d. Backend error : Unable to recreate
exception from backed error: java.lang.ClassCastException: java.lang.Integer
cannot be cast to org.apache.pig.data.DataByteArray  at
org.apache.pig.impl.io.NullableBytesWritable.<init>(NullableBytesWritable.java:36)
{noformat}

I'll try to come up with another patch.

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480413#comment-13480413 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

{quote}
pig-2975-trunk_v04-purerawcompare.txt
2222222222...222222222222222) (repeat 256 times)
(1)
(2)
(12)
(22)
(32)
(222)
(555)
{quote}
One thing I worry about this new sort order is, we might get a worse compression ratio for users' output. (Similar record ordered far away.)
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478227#comment-13478227 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Another option, of course, is changing the raw Comparator. We could instead use PigTupleDefaultRawComparator once we unwrap a layer.

The issue is that it does indeed look like you can have something that is actually an int, but is labeled as a bytearray by Pig. This is why they wrapped it as a Tuple. It'd be good to add that comment as well.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479402#comment-13479402 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. Result incorrect (when order-by used). [0.11 and trunk]

Reading the code, I was able to come up with incorrect result case in 0.10. 
It's probably rare since the type has to be unknown. Any Datatype that has less than 2 bytes of header size in BinInterSedes.java can hit this issue.

{noformat}
$ pig -version
USING: /grid/0/gs/pig/current
Apache Pig version 0.10.1.0.1206081058 (r1348169) 

$ cat pig-2975-mixed.pig
a = load 'pig-2975-mixed1.txt' as (a0:chararray, a1:chararray);
b = load 'pig-2975-mixed2.txt' as (b0:int);
y = union a,b;
z = order y by $0;
dump z;
$ cat pig-2975-mixed1.txt
a       b
b       c
d       e
$ cat pig-2975-mixed2.txt
0
1
0
1
{noformat}
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480382#comment-13480382 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. but I feel that it would be good to keep it stable across releases, if possible.

Ah, that makes sense.  But isn't this part of the code only hit when the schema is 'unknown' or is a defined 'bytearray'? 
If so, can we say that ordering of 'unknown' type is unknown and could change from version to version? :)

I'm still missing the bigger picture of how often NullableBytesWritable is called with non-ByteArray data.  My assumption was that this is rare and I optimized for ByteArray in my union patch.

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479523#comment-13479523 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

FWIW I think my patch fixes this, and I don't think it has any downsides. It just uses the normal RawComparator.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-2975:
------------------------------

    Attachment: pig-2975-trunk_v01.txt

bq. I believe the fix would be to store mValue as a BytesArray instead of a Tuple for NullableBytesWritable.java.
Attaching  my idea. I'm only looking at this particular problem without understanding how the overall pig works.  Let me know if this breaks anything.  

Two worries I have so far.
* Can I assume the param to NullableBytesWritable constructor is always DataByteArray ?
* DataByteArray -> NullableBytesWritable -> DataByteArray could cause extra copying that we may not had before.  Added size checking to reduce the number of copies.

This patch does fix the TestTypedMap.testOrderBy output corruption failure we were seeing.  Also, it fixes the bytearray order bug I mentioned in the previous comment.  I'll attach another patch later with the test.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480301#comment-13480301 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. As far as the switch approach, I don't think it is lame.
Just to clarify, my comment
"So thought of this lame approach." was pointing to my union-patch which is overly complicated and wasted space.  
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-2975:
------------------------------------------------

    Assignee: Koji Noguchi  (was: Gianmarco De Francisci Morales)
    
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481790#comment-13481790 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

I'm going to give it one last look-over and make sure that test-commit passes, otherwise I'll commit it shortly.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480103#comment-13480103 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. I don't think we need to sacrifice performance if we use BinInterSedes.BinInterSedesRawComparator.

I tried sorting uncompressed texts with and without your patch and compared the map time of one of the sort phase job. (1mapper 1 reducer)

Without the patch
1st 6mins, 24sec
2nd 6mins, 21sec

With the patch
1st 12mins, 41sec
2nd 12mins, 40sec

So there is a performance hit. 

If you look at BinInterSedes.BinInterSedesRawComparator, it'll eventually comes down to 
{noformat}
 759         private int compareBinInterSedesDatum(ByteBuffer bb1, ByteBuffer bb2, boolean[] asc) throws IOException {
 857             case BinInterSedes.TINYBYTEARRAY:
 858             case BinInterSedes.SMALLBYTEARRAY:
 859             case BinInterSedes.BYTEARRAY: {
 860                 type1 = DataType.BYTEARRAY;
 861                 type2 = getGeneralizedDataType(dt2);
 862                 if (type1 == type2) {
 863                     int basz1 = readSize(bb1, dt1);
 864                     int basz2 = readSize(bb2, dt2);
 865                     byte[] ba1 = new byte[basz1];
 866                     byte[] ba2 = new byte[basz2];
 867                     bb1.get(ba1);
 868                     bb2.get(ba2);
 869                     rc = DataByteArray.compare(ba1, ba2);
 870                 }
 871                 break;
{noformat}

Probably taking out this extra copies for bytearray comparisons would improve the time.  Trying.   

(Separately, I'm trying out having a union-in-c like approach for DataByteArray and Tuple.)

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481770#comment-13481770 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-----------------------------------------------------

Guys, great job in moving this forward!
I am sold an all the improvements in the patch.
+1
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480190#comment-13480190 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Koji,

That is super reasonable. I hate bugs like this, so let's kill it :) I'm responsible for trying to usher in a new pig-0.11 release, both internally and externally, which is why I'm so gung ho about it.

Here is what I would say:

0. It'd be nice to have some tests focused on just this.

1. I was thinking that since compareBinInterSedesDatum has a handle on the ByteBuffer, instead of reading in the full byte[], we could just do the comparison via calls to .get(). ByteBuffered is buffered so I think that in the general case, this will be a win (but I could be wrong -- it'd be quick to implement and see). I put this before #2 because if we can bring the times in, it'd be nice to leverage the same code path.

2. We could just make a custom WritableComparator for this case. It would not be hard at all. We know the byte layout of how NullableBytesWritable is implemented, so we can just leverage that directly (right now it is going to be TUPLE_1 / {TINYBYTEARRAY, SMALLBYTEARRAY, BYTEARRAY} / SIZE/ and so on. Hadoop's BytesWritable is actually a key resource, we just need to tailor it to pig. It can just be switch based, and if it is an object other than a bytearray, we can default to another comparator. If you want it to be fast in all cases, you could copy the switch that BinInterSedesRawComparator uses, and go from there. I put this after #1 though because it seems lame to pull out all that logic, since in BignInterSedesRawComparator we are in fact making the decision to wrap it in a ByteBuffer, so if that is introducing a severe speed penalty, we need to be aware of that.


And then go from there. Seem reasonable?
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales reassigned PIG-2975:
---------------------------------------------------

    Assignee: Gianmarco De Francisci Morales  (was: Koji Noguchi)
    
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Gianmarco De Francisci Morales
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480455#comment-13480455 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

This is one benefit (and in some senses, the drawback) of using BinInterSedesRawComparator. Because of how Tuples are serialized, it in fact is using the "proper" raw comparator (and thus, providing the proper sort order) even though the user did not specify a Schema.

I found Gianmarco's argument towards trying to make BinInterSedesRawComparator fairly persuasive, though that code has a different goal.

I guess this comes down to how nice we want to be to people given that they do not specify a Schema. We can take a performance hit and try and figure things out for them, or we can make it blazing fast but with arbitrary guarantees.

Given that the way to free yourself from those arbitrary guarantees is "add a schema," you would then lose the speed benefits anyway. This, to me, is an argument for using BinInterSedesTupleRawComparator, in the sense that if this is the "preferred" path, we should use it and, as Gianmarco said, spend time optimizing it (since it is a pretty important code path for a lot more code than just this case). UNLESS we want to promote using DataByteArray's explicitly because we can do a much faster sort (I do not think this is what we should advocate, though if something is legitimately a DataByteArray there is no reason not to try and optimize that path so it's very fast...it should be, eh?).

Thoughts?

Thanks for hashing this out, guys.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478194#comment-13478194 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Great work, Koji. A test specifically aimed at this would be delightful. In the meantime, I'm going to poke around and make sure that it doesn't break anything (and look into the extra copying issue).

Very good find.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476447#comment-13476447 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

I haven't had time to look into detail but reverting PIG-2862 seems to fix the issue.
(Test started failing only after PIG-2936 when BinInterSedes byte-identifier started to get picked up by the test.)
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-2975:
------------------------------

    Attachment: pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt

Attaching patch with Jonathan's suggested changes (except the first one).

Thanks Jonathan for all your help and being patient with me!

bq. l1 is a really hard to read. Please use fuller names (even len1 and len2)

Agree. But this is coming from the original code. l1,s1,l2,s2 seem to be used everywhere for compare() method unfortunately.  Leaving them for now.


bq.  IMHO, spaces make = and + etc more readable (ie databytearraycompare=false)

Added.

bq. on that front, use camelCase for multi-word lines

Changed.

Also added couple of test cases for incorrect results and one for Alphabetical sorting of bytearrays across Tiny/Small/Regular size boundaries. 

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480373#comment-13480373 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-----------------------------------------------------

Yes, I was referring to the last alternative.

If the serialization format changes (say we redefine the codes for TINYTUPLE) then we end up with a new order for tuples.
As I said ByteArray sorting does not have a definite semantic, but I feel that it would be good to keep it stable across releases, if possible.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482064#comment-13482064 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

'twas a delight. On to the next one ;)
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478220#comment-13478220 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

One possibility, Koji, is that in the constructor we could special case byte arrays, and do the new code if it is and the old one if it isn't.

One thing I need to look at is why the other code breaks. Why does it jump a static 4 bytes? Is anything else broken?
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi reassigned PIG-2975:
---------------------------------

    Assignee: Koji Noguchi
    
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482063#comment-13482063 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

Thanks Jonathan, Gianmarco and Cheolsoo!  
(and sorry for my loooong detours :) 
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480231#comment-13480231 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Hmm, ok, well, that's a good change to keep in in general.

I think that an ok short term solution is to special case DataByteArray's with a custom WritableComparator that, in the case of a BYTEARRAY/TINYBYTEARRAY/SMALLBYTEARRAY will just use WritableComparator's compareBytes a la BytesWritable, else it fails over to BinInterSedesRawComparator.

Let's make some tests and fix this. Correctness trumps performance, though let's make sure that this failover approach is performant (I see no reason it shouldn't be).

But then step 2 is to make a separate ticket about optimizing BinInterSedesRawComparator. Anyone working on Pig can get a key for yourkit, so you can ping me for that. The pro-style approach IMHO is to use Google Caliper to build some micro-benchmarks (caliper is good about warming up the JVM), while also using the bigger benchmark you've been using in this thread. Then you can use YourKit while isolating the difference in speeds and isolate where the difference is coming in, and what method calls are taking the most time.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482072#comment-13482072 ] 

Cheolsoo Park commented on PIG-2975:
------------------------------------

Thank you Koji!

Now I am running all the tests with your fix and going to close jiras as soon as I verify that they are fixed.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480378#comment-13480378 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

As a side note, Koji, if you make a new jira specifically about improve BinInterSedesRawComparator's handling of DataByteArray's I will review and commit it. And if you want to learn Pig, you could make another JIRA about improving the performance in general. IMHO BinInterSedes (and that whole code path that touches it) could probably be significantly improved.

W.r.t. to this issue, I think we should either directly compare the bytes (currently leaning towards this), or we can just have a special lightweight comparator that special cases DataByteArrays, and delegates to BinInterSedesRawComparator otherwise. We wouldn't need the complexity of the union approach, and we should get the correctness, speed, and stable bytearray sort order.

That said, IF we decide to preserve byte array sort order, I think we should make a decision now about whether or not we want to define that semantic. If not, then just directly comparing the bytes should be a-ok, since all that is important for bytearrays currently is that a global ordering exists, not what that global ordering is.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480309#comment-13480309 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-----------------------------------------------------

Personally I don't care about byte order, it has no definite semantic already.
However, by including the 4 bytes in the comparison I am afraid we are exposing ourselves to further bugs when the serialization format changes.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478611#comment-13478611 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq.  you can use BinSedesTuple.BinInterSedesTupleRawComparator and it works fine. 

Ah. That's how we can work around the problem. Thanks Jonathan! 

Now, stepping back a bit.  Isn't it the case that original dev intentionally used BytesWritable.Comparator for performance purposes knowingly sacrificing the sort order? Instantiating an object for every compare is known to be slow.  I lack the overall picture of how often this NullableBytesWritable sorting is used but assuming this is the case.

We are talking about two different issues here.
* Result incorrect (when order-by used).  [0.11 and trunk]
* Sort order of bytearray sometimes incorrect. [in all of recent branches including 0.8,0.9 etc]

It would be nice if we can solve both without sacrificing the performance.  However, given that 0.11 is already rolled and any complicate change can lead to new unexpected bugs, can we revert PIG-2862 for 0.11 so that we can keep the previous behavior?

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481628#comment-13481628 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Koji,

It's been a pleasure :)

Ok, I'm king nitpick, but my last nitpick is to statically import assertEquals() (and any other junit methods) instead of calling Assert.whatever. Why? I'm slowly but surely trying to promote a consistent style in all of the unit tests.

As far as the annoying l1 etc variable, I'll let it slide ;) You're right that it's a common pattern, but I think it's a bad one. That said: this isn't the place to fix that.

Also, your tests are great, though a few more comments could be helpful. "compareTwoObjects" is a bit misleading, for example. I know not all of the tests have great comments (I'm probably guilty of this myself), but a line or two per would go a long way. Be the commit you want to see in the world and all that jazz!

In the meantime, I'm going to make sure the tests run, and make sure that the unit tests fail on trunk (ie that it is isolating the issue).

Thanks for being patient with ME, Koji
Jon
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-2975:
------------------------------

    Attachment: pig-2975-trunk_v04-purerawcompare.txt

||patch                   ||correctness||byte order||speed||space||
|no 4 bytes offset compare | O          | X         | O    | O    | 

Attaching a version that compare from the top.
Byte order would now look like 

(2222222222...222222222222222) (repeat 256 times)
(1)
(2)
(12)
(22)
(32)
(222)
(555)
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481748#comment-13481748 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Would it have been the same before? A serialized long would have been 8 bytes and a serialized Integer would have been 4 bytes... I guess it depends what order it is serialized in.

I would just go with what BinInterSedesTupleRawComparator does, and we can note the minor backwards incompatibility (though it doesn't actually violate anything people should be relying on, it might be nice to explain what is going on in the release notes, just to clarify the semantics).
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477468#comment-13477468 ] 

Cheolsoo Park commented on PIG-2975:
------------------------------------

Hi Koji, this is a good finding!

It seems that several e2e tests including MergeJoin, MergeSpareJoin, and Order are also broken for the same reason. In my quick test, you patch seems to fixe them.

Can you please confirm whether they're failing for the same reason? If so, can you please mark PIG-2984 as a duplicate of this jira?

Thanks!
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480393#comment-13480393 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-----------------------------------------------------

Indeed, my idea to keep the order stable is only a "nice to have".
For sure there is no strict requirement to keep it, so I am OK with foregoing it and directly comparing the whole bytes.

Koji, that's a good question. I guess that it could happen if we lose the schema during the execution of a plan, e.g. because of a UDF.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478262#comment-13478262 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. Why does it jump a static 4 bytes?

Because BytesWritable.Comparator is written for BytesWritable and BytesWritable uses the first 4 bytes for its size when being serialized.

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney resolved PIG-2975.
-----------------------------------

    Resolution: Fixed
    
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482077#comment-13482077 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. Now I am running all the tests with your fix and going to close jiras as soon as I verify that they are fixed.

Shoot.  My test run for this patch finished and I see some new tests failing.  Opened Pig-2999.

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480442#comment-13480442 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. One thing I worry about this new sort order is, we might get a worse compression ratio for users' output. (Similar record ordered far away.)

To clarify, pig-2975-trunk_v04-purerawcompare.txt doesn't work well with the reverse domain name. 
(com.yahoo.news)
(org.apache.wiki)
(com.yahoo.sports)

Of course user can specify 'chararray' to achieve that but I'm afraid some users are using the default bytearray type.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-2975:
------------------------------

    Attachment: PIG-2975-0_jco-v2.patch

Thanks Jonathan!

bq. 1.  we could just do the comparison via calls to .get(). ByteBuffered is buffered so I think that in the general case, this will be a win 

Attaching a patch that 
took rid of the extra byte[] copy and calling WritableComparator.compareBytes directly.

This took down the map run time to 
10mins, 12sec (first try) 
10mins, 18sec(second).   But still far from the 6minutes 20 secs.

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481761#comment-13481761 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. Would it have been the same before?
Yes. My test fails on trunk and passes with the patch.

bq.  A serialized long would have been 8 bytes and a serialized Integer would have been 4 bytes..
Actually, Tuple serializes Long by 

BinInterSedes.java
{noformat}
 498             } else if (Integer.MIN_VALUE <= lng && lng <= Integer.MAX_VALUE) {
 499                 out.writeByte(LONG_ININT);
 500                 out.writeInt((int)lng);
{noformat}


bq. I would just go with what BinInterSedesTupleRawComparator does, and we can note the minor backwards incompatibility
+1

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480192#comment-13480192 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Also: I've been loving the contribution from new people, so I'm definitely down to help you fix this instead of doing it myself. Sorry about that. Correctness bugs are just bad bad bad.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-2975:
------------------------------

    Attachment: pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt

bq. statically import assertEquals()
done.

bq. "compareTwoObjects" is a bit misleading,
Renamed to compareTwoObjectsAsNullableBytesWritables.

bq. a few more comments could be helpful.
Added.

Also, added one more test case testLongByteArrays to make sure I'm setting the offset/length right.

One test I don't have an answer on.

{noformat}
118   @Test119   public void testDifferentType() throws Exception {
120      assertTrue("Integer 9999 and Long 9999 considered equal",
121         compareTwoObjectsAsNullableBytesWritables(new Integer(9999), new Long(9999)) != 0 );
122 }
{noformat}

when comparing Integer(9999) and Long(9999) as unknown type, should they be considered same ? or different? 

Inside BinInterSedesTupleRawComparator, it's the latter.  Before, it was former (since the bug was skipping the datatype header).
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478216#comment-13478216 ] 

Jonathan Coveney commented on PIG-2975:
---------------------------------------

Ah, I didn't see that. That's what potentially could be the issue. Hmm hmm hmm.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: pig-2975-trunk_v01.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480247#comment-13480247 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-----------------------------------------------------

Hi,
We use ByteBuffer in the comparator for convenience.

However, I don't think we should really compare the 6 minutes of the incorrect version with the 10 minutes of the correct version too much.
IMHO correctness is more important than performance.
The slowness is due to the fact that we need to unnest the ByteArray from the Tuple and that we are using a Tuple to store any kind of data.

That said, BinInterSedes.BinInterSedesRawComparator is meant for performance, so if there is a way to make it faster it's more than welcome.
My guess is that it won't be easy to recover the original speed.
I would suggest to profile the code with some micro benchmark to see where the time is spent.

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480340#comment-13480340 ] 

Koji Noguchi commented on PIG-2975:
-----------------------------------

bq. by including the 4 bytes in the comparison I am afraid we are exposing ourselves to further bugs when the serialization format changes.

Assuming you're referring to my last "Take out the 4 bytes offset" alternative. 
That's comparing from very first byte:TUPLE_1 so that we don't need to worry about how the Tuple (and other dataset) is serialized within.

                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira