Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2010/01/15 23:56:54 UTC

[jira] Created: (PIG-1193) Secondary sort issue on nested desc sort

Secondary sort issue on nested desc sort
----------------------------------------

                 Key: PIG-1193
                 URL: https://issues.apache.org/jira/browse/PIG-1193
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.6.0
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.6.0


Secondary sort produces an incorrect nested descending sort order when all of the following conditions are met:

1. The nested plan contains both a sort and a UDF.
2. The UDF uses the same input tuples more than once.
3. The input tuples are sorted in descending order.

Here is a test case:
{code}
register sequence.jar;
A = load 'input' as (a0:int);
B = group A ALL;
C = foreach B {
    D = order A by a0 desc;
    generate sequence.CUMULATIVE(D,D);
};
dump C;
{code}

Input file:
{code}
3
4
{code}

The input for the UDF is:
{code}
({(4),(3)},{(3),(4)})
{code}

The first bag is sorted in descending order, but the second is not, even though both arguments are the same relation D.
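A minimal sketch of the expected behavior, written in plain Java rather than against Pig's API: each bag is modeled as a List<Integer>, and the class and helper names (NestedDescSortRepro, orderDesc, argumentsAgree, isDescending) are hypothetical. It only encodes the invariant implied by CUMULATIVE(D,D): both arguments come from the same relation, so they should be identical and both descending.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the PIG-1193 repro: bags are plain List<Integer>.
// None of these names are part of Pig's API.
class NestedDescSortRepro {

    // What `D = order A by a0 desc;` should yield: a descending copy of A.
    static List<Integer> orderDesc(List<Integer> a) {
        List<Integer> d = new ArrayList<>(a);
        d.sort(Collections.reverseOrder());
        return d;
    }

    // Both UDF arguments are the same relation D, so they must be identical.
    static boolean argumentsAgree(List<Integer> first, List<Integer> second) {
        return first.equals(second);
    }

    // True if no element is smaller than the one after it.
    static boolean isDescending(List<Integer> bag) {
        for (int i = 1; i < bag.size(); i++) {
            if (bag.get(i - 1) < bag.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(3, 4);    // contents of 'input'
        List<Integer> d = orderDesc(a);     // [4, 3]

        // Expected UDF input ({(4),(3)},{(4),(3)}): prints true
        System.out.println(argumentsAgree(d, d) && isDescending(d));

        // Observed UDF input ({(4),(3)},{(3),(4)}): prints false
        List<Integer> observedSecond = List.of(3, 4);
        System.out.println(argumentsAgree(d, observedSecond) && isDescending(observedSecond));
    }
}
{code}

A check like argumentsAgree is the kind of assertion a test UDF could make on its two inputs to catch this bug directly.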

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802578#action_12802578 ] 

Pradeep Kamath commented on PIG-1193:
-------------------------------------

Reviewed the changes. +1 for commit.


[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1193:
----------------------------

    Attachment: PIG-1193-1.patch


[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1193:
----------------------------

    Attachment: PIG-1193-2.patch

Address Findbugs warnings.


[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802410#action_12802410 ] 

Alan Gates commented on PIG-1193:
---------------------------------

+1, patch looks good to me. But Ying should review it before it is committed, since she wrote the original code for this.


[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802488#action_12802488 ] 

Daniel Dai commented on PIG-1193:
---------------------------------

Actually, I wrote this part; it is independent of Ying's Accumulator code.


[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1193:
----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Patch committed to trunk and 0.6 branch.


[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800984#action_12800984 ] 

Daniel Dai commented on PIG-1193:
---------------------------------

Diagnosis for this issue:
{code}
Reduce plan:
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-56
|
|---New For Each(false)[bag] - 1-55
    |   |
    |   POUserFunc(sequence.CUMULATIVE)[bag] - 1-54
    |   |
    |   |---RelationToExpressionProject[bag][*] - 1-49
    |   |   |
    |   |   |---RelationToExpressionProject[bag][*] - 1-58
    |   |       |
    |   |       |---Project[tuple][1] - 1-46
    |   |
    |   |---RelationToExpressionProject[bag][*] - 1-53
    |       |
    |       |---POSort[bag]() - 1-52
    |           |   |
    |           |   Project[int][0] - 1-51
    |           |
    |           |---Project[tuple][1] - 1-50
    |
    |---Package[tuple]{chararray} - 1-43--------
{code}

We take the first input's descending POSort and turn it into a secondary sort key. However, we did not remove the second input's POSort, so the second input to the UDF has its descending order reversed a second time and comes out ascending.
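The diagnosis can be illustrated with a toy model in plain Java. Everything in it is hypothetical: SecondarySortModel, NestedInput and the fold* methods are illustrative names, not Pig's secondary sort optimizer, and the leftover POSort is modeled as one extra reversal only because that reproduces the observed ({(4),(3)},{(3),(4)}). What it shows is the gap described above: once a sort is folded into the secondary key, every nested input that sorts on that same key needs its POSort removed, not just the first one.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of the reduce plan in the diagnosis. All names are hypothetical;
// this is not Pig's secondary sort optimizer.
class SecondarySortModel {

    // One UDF argument; sortKey != null means a nested POSort is still present.
    static final class NestedInput {
        String sortKey;

        NestedInput(String sortKey) {
            this.sortKey = sortKey;
        }
    }

    // Buggy behavior: only the first sort on the key is folded into the
    // secondary key and removed; later sorts on the same key are left behind.
    static void foldFirstSortOnly(List<NestedInput> inputs, String key) {
        for (NestedInput in : inputs) {
            if (key.equals(in.sortKey)) {
                in.sortKey = null;
                return;
            }
        }
    }

    // Fixed behavior (sketch): every nested sort on the same key is removed,
    // because the shuffle already delivers the bag in that order.
    static void foldAllMatchingSorts(List<NestedInput> inputs, String key) {
        for (NestedInput in : inputs) {
            if (key.equals(in.sortKey)) {
                in.sortKey = null;
            }
        }
    }

    // Evaluate one argument from the bag as the shuffle delivered it. A
    // leftover sort is modeled as one more reversal (an assumption chosen only
    // to reproduce the observed output, not Pig's POSort logic).
    static List<Integer> evaluate(NestedInput in, List<Integer> shuffled) {
        List<Integer> bag = new ArrayList<>(shuffled);
        if (in.sortKey != null) {
            Collections.reverse(bag);
        }
        return bag;
    }

    static List<List<Integer>> udfArgs(List<NestedInput> inputs, List<Integer> shuffled) {
        List<List<Integer>> result = new ArrayList<>();
        for (NestedInput in : inputs) {
            result.add(evaluate(in, shuffled));
        }
        return result;
    }

    // CUMULATIVE(D, D): two inputs, both sorting on a0 desc.
    static List<NestedInput> twoInputs() {
        return List.of(new NestedInput("a0 desc"), new NestedInput("a0 desc"));
    }

    public static void main(String[] args) {
        List<Integer> shuffled = List.of(4, 3);  // secondary key: a0 desc

        List<NestedInput> buggy = twoInputs();
        foldFirstSortOnly(buggy, "a0 desc");
        System.out.println(udfArgs(buggy, shuffled));  // [[4, 3], [3, 4]]

        List<NestedInput> fixed = twoInputs();
        foldAllMatchingSorts(fixed, "a0 desc");
        System.out.println(udfArgs(fixed, shuffled));  // [[4, 3], [4, 3]]
    }
}
{code}

Matching on the sort key, rather than clearing every sort, keeps the sketch honest: a nested sort on a different column would not be satisfied by the secondary key and would still have to run.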


[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801090#action_12801090 ] 

Hadoop QA commented on PIG-1193:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430462/PIG-1193-1.patch
  against trunk revision 899502.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/177/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/177/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/177/console

This message is automatically generated.


[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1193:
----------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1193:
----------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1193:
----------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801246#action_12801246 ] 

Hadoop QA commented on PIG-1193:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430494/PIG-1193-2.patch
  against trunk revision 899502.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/180/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/180/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/180/console

This message is automatically generated.
