Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2010/01/15 23:56:54 UTC
[jira] Created: (PIG-1193) Secondary sort issue on nested desc sort
Secondary sort issue on nested desc sort
----------------------------------------
Key: PIG-1193
URL: https://issues.apache.org/jira/browse/PIG-1193
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.6.0
Secondary sort orders a nested desc sort incorrectly when all of the following conditions are met:
1. The nested plan contains both a sort and a UDF
2. The UDF uses the same input tuples more than once
3. The input tuples are sorted in desc order
Here is a test case:
{code}
register sequence.jar;
A = load 'input' as (a0:int);
B = group A ALL;
C = foreach B {
    D = order A by a0 desc;
    generate sequence.CUMULATIVE(D,D);
};
dump C;
{code}
input file:
{code}
3
4
{code}
The input for the UDF is:
{code}
({(4),(3)},{(3),(4)})
{code}
The first bag is sorted desc, but the second is not.
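The expected invariant is that every bag handed to the UDF arrives in descending order. A minimal Python sketch of that check, applied to the two bags shown above, might look like this (the helper name `is_sorted_desc` is made up for illustration and is not part of Pig):

```python
def is_sorted_desc(bag):
    """Return True if each tuple's first field is >= the next tuple's first field."""
    keys = [t[0] for t in bag]
    return all(a >= b for a, b in zip(keys, keys[1:]))

# The two bags received by the UDF in the test case above.
first_bag = [(4,), (3,)]
second_bag = [(3,), (4,)]

print(is_sorted_desc(first_bag))   # True
print(is_sorted_desc(second_bag))  # False
```

A check of this kind inside a test UDF would flag the second bag as the one that violates the requested order.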
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802578#action_12802578 ]
Pradeep Kamath commented on PIG-1193:
-------------------------------------
Reviewed the changes. +1 for commit.
[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1193:
----------------------------
Attachment: PIG-1193-1.patch
[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1193:
----------------------------
Attachment: PIG-1193-2.patch
Addresses the Findbugs warning.
[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802410#action_12802410 ]
Alan Gates commented on PIG-1193:
---------------------------------
+1, Patch looks good to me. But Ying should review it before it is committed, since she wrote the original code for this.
[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802488#action_12802488 ]
Daniel Dai commented on PIG-1193:
---------------------------------
Actually, I wrote this part; it is independent of Ying's Accumulator code.
[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1193:
----------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Patch committed to trunk and 0.6 branch.
[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800984#action_12800984 ]
Daniel Dai commented on PIG-1193:
---------------------------------
Diagnosis for this issue:
{code}
Reduce plan:
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-56
|
|---New For Each(false)[bag] - 1-55
| |
| POUserFunc(sequence.CUMULATIVE)[bag] - 1-54
| |
| |---RelationToExpressionProject[bag][*] - 1-49
| | |
| | |---RelationToExpressionProject[bag][*] - 1-58
| | |
| | |---Project[tuple][1] - 1-46
| |
| |---RelationToExpressionProject[bag][*] - 1-53
| |
| |---POSort[bag]() - 1-52
| | |
| | Project[int][0] - 1-51
| |
| |---Project[tuple][1] - 1-50
|
|---Package[tuple]{chararray} - 1-43--------
{code}
We take the first input's reverse POSort and make it a secondary sort key. However, we do not remove the second input's POSort, so the second input to the UDF ends up reverse-sorted twice.
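As a toy illustration only (this is not Pig's code), reversing an ordering twice gives back the original ascending order, which matches the second bag in the report. The Python sketch below models that with comparators. The names `natural` and `reverse` are hypothetical, and the idea that the leftover sort contributes a second reversal is an assumption drawn from the wording of the diagnosis, not from the Pig source:

```python
from functools import cmp_to_key

def natural(a, b):
    """Ascending comparison on the first field of each tuple."""
    return (a[0] > b[0]) - (a[0] < b[0])

def reverse(cmp):
    """Wrap a comparator so that it orders in the opposite direction."""
    return lambda a, b: -cmp(a, b)

rows = [(3,), (4,)]

# First input: the desc order is applied once (as the secondary sort key).
first = sorted(rows, key=cmp_to_key(reverse(natural)))

# Second input: ASSUMPTION for illustration, the leftover sort reverses again.
second = sorted(rows, key=cmp_to_key(reverse(reverse(natural))))

print(first)   # [(4,), (3,)]
print(second)  # [(3,), (4,)]
```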
[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801090#action_12801090 ]
Hadoop QA commented on PIG-1193:
--------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12430462/PIG-1193-1.patch
against trunk revision 899502.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/177/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/177/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/177/console
This message is automatically generated.
[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1193:
----------------------------
Status: Patch Available (was: Open)
[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1193:
----------------------------
Status: Open (was: Patch Available)
[jira] Updated: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1193:
----------------------------
Status: Patch Available (was: Open)
[jira] Commented: (PIG-1193) Secondary sort issue on nested desc sort
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801246#action_12801246 ]
Hadoop QA commented on PIG-1193:
--------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12430494/PIG-1193-2.patch
against trunk revision 899502.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/180/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/180/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/180/console
This message is automatically generated.