Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2009/01/27 03:08:59 UTC
[jira] Created: (PIG-637) limit with order by is broken in local mode
limit with order by is broken in local mode
-------------------------------------------
Key: PIG-637
URL: https://issues.apache.org/jira/browse/PIG-637
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Shubham Chopra
Shubham, could you take a look.
The following script, when run in local mode, ignores the limit and outputs the entire data set:
a = load 'studenttab10k' as (name, age, gpa);
b = order a by name;
c = limit b 10;
dump c;
The same script works fine in MR mode.
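For reference, the intended result of the script is the first 10 records of the data sorted by name. The Python sketch below (not Pig code) models that behavior. The contents of `studenttab10k` are not shown in this report, so the input is a synthetic stand-in, and the function names are hypothetical. It also suggests why an optimizer might fold the limit into the sort: a sort that knows its limit can, presumably, perform a bounded top-k selection (`heapq.nsmallest`) that returns the same rows as a full sort followed by a limit.

```python
import heapq
import random
from operator import itemgetter

# Hypothetical stand-in for 'studenttab10k': 10,000 (name, age, gpa) tuples.
rng = random.Random(42)
students = [
    ("".join(rng.choice("abcdefghij") for _ in range(6)),
     rng.randint(18, 25),
     round(rng.uniform(0.0, 4.0), 2))
    for _ in range(10_000)
]
by_name = itemgetter(0)


def order_then_limit(rows, key, n):
    """Reference semantics of `b = order a by name; c = limit b n;`."""
    return sorted(rows, key=key)[:n]


def sort_with_limit(rows, key, n):
    """A sort that carries its limit only needs to keep n rows (bounded top-k)."""
    return heapq.nsmallest(n, rows, key=key)


c = order_then_limit(students, by_name, 10)
assert c == sort_with_limit(students, by_name, 10)
print(len(c))  # 10
```

In both modes, the correct output of the script should therefore contain exactly 10 records.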
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-637) limit with order by is broken in local mode
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671328#action_12671328 ]
Olga Natkovich commented on PIG-637:
------------------------------------
I am reviewing this patch.
[jira] Updated: (PIG-637) limit with order by is broken in local mode
Posted by "Shubham Chopra (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shubham Chopra updated PIG-637:
-------------------------------
Attachment: 637.patch
This happens because the optimizer eliminates a limit that directly follows a sort and instead records it as an attribute on POSort/LOSort. Local-mode sorting does not use this attribute, because honoring it there would adversely affect the MR sorting of the samples.
I have modified the code so that this optimization is skipped when executing in local mode. I have also added a couple of test cases that verify the plans in both local and MR mode.
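The failure and the fix can be modeled with a small toy plan. This is a hedged Python sketch, not Pig's implementation (Pig is written in Java): `Sort`, `Limit`, `optimize`, and `run_local` are hypothetical stand-ins, not the real LOSort/POSort classes or optimizer. The rewrite folds a LIMIT that follows a SORT into the sort's `limit` attribute. The toy local executor, like the local-mode sort described above, ignores that attribute, so the limit is lost unless the rewrite is skipped in local mode.

```python
from dataclasses import dataclass
from operator import itemgetter
from typing import List, Optional, Union


@dataclass
class Sort:
    key: int                     # column index to sort on
    limit: Optional[int] = None  # set when a following LIMIT is folded in


@dataclass
class Limit:
    n: int


Op = Union[Sort, Limit]


def optimize(plan: List[Op], local_mode: bool) -> List[Op]:
    """Fold a LIMIT that directly follows a SORT into Sort.limit.

    Mirrors the fix described above: the rewrite is skipped in local mode.
    """
    if local_mode:
        return list(plan)
    out: List[Op] = []
    for op in plan:
        if isinstance(op, Limit) and out and isinstance(out[-1], Sort):
            prev = out[-1]
            n = op.n if prev.limit is None else min(prev.limit, op.n)
            out[-1] = Sort(prev.key, n)  # the LIMIT operator disappears from the plan
        else:
            out.append(op)
    return out


def run_local(plan: List[Op], rows: list) -> list:
    """Toy local executor whose sort ignores Sort.limit."""
    for op in plan:
        if isinstance(op, Sort):
            rows = sorted(rows, key=itemgetter(op.key))  # limit attribute not consulted
        elif isinstance(op, Limit):
            rows = rows[: op.n]
    return rows


# Hypothetical input: 50 (name, age, gpa) rows in descending name order.
rows = [(f"student{i:02d}", 18 + i % 7, 3.0) for i in range(50, 0, -1)]
plan = [Sort(key=0), Limit(10)]  # b = order a by name; c = limit b 10;

buggy = run_local(optimize(plan, local_mode=False), rows)  # limit folded, then ignored
fixed = run_local(optimize(plan, local_mode=True), rows)   # Limit kept as its own step
print(len(buggy), len(fixed))  # 50 10
```

Running the folded plan through the toy local executor returns all 50 rows; with the rewrite skipped, the separate Limit step survives and only the first 10 sorted rows come back.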
[jira] Resolved: (PIG-637) limit with order by is broken in local mode
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-637.
--------------------------------
Resolution: Fixed
Patch committed. Thanks, Shubham!