Posted to dev@pig.apache.org by "Anthony Hsu (JIRA)" <ji...@apache.org> on 2015/02/03 06:42:34 UTC
[jira] [Commented] (PIG-4392) RANK BY fails when default_parallel is greater than cardinality of field being ranked by
[ https://issues.apache.org/jira/browse/PIG-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302792#comment-14302792 ]
Anthony Hsu commented on PIG-4392:
----------------------------------
Patch looks good to me. Just some minor comments and questions:
* Add a space after the semicolons in the for loop declaration: {{for (int i=0;i<job.getJob().getNumReduceTasks();i++) {}}
* Is the order of the tuples in {{iter}} in the test case guaranteed?
* Why does the order of the tuples get reversed?
> RANK BY fails when default_parallel is greater than cardinality of field being ranked by
> ----------------------------------------------------------------------------------------
>
> Key: PIG-4392
> URL: https://issues.apache.org/jira/browse/PIG-4392
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1
> Reporter: Anthony Hsu
> Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4392-1.patch
>
>
> To reproduce:
> {code:title=input.txt}
> 1 2 3
> 4 5 6
> 7 8 9
> {code}
> {code:title=rank.pig}
> set default_parallel 4;
> d = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int);
> e = rank d by a;
> dump e;
> {code}
> If {{default_parallel}} is set to {{3}}, the script succeeds, so I'm guessing RANK BY has issues when {{default_parallel}} exceeds the cardinality of the field being ranked by.
> I'm seeing this issue with Pig 0.11.1 (which has the PIG-2932 patch applied) and Hadoop 2.3.0.