Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2012/11/10 22:43:12 UTC

[jira] [Commented] (PIG-2498) e2e tests failing in some cases due to incorrect unix sort args

    [ https://issues.apache.org/jira/browse/PIG-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494775#comment-13494775 ] 

Rohini Palaniswamy commented on PIG-2498:
-----------------------------------------

Patrick,
 Hit failures because of this on RHEL 6 in several test cases (Order-6,7,8,9,18, Types-20,21,22,23,24,25, Split-6, BigData-7,8). Before I found this JIRA, I had come up with a patch that changes the sort args of the failing tests to -k style. Your patch looks good, but I have one comment: since we are fixing all the sort args, can we move off the obsolete origin-zero syntax entirely and switch to -k style? I would be glad to review, test, and commit this one. Thanks.
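Moving the obsolete arguments to -k style can be done mechanically with the equivalence rule from the GNU sort manual: 'sort +a.x -b.y' is equivalent to 'sort -k a+1.x+1,b' if y is 0 or absent, otherwise to 'sort -k a+1.x+1,b+1.y'. Below is a minimal Python sketch of that rewrite, assuming the sortArgs are available as a flat list of strings like the arrays in the test config. `convert_obsolete_keys` is a hypothetical helper for illustration, not part of the Pig e2e harness.

```python
import re
from typing import List

# One obsolete position: field number, optional ".char", optional
# ordering-option letters (e.g. "1", "1.2", "2n").
_POS = re.compile(r"^(\d+)(?:\.(\d+))?([bdfghiMnRrV]*)$")


def convert_obsolete_keys(args: List[str]) -> List[str]:
    """Rewrite obsolete '+pos1 [-pos2]' keys into '-k pos1[,pos2]' form.

    Hypothetical helper; follows the GNU sort manual's rule:
      +a.x -b.y  ->  -k a+1.x+1,b        if y is 0 or absent
      +a.x -b.y  ->  -k a+1.x+1,b+1.y    otherwise
    All other arguments are passed through unchanged.
    """
    out: List[str] = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-t" and i + 1 < len(args):
            # Copy the separator verbatim so e.g. "-t +" is not misparsed.
            out.extend(args[i:i + 2])
            i += 2
            continue
        start = _POS.match(arg[1:]) if arg.startswith("+") else None
        if start is None:
            out.append(arg)
            i += 1
            continue
        a, x, flags = int(start.group(1)), int(start.group(2) or 0), start.group(3)
        # a+1.x+1; when x is 0 that is "a+1.1", i.e. simply field a+1.
        key = (f"{a + 1}" if x == 0 else f"{a + 1}.{x + 1}") + flags
        i += 1
        # An optional "-b[.y]" right after "+a[.x]" closes the key.
        end = None
        if i < len(args) and args[i].startswith("-"):
            end = _POS.match(args[i][1:])
        if end is not None:
            b, y, end_flags = int(end.group(1)), int(end.group(2) or 0), end.group(3)
            if y == 0:
                if b == 0:
                    # The rule would give end field 0, which -k does not accept.
                    raise ValueError("end position -0 would map to field 0")
                key += f",{b}{end_flags}"
            else:
                key += f",{b + 1}.{y}{end_flags}"
            i += 1
        out.extend(["-k", key])
    return out
```

For example, `convert_obsolete_keys(["+1.2", "-3.4"])` yields `["-k", "2.3,4.4"]`. The `-t` option and its separator are copied through untouched, so a separator that happens to start with "+" is never treated as a key.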
                
> e2e tests failing in some cases due to incorrect unix sort args
> ---------------------------------------------------------------
>
>                 Key: PIG-2498
>                 URL: https://issues.apache.org/jira/browse/PIG-2498
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.2
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>         Attachments: PIG-2498.patch
>
>
> Some e2e tests are failing for me against 23 due to what I think are incorrect arguments to unix sort. For example in Order_6:
> {noformat}
> 			'num' => 6,
> 			'pig' => q\a = load ':INPATH:/singlefile/studenttab10k';
> c = order a by $0;
> store c into ':OUTPATH:';\,
> 			'sortArgs' => ['-t', '	', '+0', '-1'],
> {noformat}
> The Pig job sorts by the first column; however, unix sort is being told to sort by the first and second columns.
> From the GNU sort manual (note specifically that pos2 is _inclusive_): http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html
> {noformat}
> '-k pos1[,pos2]'
> '--key=pos1[,pos2]'
> Specify a sort field that consists of the part of the line between pos1 and pos2 (or the end of the line, if pos2 is omitted), inclusive.
> ...
> On older systems, sort supports an obsolete origin-zero syntax '+pos1 [-pos2]' for specifying sort keys. The obsolete sequence 'sort +a.x -b.y' is equivalent to 'sort -k a+1.x+1,b' if y is '0' or absent, otherwise it is equivalent to 'sort -k a+1.x+1,b+1.y'.
> {noformat}
> I verified this by running the sort manually with +0 -1 and with +0 -0: in the first case the test fails; in the second it passes.
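To make the inclusive pos2 rule from the manual concrete, here is a simplified Python model of which text GNU sort compares for 'sort -t SEP -k pos1[,pos2]' when keys are whole fields. It is a sketch under stated assumptions: it ignores character offsets, ordering options such as -n or -b, locale collation, and sort's last-resort whole-line comparison when all keys are equal. `key_span` and the sample row are hypothetical, made up for illustration.

```python
from typing import Optional


def key_span(line: str, sep: str, pos1: int, pos2: Optional[int] = None) -> str:
    """Return the key text for 'sort -t SEP -k pos1[,pos2]' (simplified model).

    Fields are 1-based and pos2 is inclusive, so the separators between the
    selected fields are part of the key. Omitting pos2 runs to end of line.
    """
    if pos1 < 1 or (pos2 is not None and pos2 < 1):
        raise ValueError("field numbers start at 1")
    fields = line.split(sep)
    end = len(fields) if pos2 is None else pos2
    # Slicing past the last field yields an empty key, as sort does.
    return sep.join(fields[pos1 - 1:end])
```

With the hypothetical row `"john\t25\t3.5"`, `key_span(row, "\t", 1, 1)` is just `"john"`, while `key_span(row, "\t", 1, 2)` is `"john\t25"`: because pos2 is inclusive, the second form also compares the second column.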

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira