Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2016/10/12 21:48:20 UTC
[jira] [Updated] (PIG-5038) Pig Limit_2 e2e test failed with sort check
[ https://issues.apache.org/jira/browse/PIG-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-5038:
------------------------------------
Assignee: Konstantin Harasov
Summary: Pig Limit_2 e2e test failed with sort check (was: Pig e2e test failed with Sort check failed (TEST: Limit_2))
+1. Committed to trunk.
Thanks for finding a solution/workaround for this; looking into it had been on my to-do list for a long time. By the definition of the sort key option, -k1,3 should work and is what we should be using, since the ORDER BY is on three columns. The test passes on Mac with -k1,3, where the sort command works as expected. I am not sure why the Linux implementation sorts incorrectly: for -k1,3 it actually gives the result of -k1,1.
> Pig Limit_2 e2e test failed with sort check
> -------------------------------------------
>
> Key: PIG-5038
> URL: https://issues.apache.org/jira/browse/PIG-5038
> Project: Pig
> Issue Type: Bug
> Reporter: Konstantin Harasov
> Assignee: Konstantin Harasov
> Fix For: 0.17.0
>
> Attachments: PIG-5038.patch
>
>
> {noformat}
> error: Going to run sort check command: sort -cs -t -k 1,3 ./out/pigtest/../..-1475241304-nightly.conf/Limit_2.out/out_original
> /bin/sort: ./out/pigtest/../..-1475241304-nightly.conf/Limit_2.out/out_original:27: disorder: 18
> Sort check failed
> INFO: TestDriver::runTestGroup() at 706:Test Limit_2 FAILED at 1475241624
> Ending test Limit_2 at 1475241624
> {noformat}
> The test failed because Pig {{(ORDER BY $0,$1,$2)}} and {{sort -t $'\t' -k 1,3}} in bash order the data differently.
> The problem is that empty fields are sorted/processed differently in Pig using {{ORDER BY}} and in bash using {{sort}}.
> See the example for the file studentnulltab10k.
> *Pig*:
> {code:linenumbers=true}
>
>
>
> 0.12
> 1.04
> 1.15
> 1.25
> 1.27
> 1.31
> 1.59
> 1.61
> 1.62
> 1.76
> 1.95
> 2.09
> 2.35
> 2.66
> 3.04
> 3.23
> 3.31
> 3.39
> 3.46
> 3.54
> 3.65
> 3.75
> 3.97
> 18
> 18 0.41
> {code}
> *bash: sort -t $'\t' -k 1,3*
> {code:linenumbers=true}
>
>
>
> 0.12
> 1.04
> 1.15
> 1.25
> 1.27
> 1.31
> 1.59
> 1.61
> 1.62
> 1.76
> 18
> 18 0.41
> 18 0.54
> 18 1.78
> 18 2.46
> 18 2.54
> 19 0.07
> 19 0.27
> 19 0.39
> 19 2.27
> 19 2.50
> 19 2.60
> 19 2.89
> 19 3.87
> 1.95
> {code}
> *bash: sort -t $'\t' -k 1,2*
> {code:linenumbers=true}
>
>
>
> 0.12
> 1.04
> 1.15
> 1.25
> 1.27
> 1.31
> 1.59
> 1.61
> 1.62
> 1.76
> 1.95
> 2.09
> 2.35
> 2.66
> 3.04
> 3.23
> 3.31
> 3.39
> 3.46
> 3.54
> 3.65
> 3.75
> 3.97
> 18
> 18 0.41
> {code}
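The difference described above can be illustrated with a small field-by-field sort check. This is only a sketch in Python, not the e2e harness code and not the contents of PIG-5038.patch. The helper names (field_key, first_disorder) and the sample rows are hypothetical, and the column layout (empty first field, sometimes empty second field) is an assumption about the shape of the data. Fields are compared as plain strings, so an empty field orders before any non-empty value; a typed comparison (int age, float gpa) would need an explicit conversion.

```python
# Sketch only: the rows below are made up for illustration and the column
# layout is an assumption, not data taken from studentnulltab10k.

def field_key(line, num_fields=3, sep="\t"):
    """Return the first num_fields fields as a tuple, padding missing ones with ''."""
    fields = line.rstrip("\n").split(sep)
    fields += [""] * (num_fields - len(fields))
    return tuple(fields[:num_fields])


def first_disorder(lines, num_fields=3):
    """Return the 1-based number of the first out-of-order line, or None if sorted."""
    previous = None
    for number, line in enumerate(lines, start=1):
        current = field_key(line, num_fields)
        # Tuples compare field by field, so an empty field sorts before "18".
        if previous is not None and current < previous:
            return number
        previous = current
    return None


# Hypothetical rows in the order a field-by-field sort would produce.
pig_like_order = ["\t\t1.76", "\t\t1.95", "\t\t3.97", "\t18\t", "\t18\t0.41"]
# Hypothetical rows where an empty-age row follows rows with age 18.
span_like_order = ["\t\t1.76", "\t18\t", "\t18\t0.41", "\t\t1.95"]

print(first_disorder(pig_like_order))
print(first_disorder(span_like_order))
```

Under these assumptions the first list passes the check and the second is reported as out of order at its fourth row, because a row with an empty second field appears after rows whose second field is 18.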