Posted to dev@pig.apache.org by "Daniel Dai (Updated) (JIRA)" <ji...@apache.org> on 2012/01/11 00:32:40 UTC

[jira] [Updated] (PIG-2465) FLATTEN, reorder columns, UNION causes uid conflict

     [ https://issues.apache.org/jira/browse/PIG-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2465:
----------------------------

    Assignee: Daniel Dai
    
> FLATTEN, reorder columns, UNION causes uid conflict
> ---------------------------------------------------
>
>                 Key: PIG-2465
>                 URL: https://issues.apache.org/jira/browse/PIG-2465
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.1, 0.10
>            Reporter: David Wahler
>            Assignee: Daniel Dai
>
> This is a regression in the new logical plan that causes incorrect results in 0.8/0.9, and a fatal "duplicate uid in schema" error on trunk. The following script demonstrates the problem (extracted and simplified from a much larger script):
> {code}A = LOAD 'bug.in' AS (x:{t:(x:int)}, y:{t:(y:int)});
> B1 = FOREACH A GENERATE FLATTEN(x),FLATTEN(y);
> B2 = FOREACH A GENERATE FLATTEN(y),FLATTEN(x);
> C = UNION B1, B2;
> D = GROUP C BY *;{code}
> Input data:
> {code}{(1)}	{(2)}
> {(1)}	{(3)}{code}
> C contains the correct data:
> {code}(1,2)
> (2,1)
> (1,3)
> (3,1){code}
> D should use the entire tuple as the group key (making it essentially a DISTINCT) but instead the output is:
> {code}((1,1),{(1,2),(1,3)})
> ((2,2),{(2,1)})
> ((3,3),{(3,1)}){code}
> The GROUP operation is using ($0,$0) as the key instead of ($0,$1). The logical plan includes the line: {{C: (Name: LOUnion Schema: x::x#37:int,y::y#37:int)}}, in which both columns of C carry the same uid (#37). Switching to the old logical plan produces the correct output in 0.8, but apparently that option is no longer available in 0.9 and later versions.
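
The difference between the intended key ($0,$1) and the reported key ($0,$0) can be illustrated outside Pig. The following is a minimal Python sketch, not Pig code and not part of the original report: it only simulates grouping the four rows of C by column position. The helper name group_by and the variable names are hypothetical, and the sketch assumes nothing about how Pig resolves uids internally.

```python
from collections import defaultdict

# Rows of relation C, copied from the report above.
C = [(1, 2), (2, 1), (1, 3), (3, 1)]


def group_by(rows, key_columns):
    """Group rows by the tuple of values found at the given column positions.

    Hypothetical helper for illustration only; it mimics the shape of a
    GROUP result (key -> bag of rows), not Pig's implementation.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        groups[key].append(row)
    return dict(groups)


# Intended behaviour of GROUP C BY *: the key is the whole tuple ($0, $1).
expected = group_by(C, (0, 1))

# Behaviour described in the report: the key is effectively ($0, $0).
observed = group_by(C, (0, 0))

for key, bag in sorted(observed.items()):
    print(key, bag)
```

Grouping by positions (0, 0) yields the three groups shown in the report's output for D, while grouping by (0, 1) yields four single-row groups, which is the DISTINCT-like result the reporter expected.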
