Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2010/12/14 08:18:00 UTC

[jira] Commented: (PIG-1766) New logical plan: ImplicitSplitInserter should before DuplicateForEachColumnRewrite

    [ https://issues.apache.org/jira/browse/PIG-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971182#action_12971182 ] 

Daniel Dai commented on PIG-1766:
---------------------------------

The error comes from an interaction between the two procedures that resolve uid conflicts:
1. Conflicting uids that come from two different relations (PIG-1705)
2. Conflicting uids that come from the same relation (PIG-1732)

Currently, we solve 2 first: we check whether ForEach generates the same uid more than once and, if so, convert the subsequent fields into a udf. Then we solve 1 by introducing new uids for every SplitOutput.

However, if a ForEach statement contains conflicting uids due to 1, we erroneously use approach 2 to resolve them (since we check 2 first). Although 2 generates different uids for the ForEach output, the ForEach itself still references the wrong inputs.
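
To make the wrong reference concrete, here is a minimal Python sketch of the script from this issue. It is a toy model, not Pig code: the uid numbers and the helper `resolve_by_uid` are hypothetical, and it assumes that a plain projection such as `b1 as d1` carries its input's uid forward and that a column is looked up by uid.

```python
# Toy model: each column of the joined relation E carries a uid, and a
# projection finds its input column by uid. Names and uid values are hypothetical.

# The single row of E for the sample data: C's columns followed by D's columns.
row = [1, 2, 1, "one", 2, "two"]
names = ["a0", "a1", "b0", "b1", "d0", "d1"]


def resolve_by_uid(uids, wanted_name):
    """Return the value of the first column whose uid equals wanted_name's uid."""
    wanted_uid = uids[names.index(wanted_name)]
    return row[uids.index(wanted_uid)]


# Assumption: D = foreach B generate b0 as d0, b1 as d1 keeps B's uids,
# so d0/d1 share uids with b0/b1.
conflicting_uids = [10, 11, 20, 21, 20, 21]
# With a split in place, the columns reaching D get their own uids.
fresh_uids = [10, 11, 20, 21, 30, 31]

wrong = (resolve_by_uid(conflicting_uids, "b1"), resolve_by_uid(conflicting_uids, "d1"))
right = (resolve_by_uid(fresh_uids, "b1"), resolve_by_uid(fresh_uids, "d1"))
print(wrong)  # ('one', 'one')
print(right)  # ('one', 'two')
```

Giving the ForEach output a new uid does not help in the first case, because the lookup of the input column has already landed on b1.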

To correct this, we should check 1 first. Once 1 has introduced new uids, procedure 2 no longer finds a conflict, so it is not triggered.

On the other hand, if we have conflicting uids due to 2, we will never erroneously trigger approach 1, even if we check 1 first. This is because the condition that triggers procedure 1 (searching for a split) is not affected by the action of procedure 2 (adding a udf).
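
The ordering argument can be sketched in Python as follows. This models only the trigger conditions of the two procedures, not their real implementation; the `plan` dictionary and the functions `insert_split`, `rewrite_duplicates` and `run` are hypothetical names made up for the illustration.

```python
from itertools import count


def insert_split(plan, fresh):
    """Procedure 1: if an input relation is shared, add a split and give the
    split outputs new uids, so the columns no longer collide."""
    if plan["shared_input"]:
        plan["uids"] = [next(fresh) for _ in plan["uids"]]
        plan["shared_input"] = False
        plan["fired"].append("split")


def rewrite_duplicates(plan, fresh):
    """Procedure 2: if ForEach sees the same uid twice, wrap the later column
    in a udf. This does not touch the split condition (fresh is unused here)."""
    if len(set(plan["uids"])) < len(plan["uids"]):
        plan["fired"].append("udf")


def run(rules, shared_input):
    """Apply the rules in order to a ForEach with two colliding uids and
    return the names of the rules that fired."""
    fresh = count(100)
    plan = {"uids": [21, 21], "shared_input": shared_input, "fired": []}
    for rule in rules:
        rule(plan, fresh)
    return plan["fired"]


# Conflict caused by 1 (shared relation): the order matters.
print(run([rewrite_duplicates, insert_split], shared_input=True))   # ['udf', 'split']
print(run([insert_split, rewrite_duplicates], shared_input=True))   # ['split']
# Conflict caused by 2 (same relation): only the udf rewrite fires either way.
print(run([insert_split, rewrite_duplicates], shared_input=False))  # ['udf']
print(run([rewrite_duplicates, insert_split], shared_input=False))  # ['udf']
```

In this model, running the split check first is the only order in which the udf rewrite stays out of a conflict it was not meant to handle.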

> New logical plan: ImplicitSplitInserter should before DuplicateForEachColumnRewrite
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-1766
>                 URL: https://issues.apache.org/jira/browse/PIG-1766
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>         Attachments: PIG-1766-1.patch
>
>
> The following script produces a wrong result:
> {code}
> A = load '1.txt' AS (a0:int, a1:int);
> B = load '2.txt' AS (b0:int, b1:chararray);
> C = join A by a0, B by b0;
> D = foreach B generate b0 as d0, b1 as d1;
> E = join C by a1, D by d0;
> F = foreach E generate b1, d1;
> dump F;
> {code}
> 1.txt:
> 1       2
> 1       3
> 2       4
> 2       5
> 2.txt:
> 1       one
> 2       two
> Expected:
> (one,two)
> We get:
> (one,one)
