Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2013/10/01 19:52:25 UTC

[jira] [Updated] (PIG-3292) Logical plan invalid state: duplicate uid in schema during self-join to get cross product

     [ https://issues.apache.org/jira/browse/PIG-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3292:
----------------------------

    Attachment: PIG-3292-1.patch

Looks good. Note that the issue occurs only in a nested cross. A self-cross in a top-level cross is not a problem, since LOSplit takes care of reassigning the uids. The interplay with ColumnPruner is fine here, since the nested plan includes the entire required plan branch, so there is no need to track the lineage of LOCross in a nested cross. To be more specific, PIG-3292-1.patch checks the "nested" flag around "fixDuplicateUids" and also adds a test case.
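For readers unfamiliar with the uid check, here is a rough toy model of the idea in Python. It is not Pig's code: the function names, the (alias, uid) tuple layout, and the fresh-uid counter are all hypothetical, and only the aliases and uid 16 are taken from the error message in the report. It shows why crossing a field with itself yields a duplicate uid, and how assigning a fresh uid to the repeated field removes the collision.

```python
from itertools import count

# Hypothetical source of fresh uids; Pig's real uid allocation is not modeled here.
_fresh_uids = count(100)


def find_duplicate_uids(schema):
    """Return the set of uids that appear more than once in a schema.

    A schema is modeled as a list of (alias, uid) tuples.
    """
    seen, duplicates = set(), set()
    for _alias, uid in schema:
        if uid in seen:
            duplicates.add(uid)
        seen.add(uid)
    return duplicates


def cross_schema(inputs, fix_duplicates):
    """Concatenate the input schemas, as the output schema of a cross would.

    With fix_duplicates=True, a field whose uid was already emitted
    gets a fresh uid instead of reusing the old one.
    """
    output, seen = [], set()
    for schema in inputs:
        for alias, uid in schema:
            if fix_duplicates and uid in seen:
                uid = next(_fresh_uids)
            seen.add(uid)
            output.append((alias, uid))
    return output


# Both inputs of the nested cross come from the same field, so they share uid 16.
left = [("1-7::x", 16)]
right = [("y::x", 16)]

unfixed = cross_schema([left, right], fix_duplicates=False)
fixed = cross_schema([left, right], fix_duplicates=True)
```

In this model, `find_duplicate_uids(unfixed)` reports uid 16, matching the shape of the reported error, while `find_duplicate_uids(fixed)` is empty.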

> Logical plan invalid state: duplicate uid in schema during self-join to get cross product
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-3292
>                 URL: https://issues.apache.org/jira/browse/PIG-3292
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10.0
>         Environment: CDH 4.2
>            Reporter: Sergey
>            Assignee: Cheolsoo Park
>              Labels: newbie
>             Fix For: 0.12.0, 0.11.2
>
>         Attachments: PIG-3292-1.patch, PIG-3292.patch
>
>
> Hi.
> This looks like PIG-3020,
> but it behaves differently.
> Our Pig version is:
> Apache Pig version 0.10.0-cdh4.2.0 (rexported) 
> compiled Feb 15 2013, 12:20:54
> According to the release notes, PIG-3020 is included in the CDH 4.2 distribution:
> http://archive.cloudera.com/cdh4/cdh/4/pig-0.10.0-cdh4.2.0.CHANGES.txt
> The problem:
> We want to do a self-join to get a cross product:
> {code}
> a = load '/input' as (key, x);
> a_group = group a by key;
> b = foreach a_group {
>   y = a.x;
>   pair = cross a.x, y;
>   generate flatten(pair);
> }
> dump b;
> {code}
> This produces an error:
> {code}
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2270: Logical plan invalid state: duplicate uid in schema : 1-7::x#16:bytearray,y::x#16:bytearray
> {code}
> Here is a workaround :)
> {code}
> a = load '/input' as (key, x:int);
> a_group = group a by key;
> b = foreach a_group {
>   y = foreach a generate -(-x);
>   pair = cross a.x, y;
>   generate flatten(pair);
> }
> dump b;
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)