Posted to dev@pig.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2014/02/27 18:15:23 UTC

[jira] [Commented] (PIG-3782) PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment

    [ https://issues.apache.org/jira/browse/PIG-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914749#comment-13914749 ] 

Koji Noguchi commented on PIG-3782:
-----------------------------------

This error occurs when PushDownForEachFlatten inserts a FOREACH after 'd=join' in order to move the flatten after the join as an optimization. For some reason, this new foreach carries completely new UIDs for q1 and q2.
As the plans below show, the new foreach has q1#25 and q2#26 instead of the q1#13 and q2#14 that are used later.
This breaks the lineage tracking of ColumnMapKeyPrune.

Before PushDownForEachFlatten
{noformat}
    |---e: (Name: LOForEach Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
        |   |
        |   (Name: LOGenerate[false,false,false] Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
        |   |   |
        |   |   c::a0:(Name: Project Type: int Uid: 1 Input: 0 Column: (*))
        |   |   |
        |   |   c::q1:(Name: Project Type: bytearray Uid: 13 Input: 1 Column: (*))
        |   |   |
        |   |   c::q2:(Name: Project Type: bytearray Uid: 14 Input: 2 Column: (*))
        |   |
        |   |---(Name: LOInnerLoad[0] Schema: c::a0#1:int)
        |   |
        |   |---(Name: LOInnerLoad[1] Schema: c::q1#13:bytearray)
        |   |
        |   |---(Name: LOInnerLoad[2] Schema: c::q2#14:bytearray)
        |
        |---d: (Name: LOJoin(HASH) Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray,b::b0#7:int,b::b1#8:bytearray)
            |   |
            |   a0:(Name: Project Type: int Uid: 1 Input: 0 Column: 0)
            |   |
            |   b0:(Name: Project Type: int Uid: 7 Input: 1 Column: 0)
            |
            |---c: (Name: LOForEach Schema: a0#1:int,q1#13:bytearray,q2#14:bytearray)
{noformat}

After PushDownForEachFlatten
{noformat}
    |---e: (Name: LOForEach Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
        |   |
        |   (Name: LOGenerate[false,false,false] Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
        |   |   |
        |   |   c::a0:(Name: Project Type: int Uid: 1 Input: 0 Column: (*))
        |   |   |
        |   |   c::q1:(Name: Project Type: bytearray Uid: 13 Input: 1 Column: (*))
        |   |   |
        |   |   c::q2:(Name: Project Type: bytearray Uid: 14 Input: 2 Column: (*))
        |   |
        |   |---(Name: LOInnerLoad[0] Schema: c::a0#1:int)
        |   |
        |   |---(Name: LOInnerLoad[1] Schema: c::q1#13:bytearray)
        |   |
        |   |---(Name: LOInnerLoad[2] Schema: c::q2#14:bytearray)
        |
        |---d: (Name: LOForEach Schema: c::a0#1:int,q1#25:bytearray,q2#26:bytearray,b::b0#7:int,b::b1#8:bytearray)
            |   |
            |   (Name: LOGenerate[false,true,false,false] Schema: c::a0#1:int,q1#25:bytearray,q2#26:bytearray,b::b0#7:int,b::b1#8:bytearray)
            |   |   |
            |   |   c::a0:(Name: Project Type: int Uid: 1 Input: 0 Column: (*))
            |   |   |
            |   |   c::a2:(Name: Project Type: bag Uid: 3 Input: 1 Column: (*))
            |   |   |
            |   |   b::b0:(Name: Project Type: int Uid: 7 Input: 2 Column: (*))
            |   |   |
            |   |   b::b1:(Name: Project Type: bytearray Uid: 8 Input: 3 Column: (*))
            |   |
            |   |---(Name: LOInnerLoad[0] Schema: c::a0#1:int)
            |   |
            |   |---c::a2: (Name: LOInnerLoad[1] Schema: null)
            |   |
            |   |---(Name: LOInnerLoad[2] Schema: b::b0#7:int)
            |   |
            |   |---(Name: LOInnerLoad[3] Schema: b::b1#8:bytearray)
            |
            |---d: (Name: LOJoin(HASH) Schema: c::a0#1:int,c::a2#3:bag{#4:tuple()},b::b0#7:int,b::b1#8:bytearray)
                |   |
                |   a0:(Name: Project Type: int Uid: 1 Input: 0 Column: 0)
                |   |
                |   b0:(Name: Project Type: int Uid: 7 Input: 1 Column: 0)
                |
                |---c: (Name: LOForEach Schema: a0#1:int,a2#3:bag{#4:tuple()})
{noformat}
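
The mismatch visible in the two plans above can be reproduced with a small toy model. The sketch below is only an illustration in Python, not Pig's actual implementation: the function names and the -1 "not found" sentinel are assumptions made for this example. The schemas are copied from the 'd' operator in each plan, and the required UIDs are the ones projected by 'e'.

```python
# Toy model of UID-based column lookup. This is NOT Pig's real code;
# all names here are hypothetical and exist only for illustration.
# A schema is a list of (alias, uid) pairs taken from the plans above.


def find_column_by_uid(schema, uid):
    """Return the index of the column carrying `uid`, or -1 if absent."""
    for index, (_alias, column_uid) in enumerate(schema):
        if column_uid == uid:
            return index
    return -1


def unresolved_uids(schema, required):
    """Return the required UIDs that cannot be found in `schema`."""
    return [uid for uid in required if find_column_by_uid(schema, uid) == -1]


# Output schema of 'd' before the optimization (the LOJoin).
d_before = [("c::a0", 1), ("c::q1", 13), ("c::q2", 14),
            ("b::b0", 7), ("b::b1", 8)]

# Output schema of 'd' after the optimization (the inserted LOForEach).
d_after = [("c::a0", 1), ("q1", 25), ("q2", 26),
           ("b::b0", 7), ("b::b1", 8)]

# UIDs projected by 'e', which are identical in both plans.
uids_required_by_e = [1, 13, 14]

missing_before = unresolved_uids(d_before, uids_required_by_e)
missing_after = unresolved_uids(d_after, uids_required_by_e)

print("unresolved before:", missing_before)
print("unresolved after:", missing_after)
```

In this model every UID that 'e' needs resolves against the original join output, while UIDs 13 and 14 no longer resolve against the inserted foreach, which is the kind of broken lineage described above.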

> PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3782
>                 URL: https://issues.apache.org/jira/browse/PIG-3782
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>
> {noformat}
> a = load '1.txt' as (a0:int, a1, a2:bag{});
> b = load '2.txt' as (b0:int, b1);
> c = foreach a generate a0, flatten(a2) as (q1, q2);
> d = join c by a0, b by b0;
> e = foreach d generate a0, q1, q2;
> f = foreach e generate a0, (int)q1, (int)q2;
> store f into 'output';
> {noformat}
> This Pig script fails with:
> 2014-02-27 11:49:45,657 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 13 Input: 0 Column: 1)


